Thanks for the details. I think it is also important to note that using
shims means users won’t easily be able to swap HMS-2 for HMS-3 without
upgrading (rebuilding) Impala. With the current approach, it would only
take a restart after pointing the catalog at the HMS-3 URL.
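
To make the build-time nature of the shim approach concrete, here is a
rough sketch of the selection step described in the quoted message below.
It is only an illustration, not what fe/pom.xml actually does: the function
name, the profile values "2" and "3", and the default are assumptions; only
the environment variable name and the directory pattern come from the
thread.

```python
import os

# Directory pattern taken from the thread; the concrete profile values
# "2" and "3" are assumptions made for this illustration.
SHIM_DIR_PATTERN = "fe/src/compat-minicluster-profile-{profile}"
SUPPORTED_PROFILES = ("2", "3")
DEFAULT_PROFILE = "2"  # assumption: Hive 2 stays the default at first


def select_shim_dir(env=None):
    """Return the shim source directory for the configured profile.

    This models a build-time decision: the chosen directory is compiled
    into the frontend, so switching profiles later means rebuilding.
    """
    env = os.environ if env is None else env
    profile = env.get("IMPALA_MINICLUSTER_PROFILE", DEFAULT_PROFILE)
    if profile not in SUPPORTED_PROFILES:
        raise ValueError("unsupported minicluster profile: %r" % profile)
    return SHIM_DIR_PATTERN.format(profile=profile)
```

Because the choice is made once at compile time, a binary built with one
shim cannot be repointed at the other Hive version by configuration alone.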

On Fri, Apr 26, 2019 at 1:33 AM Joe McDonnell <joemcdonn...@cloudera.com>
wrote:

> Let me give a more detailed description of what the shim approach would
> look like:
> 1. Impala conditionally compiles against either Hive 2 or Hive 3. Code that
> is dependent on Hive version becomes part of a shim called from the other
> Java code. See fe/src/compat-minicluster-profile-* directories in the patch
> I mentioned. The fe/pom.xml determines which version of the shim to use via
> the IMPALA_MINICLUSTER_PROFILE environment variable, and it does
> conditional compilation.
> 2. The Hive 3 build starts out as experimental, with development continuing
> against Hive 2 by default. As the Hive 3 code matures, we eventually switch
> to Hive 3 development by default. This separation gives the Hive 3 / Hive 2
> compatibility some time to shake out.
> 3. We maintain both until Hive 2 is no longer interesting.
>
> On Thu, Apr 25, 2019 at 7:23 PM Vihang Karajgaonkar <vih...@cloudera.com>
> wrote:
>
> > When compiled with Hive 3, can Impala run Java UDFs using the deprecated
> > UDF interface?
> > >> Impala can still use the deprecated UDF interface. But if a function was
> > moved from UDF to GenericUDF in Hive 3, Impala would not be able to run it
> > without adding support for GenericUDFs.
> >
> > For example, if I have an Impala cluster
> > running Hive 2 that has custom Hive UDFs using the deprecated UDF
> > interface, can Impala still use them after moving to an Impala built with
> > Hive 3?
> > >> If you are using a custom UDF which implements UDF, it should still work.
> >
> > I will take a look at https://gerrit.cloudera.org/#/c/9716/ to see if we
> > can follow a similar approach.
> >
> > On Thu, Apr 25, 2019 at 4:45 PM Joe McDonnell <joemcdonn...@cloudera.com>
> > wrote:
> >
> > > Thanks for working on this. I'm interested in the specific impact that
> > > this has on Java UDFs. When compiled with Hive 3, can Impala run Java
> > > UDFs using the deprecated UDF interface? For example, if I have an Impala
> > > cluster running Hive 2 that has custom Hive UDFs using the deprecated UDF
> > > interface, can Impala still use them after moving to an Impala built with
> > > Hive 3? I want to confirm that this is backwards compatible. Do Hive UDFs
> > > ever depend on Hive components on the CLASSPATH? In other words, if
> > > Impala is running with Hive 3 jars on its CLASSPATH, does that impact
> > > legacy Hive UDFs built against Hive 2?
> > >
> > > Depending on how much code needs to change to use Hive 3, an alternative
> > > is to introduce build-time shims for the differences between Hive 2 and
> > > Hive 3. This is how the Impala 2 to Impala 3 transition worked
> > > (IMPALA-4277: https://gerrit.cloudera.org/#/c/9716/ ).
> > >
> > > Thanks,
> > > Joe
> > >
> > > On Thu, Apr 25, 2019 at 3:09 PM Vihang Karajgaonkar
> > > <vih...@cloudera.com> wrote:
> > >
> > > > Hello All,
> > > >
> > > > As some of you might have noticed, I have been working on IMPALA-8369
> > > > <https://issues.apache.org/jira/browse/IMPALA-8369> and I have a WIP
> > > > patch on gerrit <https://gerrit.cloudera.org/#/c/13005/>. The current
> > > > plan is to build using Hive-3 libraries while keeping compatibility
> > > > with Hive-2. This gives us the advantage of keeping only one branch
> > > > which works with both setups. If we hit roadblocks for which we don't
> > > > have any good solutions, the fall-back could be to branch off and
> > > > create a separate branch for HMS-3 support.
> > > >
> > > > The patch attempts to add to Impala the ability to talk to HMS-3.x
> > > > while keeping the ability to talk to HMS-2 intact. This is done using
> > > > the following approach:
> > > >
> > > > 1. Reduce the unnecessary dependencies on Hive (specifically the
> > > > hive-exec jar, which is a fat jar including almost all of the Hive
> > > > code). This is a good thing to do in general, in my opinion, so that we
> > > > don't unintentionally add compile-time dependencies on non-public APIs
> > > > of Hive. The patch introduces a new shaded-deps module where we exclude
> > > > all the unnecessary code from hive-exec.jar to create a reduced jar,
> > > > which we currently depend on.
> > > > 2. Change the build scripts so that we use Hive 3 binaries to compile.
> > > > The toolchain is updated with a custom Hive build (I will change it to
> > > > official builds once I have the Hive patches merged). The metastore
> > > > maintains thrift wire compatibility with older releases. What is
> > > > missing is that when you are using an HMS3 client you cannot talk to
> > > > HMS2, because Hive doesn't guarantee backwards compatibility from the
> > > > client perspective (newer client talking to older server). This needs
> > > > some fixing on the Hive side (HIVE-21596), which I am also currently
> > > > working on in parallel. The working prototype which I have been using
> > > > works well so far for this use case (HMS3 client talking to HMS2).
> > > > 3. Additionally, some fixes are needed on the Hive side (HIVE-21586)
> > > > to make sure Impala can compile using Hive 3 libraries.
> > > >
> > > > The advantages of this approach are:
> > > > 1. We get to maintain only one branch of code and it works with both
> > > > HMS-2 and HMS-3 based deployments. I have been able to run the existing
> > > > tests against HMS-2 with the patch. There are still 3 tests which fail
> > > > but I think we can fix them too. Running tests against HMS-3 may need
> > > > some more work and will be targeted in a separate JIRA.
> > > > 2. We can start supporting new features of HMS (e.g. transactional
> > > > tables).
> > > >
> > > > There are a few caveats:
> > > > 1. Some of the built-in functions in Hive (UDFs) moved from the
> > > > deprecated UDF interface to the GenericUDF API. Since Impala currently
> > > > only supports UDF execution, those built-in functions (so far I have
> > > > found UDFLength, UDFYear, UDFHour) will not work when we start using
> > > > Hive 3 binaries. In order to fix this we should add support for
> > > > GenericUDFs similar to the UDFs.
> > > > 2. We need some additional patches on top of Hive 3.1.0, like the two
> > > > above, to build against Hive 3.
> > > >
> > > > The alternative to this approach is to branch off and have separate
> > > > branches for Hive-2 and Hive-3 support. This would mean more
> > > > cherry-picking and maintenance to keep each of these branches
> > > > up-to-date, and multiple release cadences. Eventually, one of the
> > > > branches will become the main development branch, after which we can
> > > > retire the other line.
> > > >
> > > > Let me know if this all sounds reasonable or if there are any blocking
> > > > concerns on this.
> > > >
> > > > Thanks,
> > > > Vihang
> > > >
> > >
> >
>
