When compiled with Hive 3, can Impala run Java UDFs using the deprecated
UDF interface?
>> Impala can still use the deprecated UDF interface. However, if Hive 3
moved a function from UDF to GenericUDF, that function cannot be run
until support for GenericUDFs is added to Impala.

For example, if I have an Impala cluster
running Hive 2 that has custom Hive UDFs using the deprecated UDF
interface, can Impala still use them after moving to an Impala built with
Hive 3?
>> If you are using a custom UDF which implements UDF, it should still work.
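
To make the distinction concrete, here is a minimal sketch of the check
involved. It assumes hypothetical stand-in classes (LegacyUdf, GenericUdf,
MyUpper, MyLength) in place of Hive's real UDF and GenericUDF base classes,
so that it compiles without any Hive jars. It is not Impala's actual loading
code; it only shows why a function that still extends the legacy base class
keeps working, while one that moved to the generic base class needs new
support.

```java
public class UdfKindSketch {
    // Hypothetical stand-ins for Hive's base classes
    // (org.apache.hadoop.hive.ql.exec.UDF and
    // org.apache.hadoop.hive.ql.udf.generic.GenericUDF). Real Hive classes
    // are deliberately not used so the sketch is self-contained.
    public static class LegacyUdf {}
    public abstract static class GenericUdf {}

    // A custom function written against the legacy interface. Legacy UDFs
    // expose one or more evaluate() methods that the caller finds by
    // reflection.
    public static class MyUpper extends LegacyUdf {
        public String evaluate(String s) {
            return s == null ? null : s.toUpperCase();
        }
    }

    // A function that has been moved to the generic API (illustrative only).
    public static class MyLength extends GenericUdf {}

    public enum Kind { LEGACY_UDF, GENERIC_UDF, UNKNOWN }

    // Classify a function class by which base class it extends.
    public static Kind classify(Class<?> c) {
        if (LegacyUdf.class.isAssignableFrom(c)) return Kind.LEGACY_UDF;
        if (GenericUdf.class.isAssignableFrom(c)) return Kind.GENERIC_UDF;
        return Kind.UNKNOWN;
    }

    // An executor that only understands the legacy interface can run a
    // function only if it is classified as LEGACY_UDF.
    public static boolean runnableWithLegacySupportOnly(Class<?> c) {
        return classify(c) == Kind.LEGACY_UDF;
    }

    public static void main(String[] args) {
        System.out.println("MyUpper runnable:  "
            + runnableWithLegacySupportOnly(MyUpper.class));
        System.out.println("MyLength runnable: "
            + runnableWithLegacySupportOnly(MyLength.class));
    }
}
```

In this sketch a custom function like MyUpper is unaffected by the Hive
version on the classpath as long as its base class is unchanged; only
functions whose base class changed fall into the GENERIC_UDF case.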

I will take a look at https://gerrit.cloudera.org/#/c/9716/ to see if we
can follow a similar approach.

On Thu, Apr 25, 2019 at 4:45 PM Joe McDonnell <joemcdonn...@cloudera.com>
wrote:

> Thanks for working on this. I'm interested in the specific impact that this
> has on Java UDFs. When compiled with Hive 3, can Impala run Java UDFs using
> the deprecated UDF interface? For example, if I have an Impala cluster
> running Hive 2 that has custom Hive UDFs using the deprecated UDF
> interface, can Impala still use them after moving to an Impala built with
> Hive 3? I want to confirm that this is backwards compatible. Do Hive UDFs
> ever depend on Hive components on the CLASSPATH? In other words, if Impala
> is running with Hive 3 jars on its CLASSPATH, does that impact legacy
> Hive UDFs built against Hive 2?
>
> Depending on how much code needs to change to use Hive 3, an alternative is
> to introduce build-time shims for the differences between Hive 2 and Hive
> 3. This is how the Impala 2 to Impala 3 transition worked (IMPALA-4277:
> https://gerrit.cloudera.org/#/c/9716/ ).
>
> Thanks,
> Joe
>
> On Thu, Apr 25, 2019 at 3:09 PM Vihang Karajgaonkar <vih...@cloudera.com>
> wrote:
>
> > Hello All,
> >
> > As some of you might have noticed, I have been working on IMPALA-8369
> > <https://issues.apache.org/jira/browse/IMPALA-8369> and I have a WIP
> > patch on gerrit <https://gerrit.cloudera.org/#/c/13005/>. The current
> > plan is to build using Hive-3 libraries while keeping compatibility with
> > Hive-2. This gives us the advantage of keeping only one branch which
> > works with both setups. If we hit roadblocks for which we don't have any
> > good solutions, the fall-back could be to branch off and create a
> > separate branch for HMS-3 support.
> >
> > The patch attempts to add to Impala the ability to talk to HMS-3.x
> > while keeping the ability to talk to HMS-2 intact. This is done using
> > the following approach:
> >
> > 1. Reduce the unnecessary dependencies on Hive (specifically the
> > hive-exec jar, which is a fat jar including almost all of the Hive
> > code). This is a good thing to do in general, in my opinion, so that we
> > don't unintentionally add compile-time dependencies on non-public APIs
> > of Hive. The patch introduces a new shaded-deps module where we exclude
> > all the unnecessary code from hive-exec.jar to create a reduced jar,
> > which we currently depend on.
> > 2. Change the build scripts so that we use Hive 3 binaries to compile.
> > The toolchain is updated with a custom Hive build (I will change it to
> > official builds once I have the Hive patches merged). The metastore
> > maintains thrift wire compatibility with older releases. What is missing
> > is that when you are using an HMS3 client you cannot talk to HMS2,
> > because Hive doesn't guarantee backwards compatibility from the client
> > perspective (newer client talking to older server). This needs some
> > fixing on the Hive side (HIVE-21596), which I am also currently working
> > on in parallel. The working prototype which I have been using works well
> > so far for this use case (HMS3 client talking to HMS2).
> > 3. Additionally, some fixes are needed on the Hive side (HIVE-21586) to
> > make sure Impala can compile using Hive 3 libraries.
> >
> > The advantages of this approach are:
> > 1. We get to maintain only one branch of code and it works with both
> > HMS-2 and HMS-3 based deployments. I have been able to run the existing
> > tests against HMS-2 with the patch. There are still 3 tests which fail
> > but I think we can fix them too. Running tests against HMS-3 may need
> > some more work and will be targeted in a separate JIRA.
> > 2. We can start supporting new features of HMS (e.g. transactional
> > tables).
> >
> > There are a few caveats:
> > 1. Some of the built-in functions in Hive (UDFs) moved from the
> > deprecated UDF interface to the GenericUDF API. Since Impala currently
> > only supports UDF execution, these built-in functions (so far I have
> > found UDFLength, UDFYear, UDFHour) will not work when we start using
> > Hive 3 binaries. To fix this we should add support for GenericUDFs,
> > similar to the existing support for UDFs.
> > 2. We need some additional patches on top of Hive 3.1.0, like the two
> > above, to build against Hive 3.
> >
> > The alternative to this approach is to branch off and have separate
> > branches for Hive-2 and Hive-3 support. This would mean more
> > cherry-picking and maintenance to keep each of these branches
> > up-to-date, and multiple release cadences. Eventually, one of the
> > branches will become the main development branch, after which we can
> > retire the other line.
> >
> > Let me know if this all sounds reasonable or if there are any blocking
> > concerns about this.
> >
> > Thanks,
> > Vihang
> >
>
