n3nash commented on issue #1005: [HUDI-91][HUDI-12]Migrate to spark 2.4.4, 
migrate to spark-avro library instead of databricks-avro, add support for 
Decimal/Date types
URL: https://github.com/apache/incubator-hudi/pull/1005#issuecomment-555386022
 
 
   @umehrot2 Thanks for enumerating your thoughts. Let me add some more context 
here.
   
   Firstly, hive-exec has a classifier `core` that gives you a 
dependency-reduced version of the jar. Although this lets us work around 
the fat jar problem, the dependency-reduced jar has a problem of its own: it 
doesn't package some of the transitive dependencies required by classes in 
the jar. There are ways to fix this as well, by including those relocated 
dependencies directly in Hudi (@modi95 was trying this at Uber).
   
   Secondly, there is no support for Spark's fork of Hive (1.2.1.spark.2). The 
Spark community created this fork to solve the exact issue I described above, 
Hive jars not bundling the correct dependencies. Read more here: 
https://issues.apache.org/jira/browse/HIVE-16391?focusedCommentId=16032497&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16032497
 Some more changes were then added to the fork, which are NOT necessary 
according to the comments in the same JIRA.
   
   In fact, there is a strong need in the Spark community to move away from 
this forked version to the regular Hive version. See here: 
http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Upgrade-built-in-Hive-to-2-3-4-td26153.html.
   
   But I see your point on having the spark-modules depend on the spark-hive 
version; this way it's clear and we don't have to solve this issue ourselves.
   
   I have a few hesitations about introducing Spark's forked Hive version: 
a) this means we have two Hive versions across the project; b) Spark's forked 
version of Hive doesn't offer anything beyond solving the hive-exec jar mess. 
   I'm actually okay with (b). @vinothchandar @bvaradar If you're okay with (a) 
and don't see any issues, I'm fine with taking this approach. Personally, I 
don't have much foresight into the side effects of having different versions 
across the project in the future.
   
   
