[ 
https://issues.apache.org/jira/browse/SPARK-23710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-23710:
--------------------------------
    Description: 
Spark fails to run on Hadoop 3.x because Hive's ShimLoader treats Hadoop 3.x 
as an unknown Hadoop version; see SPARK-18673 and HIVE-16081 for more 
details. We therefore need to upgrade the built-in Hive for Hadoop 3.x. This 
is an umbrella JIRA to track that upgrade.
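
The failure mode described above can be illustrated with a minimal sketch. This is not Hive's actual ShimLoader code: the function name, the mapping, and the shim labels are hypothetical, and the sketch only assumes that a shim is selected from the major component of the Hadoop version string and that an unlisted major version is rejected.

```python
# Hypothetical, simplified illustration of a version-keyed shim lookup.
# NOT Hive's real ShimLoader; the names and mapping values are assumptions.

KNOWN_SHIMS = {
    "1": "0.20S",  # illustrative shim label for Hadoop 1.x
    "2": "0.23",   # illustrative shim label for Hadoop 2.x
}


def pick_shim(hadoop_version, known_shims=KNOWN_SHIMS):
    """Return the shim label for a Hadoop version such as '2.7.3'."""
    major = hadoop_version.split(".")[0]
    if major not in known_shims:
        # An unlisted major version (for example 3) is treated as unknown.
        raise ValueError(
            "Unrecognized Hadoop major version number: " + hadoop_version
        )
    return known_shims[major]


if __name__ == "__main__":
    print(pick_shim("2.7.3"))
    try:
        pick_shim("3.1.0")
    except ValueError as err:
        print("lookup failed:", err)
    # Registering major version 3 lets the same lookup succeed.
    print(pick_shim("3.1.0", {**KNOWN_SHIMS, "3": "0.23"}))
```

Under these assumptions, a lookup table that predates Hadoop 3.x cannot be fixed from the caller's side, which is why the dependency that owns the table has to be upgraded.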

 

*Upgrade Plan*:
 # SPARK-27054 Remove the Calcite dependency. This avoids some jar conflicts.
 # SPARK-23749 Replace built-in Hive API (isSub/toKryo) and remove 
OrcProto.Type usage.
 # SPARK-27158, SPARK-27130 Update dev/* to support dynamically changing 
profiles when testing.
 # Fix the ORC dependency conflict so that tests pass on Hive 1.2.1 and 
compilation passes on Hive 2.3.4.
 # Add an empty hive-thriftserverV2 module so that all test cases can be run 
in the next step.
 # Make the tests pass on Hadoop-3.1 with Hive 2.3.4.
 # Adapt hive-thriftserverV2 from hive-thriftserver with Hive 2.3.4's 
[TCLIService.thrift|https://github.com/apache/hive/blob/rel/release-2.3.4/service-rpc/if/TCLIService.thrift].

 

I have completed the [initial work|https://github.com/apache/spark/pull/24044] 
and plan to finish this upgrade step by step.
  

 

  was:
Upgrade built-in Hive to 2.3.4 for Hadoop-3.1 (please note that this upgrade 
is only for Hadoop-3.1).

To achieve this, we need to change at least the sql/core, sql/hive, and 
sql/hive-thriftserver modules:

*sql/core*: Add two source directories (sql/core/v1.2.1 and sql/core/v2.3.4) to 
separate the code for the different built-in Hive versions.
 *sql/hive:* Use Java reflection or a shim to support Hive 1.2.1 and Hive 2.3.4 
at the same time.
 *sql/hive-thriftserver:* Add a new thriftserver module named 
hive-thriftserverV2 with Hive 2.3.4's 
[TCLIService.thrift|https://github.com/apache/hive/blob/rel/release-2.3.4/service-rpc/if/TCLIService.thrift].
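
The "reflection or shim" idea for sql/hive can be sketched as follows. This is a language-neutral illustration written in Python, not Spark code: every class and function name is hypothetical, and the only assumption is that one code base must pick a version-specific implementation at run time and tolerate methods that exist in only one Hive version.

```python
# Hypothetical sketch of a version-keyed shim plus a reflective call.
# None of these names come from Spark or Hive.


class ShimV1_2_1:
    """Stand-in for the code path used with built-in Hive 1.2.1."""

    def describe(self):
        return "Hive 1.2.1 code path"


class ShimV2_3_4:
    """Stand-in for the code path used with built-in Hive 2.3.4."""

    def describe(self):
        return "Hive 2.3.4 code path"

    def only_in_new_version(self):
        return "available"


SHIMS = {"1.2.1": ShimV1_2_1, "2.3.4": ShimV2_3_4}


def shim_for(hive_version):
    """Instantiate the shim registered for the given Hive version."""
    if hive_version not in SHIMS:
        raise ValueError("Unsupported built-in Hive version: " + hive_version)
    return SHIMS[hive_version]()


def call_if_present(target, method_name, default=None):
    """Reflectively call a no-argument method, or return default if absent."""
    method = getattr(target, method_name, None)
    return method() if callable(method) else default


if __name__ == "__main__":
    for version in ("1.2.1", "2.3.4"):
        shim = shim_for(version)
        print(shim.describe(), call_if_present(shim, "only_in_new_version", "n/a"))
```

The shim keeps version-specific logic behind one interface, while the reflective lookup covers calls that cannot be compiled against both versions.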

Spark fails to run on Hadoop 3.x because Hive's ShimLoader treats Hadoop 3.x 
as an unknown Hadoop version; see 
[SPARK-18673|https://issues.apache.org/jira/browse/SPARK-18673] and 
[HIVE-16081|https://issues.apache.org/jira/browse/HIVE-16081] for more details.

 

 

Upgrade Plan:
 # SPARK-27054 Remove the Calcite dependency. This avoids some jar conflicts.
 # SPARK-23749 Replace built-in Hive API (isSub/toKryo) and remove 
OrcProto.Type usage.
 # SPARK-27158, SPARK-27130 Update dev/* to support dynamically changing 
profiles when testing.
 # Fix the ORC dependency conflict so that tests pass on Hive 1.2.1 and 
compilation passes on Hive 2.3.4.
 # Add an empty hive-thriftserverV2 module so that all test cases can be run 
in the next step.
 # Make the tests pass on Hadoop-3.1 with Hive 2.3.4.
 # Adapt hive-thriftserverV2 from hive-thriftserver with Hive 2.3.4's 
[TCLIService.thrift|https://github.com/apache/hive/blob/rel/release-2.3.4/service-rpc/if/TCLIService.thrift].

 

 


> Upgrade the built-in Hive to 2.3.4 for hadoop-3.1
> -------------------------------------------------
>
>                 Key: SPARK-23710
>                 URL: https://issues.apache.org/jira/browse/SPARK-23710
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Critical


