[ 
https://issues.apache.org/jira/browse/PIG-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035045#comment-16035045
 ] 

Rohini Palaniswamy commented on PIG-5246:
-----------------------------------------

Users should not have to specify -sparkversion 1 or 2 to pick the version; the 
script should detect it. For Hadoop 1.x and 2.x this was done by checking for 
hadoop-core.jar, and you can do the same thing here. Currently we still have 
the problem of having to compile the shims classes against different versions.
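
To make this concrete, here is a minimal sketch of what that detection could 
look like in bin/pig. It is only an illustration and assumes Spark 1.x ships 
lib/spark-assembly*.jar under SPARK_HOME while Spark 2.x ships individual jars 
under jars/:

{code}
# Guess the Spark major version from the SPARK_HOME layout (assumption:
# Spark 1.x has lib/spark-assembly*.jar, Spark 2.x has a jars/ directory).
sparkversion=""
if ls ${SPARK_HOME}/lib/spark-assembly* >/dev/null 2>&1; then
    sparkversion="1"
elif [ -d "${SPARK_HOME}/jars" ]; then
    sparkversion="2"
else
    echo "Error: unable to detect the Spark version under ${SPARK_HOME}"
    exit 1
fi
{code}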

There is a hack I did internally for the HBase 0.94 to 0.98 migration, so that 
HBaseStorage could support both HBase 0.94 and 0.98 with the same Pig jar 
during the migration. I have attached the patch for it. It is more code and 
slightly convoluted, as each class now redirects to the shims class based on 
version detection. For example, in Spark, JobMetricsListener would redirect to 
JobMetricsListenerSpark1 or JobMetricsListenerSpark2. But for users it makes 
things very simple, as they can use the same Pig installation to run against 
any version. [~nkollar], do you want to try this approach as part of PIG-5157 
(Spark 2 support) and PIG-5191 (HBase 2 support)?

Similarly, we can add a target to compile against all versions of both Spark 
and HBase (and Hadoop 3.0 in the future if required) and create a pig.jar that 
will run with any of them.
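
Purely for illustration, such a combined build might be driven roughly like 
this (the target and property names below are assumptions, not existing 
build.xml targets):

{code}
# Hypothetical invocations; the -Dsparkversion/-Dhbaseversion properties and the
# jar-all target are placeholders for whatever the build ends up calling them.
ant clean jar -Dsparkversion=1 -Dhbaseversion=1
ant clean jar -Dsparkversion=2 -Dhbaseversion=2
# or a single target that compiles the shims against every supported version:
ant clean jar-all
{code}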



> Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
> ------------------------------------------------------------------------------
>
>                 Key: PIG-5246
>                 URL: https://issues.apache.org/jira/browse/PIG-5246
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: PIG-5246.1.patch, PIG-5246.patch
>
>
> In bin/pig, we copy the Spark assembly jar to Pig's classpath for Spark 1.6:
> {code}
> # For spark mode:
> # Please specify SPARK_HOME first so that we can locate $SPARK_HOME/lib/spark-assembly*.jar,
> # we will add spark-assembly*.jar to the classpath.
> if [ "$isSparkMode"  == "true" ]; then
>     if [ -z "$SPARK_HOME" ]; then
>        echo "Error: SPARK_HOME is not set!"
>        exit 1
>     fi
>     # Please specify SPARK_JAR which is the hdfs path of spark-assembly*.jar to allow YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs.
>     if [ -z "$SPARK_JAR" ]; then
>        echo "Error: SPARK_JAR is not set, SPARK_JAR stands for the hdfs location of spark-assembly*.jar. This allows YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs."
>        exit 1
>     fi
>     if [ -n "$SPARK_HOME" ]; then
>         echo "Using Spark Home: " ${SPARK_HOME}
>         SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
>         CLASSPATH=${CLASSPATH}:$SPARK_ASSEMBLY_JAR
>     fi
> fi
> {code}
> After upgrading to Spark 2.0, we may need to modify this.
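
For illustration only, the Spark 2 variant of that block might look roughly 
like the sketch below. It assumes Spark 2.x no longer ships 
spark-assembly*.jar and instead provides individual jars under 
$SPARK_HOME/jars; it is not the actual patch:

{code}
# Sketch of a possible Spark 2 variant (assumption: Spark 2.x provides
# individual jars under $SPARK_HOME/jars instead of lib/spark-assembly*.jar).
if [ "$isSparkMode" == "true" ]; then
    if [ -z "$SPARK_HOME" ]; then
       echo "Error: SPARK_HOME is not set!"
       exit 1
    fi
    if [ -d "${SPARK_HOME}/jars" ]; then
        # Spark 2.x: add every jar under $SPARK_HOME/jars to the classpath.
        for jar in ${SPARK_HOME}/jars/*.jar; do
            CLASSPATH=${CLASSPATH}:$jar
        done
    else
        # Spark 1.x: fall back to the assembly jar as before.
        SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
        CLASSPATH=${CLASSPATH}:$SPARK_ASSEMBLY_JAR
    fi
fi
{code}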



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
