[ https://issues.apache.org/jira/browse/PIG-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035045#comment-16035045 ]
Rohini Palaniswamy commented on PIG-5246:
-----------------------------------------

Users should not have to specify -sparkversion 1 or 2 to pick the version; you should detect that in the script. For Hadoop 1.x and 2.x it was done by checking for hadoop-core.jar, and you can do the same thing here.

Currently we still have the problem of having to compile the shims classes against different versions. There is a hack I did internally for the HBase 0.94 to 0.98 migration so that HBaseStorage supported both HBase 0.94 and 0.98 with the same Pig jar during the migration; I have attached the patch for it. It is more code and slightly convoluted, as each class now redirects to a shims class based on version detection. For example, in Spark, JobMetricsListener would redirect to JobMetricsListenerSpark1 or JobMetricsListenerSpark2. But for users it is very simple, since they can use the same Pig installation to run against any version.

[~nkollar], do you want to try this approach as part of PIG-5157 (Spark 2 support) and PIG-5191 (HBase 2 support)? Similarly, we can add a target to compile against all versions of both Spark and HBase (and Hadoop 3.0 in the future if required) and create a pig.jar that will run with anything.

> Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
> ------------------------------------------------------------------------------
>
>                 Key: PIG-5246
>                 URL: https://issues.apache.org/jira/browse/PIG-5246
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: PIG-5246.1.patch, PIG-5246.patch
>
> In bin/pig we copy the assembly jar to Pig's classpath for Spark 1.6:
> {code}
> # For spark mode:
> # Please specify SPARK_HOME first so that we can locate $SPARK_HOME/lib/spark-assembly*.jar,
> # we will add spark-assembly*.jar to the classpath.
> if [ "$isSparkMode" == "true" ]; then
>     if [ -z "$SPARK_HOME" ]; then
>         echo "Error: SPARK_HOME is not set!"
>         exit 1
>     fi
>
>     # Please specify SPARK_JAR which is the hdfs path of spark-assembly*.jar to allow YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs.
>     if [ -z "$SPARK_JAR" ]; then
>         echo "Error: SPARK_JAR is not set, SPARK_JAR stands for the hdfs location of spark-assembly*.jar. This allows YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs."
>         exit 1
>     fi
>
>     if [ -n "$SPARK_HOME" ]; then
>         echo "Using Spark Home: " ${SPARK_HOME}
>         SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
>         CLASSPATH=${CLASSPATH}:$SPARK_ASSEMBLY_JAR
>     fi
> fi
> {code}
> After upgrading to Spark 2.0, we may need to modify this.
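
A minimal sketch of the in-script detection the comment asks for, assuming the standard layouts of the two distributions: Spark 1.x ships a single spark-assembly*.jar under $SPARK_HOME/lib, while Spark 2.x drops the assembly jar and ships individual jars under $SPARK_HOME/jars. The variable name sparkMajorVersion is illustrative and not taken from any attached patch:

{code}
# Sketch: detect the Spark major version from the SPARK_HOME layout,
# analogous to the old hadoop-core.jar check for Hadoop 1.x vs 2.x.
if [ -z "$SPARK_HOME" ]; then
    echo "Error: SPARK_HOME is not set!"
    exit 1
fi

if ls "$SPARK_HOME"/lib/spark-assembly*.jar >/dev/null 2>&1; then
    # Spark 1.x: the assembly jar is present under lib/.
    sparkMajorVersion=1
elif [ -d "$SPARK_HOME/jars" ]; then
    # Spark 2.x: no assembly jar; individual jars live under jars/.
    sparkMajorVersion=2
else
    echo "Error: unable to detect Spark version under $SPARK_HOME"
    exit 1
fi
{code}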
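
Building on that detection, one hedged way the classpath block quoted in the description could be generalized once Spark 2 is supported: keep the existing Spark 1 behaviour and, for Spark 2, add every jar under $SPARK_HOME/jars. Spark 2 has no assembly jar to point SPARK_JAR at, so that check would need separate treatment and is omitted here:

{code}
# Sketch: classpath setup covering both Spark versions, assuming
# sparkMajorVersion was set by the detection above.
if [ "$isSparkMode" == "true" ]; then
    echo "Using Spark Home: " ${SPARK_HOME}
    if [ "$sparkMajorVersion" == "1" ]; then
        # Spark 1.x: add the single assembly jar, as bin/pig does today.
        SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
        CLASSPATH=${CLASSPATH}:${SPARK_ASSEMBLY_JAR}
    else
        # Spark 2.x: no assembly jar; add each jar under jars/ instead.
        for jar in "$SPARK_HOME"/jars/*.jar; do
            CLASSPATH=${CLASSPATH}:${jar}
        done
    fi
fi
{code}

The detected version could also be surfaced to the Java side (for example as a system property) so that a facade class such as JobMetricsListener could pick the Spark 1 or Spark 2 shims implementation at runtime, along the lines of the HBaseStorage hack the comment describes.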