[
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421691#comment-15421691
]
Sahil Takiar edited comment on HIVE-13930 at 8/15/16 9:19 PM:
--------------------------------------------------------------
Sorry for the delay; I was out of the office for a few weeks. I looked into
this some more and believe I found the root cause.
Based on the logs from a Jenkins job, the Hive PTest2 Infra Master (which runs
on EC2) doesn't do a fresh clone of the Hive repo; it re-uses the same repo for
each run and just does a git pull, git clean, and mvn clean before the job
starts. Looking at the itests/pom.xml file (which contains the script that
downloads the Spark tar-ball), it seems the tar-ball is not downloaded if it is
already present on the local filesystem. So even though the file on S3 has been
updated, the PTest2 Infra will not re-download it, which explains why the error
is still occurring (a rough sketch of the guard logic is included below).
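The sketch below is illustrative only; the variable names, paths, and URL are
placeholders rather than the exact contents of itests/pom.xml:
{code}
# Illustrative sketch of the download guard in itests/pom.xml; the locations
# and URL below are placeholders, not the real values.
DOWNLOAD_DIR=../thirdparty
S3_BASE_URL="https://<s3-bucket>"   # placeholder for the real S3 location
TAR_NAME=spark-1.6.0-bin-hadoop2-without-hive.tgz

mkdir -p "$DOWNLOAD_DIR"
# The guard: the tar-ball is only fetched if it is not already on disk, so an
# updated copy on S3 is never re-downloaded by the PTest2 master.
if [[ ! -f "$DOWNLOAD_DIR/$TAR_NAME" ]]; then
  curl -Sso "$DOWNLOAD_DIR/$TAR_NAME" "$S3_BASE_URL/$TAR_NAME"
fi
{code}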
I can think of a few solutions to this:
1: Simply delete the file on the PTest2 Infra Master
(/data/hive-ptest/working/apache-github-source-source/itests/thirdparty/spark-1.6.0-bin-hadoop2-without-hive.tgz).
This should trigger the build to download the new version of the tar-ball.
This may cause HoS itests to fail in other Hive QA runs since the new tar-ball
includes Hadoop 2.7 jars, but it should be fine.
2: Merge HIVE-12984 - this patch deletes the Spark tar-ball whenever mvn clean
is invoked. This is nice because it avoids the problem recurring in the future,
at least until HIVE-14240 has been resolved (see the sketch after this list).
3: Re-name the Spark tar-ball to something like
spark-${spark.version}-bin-hadoop2.7-without-hive (instead of -hadoop2-), and
update the itests/pom.xml file to use the new name (the file name may need to
be updated in a few other places as well).
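For option 2, the effect could look something like the maven-clean-plugin
fileset below; this is only a sketch of the idea, and the directory and include
pattern are assumptions rather than what HIVE-12984 actually contains:
{code:xml}
<!-- Sketch only: the directory and include pattern are assumptions, not
     necessarily what HIVE-12984 does. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-clean-plugin</artifactId>
  <configuration>
    <filesets>
      <fileset>
        <!-- Remove the cached Spark tar-ball so the next build re-downloads it. -->
        <directory>${basedir}/thirdparty</directory>
        <includes>
          <include>spark-*-without-hive.tgz</include>
        </includes>
      </fileset>
    </filesets>
  </configuration>
</plugin>
{code}
For option 3, only the tar-ball name changes (e.g. to
spark-${spark.version}-bin-hadoop2.7-without-hive.tgz in the download script
and anywhere else the name is referenced), so the existing guard would see a
file that is not yet on disk and download it.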
> upgrade Hive to latest Hadoop version
> -------------------------------------
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch,
> HIVE-13930.03.patch, HIVE-13930.04.patch, HIVE-13930.05.patch,
> HIVE-13930.patch
>
>