[
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421691#comment-15421691
]
Sahil Takiar edited comment on HIVE-13930 at 8/15/16 9:19 PM:
--------------------------------------------------------------
Sorry for the delay; I was out of the office for a few weeks. I looked into
this some more and believe I found the root cause.
Based on the logs from a Jenkins job, the Hive PTest2 Infra Master (which runs
on EC2) doesn't do a fresh clone of the Hive repo; it re-uses the same repo for
each run and just does a git pull, git clean, and mvn clean before the job
starts. Looking at the itests/pom.xml file (which contains the script that
downloads the Spark tar-ball), it seems the tar-ball is not downloaded if it is
already present on the local filesystem. So even though the file on S3 has been
updated, the PTest2 Infra will not re-download it, which explains why the error
is still occurring (a rough sketch of the guard logic is included below).
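The sketch below is illustrative only; the variable names, paths, and URL are
placeholders rather than the exact contents of itests/pom.xml:
{code}
# Illustrative sketch of the download guard in itests/pom.xml; the locations
# and URL below are placeholders, not the real values.
DOWNLOAD_DIR=../thirdparty
S3_BASE_URL="https://<s3-bucket>"   # placeholder for the real S3 location
TAR_NAME=spark-1.6.0-bin-hadoop2-without-hive.tgz

mkdir -p "$DOWNLOAD_DIR"
# The guard: the tar-ball is only fetched if it is not already on disk, so an
# updated copy on S3 is never re-downloaded by the PTest2 master.
if [[ ! -f "$DOWNLOAD_DIR/$TAR_NAME" ]]; then
  curl -Sso "$DOWNLOAD_DIR/$TAR_NAME" "$S3_BASE_URL/$TAR_NAME"
fi
{code}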
I can think of a few solutions to this:
1: Simply delete the file on the PTest2 Infra Master
(/data/hive-ptest/working/apache-github-source-source/itests/thirdparty/spark-1.6.0-bin-hadoop2-without-hive.tgz).
This should trigger the build to download the new version of the tar-ball.
This may cause HoS itests to fail in other Hive QA runs since the new tar-ball
includes Hadoop 2.7 jars, but it should be fine.
2: Merge HIVE-12984 - this patch deletes the Spark tar-ball whenever mvn clean
is invoked. This is nice because it avoids the problem recurring in the future,
at least until HIVE-14240 has been resolved (see the sketch after this list).
3: Re-name the Spark tar-ball to something like
spark-${spark.version}-bin-hadoop2.7-without-hive (instead of -hadoop2-), and
update the itests/pom.xml file to use the new name (the file name may need to
be updated in a few other places as well).
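For option 2, the effect could look something like the maven-clean-plugin
fileset below; this is only a sketch of the idea, and the directory and include
pattern are assumptions rather than what HIVE-12984 actually contains:
{code:xml}
<!-- Sketch only: the directory and include pattern are assumptions, not
     necessarily what HIVE-12984 does. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-clean-plugin</artifactId>
  <configuration>
    <filesets>
      <fileset>
        <!-- Remove the cached Spark tar-ball so the next build re-downloads it. -->
        <directory>${basedir}/thirdparty</directory>
        <includes>
          <include>spark-*-without-hive.tgz</include>
        </includes>
      </fileset>
    </filesets>
  </configuration>
</plugin>
{code}
For option 3, only the tar-ball name changes (e.g. to
spark-${spark.version}-bin-hadoop2.7-without-hive.tgz in the download script
and anywhere else the name is referenced), so the existing guard would see a
file that is not yet on disk and download it.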
> upgrade Hive to latest Hadoop version
> -------------------------------------
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch,
> HIVE-13930.03.patch, HIVE-13930.04.patch, HIVE-13930.05.patch,
> HIVE-13930.patch
>
>