[ https://issues.apache.org/jira/browse/HIVE-17574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mithun Radhakrishnan updated HIVE-17574:
----------------------------------------
Description:
Raising this on behalf of [~selinazh]. (For my own reference: YHIVE-1035.)
This has to do with the classpaths of Hive actions run from Oozie, and affects
scripts that add jars/resources from HDFS locations.
As part of Oozie's "sharelib" deploys, foundation jars (such as Hive jars) tend
to be stored in HDFS paths, as are any custom user-libraries used in workflows.
An {{ADD JAR|FILE|ARCHIVE}} statement in a Hive script causes the following
steps to occur:
# Files are downloaded from HDFS to a local temp directory.
# UDFs are resolved/validated.
# All jars/files, including those just downloaded from HDFS, are shipped right
back to HDFS-based scratch directories for job submission.
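To make the cost of this flow concrete, here is a minimal Java sketch that counts how many copies of each resource are made before job submission. It is a hypothetical model, not Hive code: the class name, the {{copiesFor}}/{{totalCopies}} helpers and the example URIs are illustrative assumptions, a resource counts as HDFS-based only when its URI scheme is {{hdfs}}, and step #2 (UDF resolution) is not modeled.

```java
import java.net.URI;
import java.util.List;

/** Hypothetical model of the current ADD JAR|FILE|ARCHIVE flow (not actual Hive code). */
public class CurrentLocalizationFlow {

    /** Counts the copies made of one resource before the job starts (steps #1 and #3). */
    static int copiesFor(URI resource) {
        // Assumption: only the "hdfs" scheme is treated as HDFS-based.
        boolean onHdfs = "hdfs".equalsIgnoreCase(resource.getScheme());
        // Step #1: HDFS resources are first downloaded to a local temp directory.
        int copies = onHdfs ? 1 : 0;
        // Step #3: every resource, including one just downloaded from HDFS,
        // is uploaded to the HDFS scratch directory for job submission.
        copies += 1;
        return copies;
    }

    /** Sums the copies over all resources added by the script. */
    static int totalCopies(List<URI> resources) {
        int total = 0;
        for (URI r : resources) {
            total += copiesFor(r);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical resources: one sharelib jar on HDFS, one local UDF jar.
        List<URI> resources = List.of(
            URI.create("hdfs://namenode:8020/user/oozie/share/lib/hive/hive-exec.jar"),
            URI.create("file:///tmp/my-udfs.jar"));
        System.out.println("copies made: " + totalCopies(resources));
    }
}
```

For one HDFS jar and one local jar the sketch counts three copies, two of which (the download and the re-upload) are spent on a jar that was already on HDFS.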
For HDFS-based files, this round trip is wasteful and time-consuming. Step #3 above
should skip shipping HDFS-based resources and instead add them directly to the
Tez session.
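The proposed change amounts to partitioning the added resources before step #3. The Java sketch below is an illustration under stated assumptions, not the Yahoo patch: {{ResourcePartitioner}}, {{Partition}} and {{partition}} are hypothetical names, only the {{hdfs}} scheme is treated as HDFS-based (scheme-less paths fall through to the upload list), and the real Hive/Tez calls that would register the direct paths with the session are deliberately left out.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the proposed fix: do not re-upload resources already on HDFS. */
public class ResourcePartitioner {

    /** Resources referenced in place vs. resources that must still be shipped. */
    record Partition(List<URI> addDirectly, List<URI> toUpload) {}

    /** Assumption: only the "hdfs" scheme counts as HDFS-based; a null scheme does not. */
    static boolean isHdfs(URI uri) {
        return "hdfs".equalsIgnoreCase(uri.getScheme());
    }

    /** Splits resources into those added to the session directly and those to upload. */
    static Partition partition(List<URI> resources) {
        List<URI> direct = new ArrayList<>();
        List<URI> upload = new ArrayList<>();
        for (URI r : resources) {
            if (isHdfs(r)) {
                // Already on HDFS: reference the original path instead of re-uploading it.
                direct.add(r);
            } else {
                // Local file (or no scheme): still shipped to the HDFS scratch directory.
                upload.add(r);
            }
        }
        return new Partition(List.copyOf(direct), List.copyOf(upload));
    }

    public static void main(String[] args) {
        // Hypothetical resources added by a Hive script.
        List<URI> resources = List.of(
            URI.create("hdfs://namenode:8020/user/oozie/share/lib/hive/hive-exec.jar"),
            URI.create("file:///tmp/my-udfs.jar"),
            URI.create("/tmp/lookup-table.txt"));
        Partition p = partition(resources);
        System.out.println("add directly: " + p.addDirectly());
        System.out.println("upload: " + p.toUpload());
    }
}
```

Partitioning by scheme leaves the handling of local files unchanged; only HDFS-based resources skip the re-upload in step #3.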
We have a patch that's being used internally at Yahoo.
was:
Raising this on behalf of [~selinazh]. (For my own reference: YHIVE-1035.)
This has to do with the classpaths of Hive actions run from Oozie, and affects
scripts that add jars/resources from HDFS locations.
As part of Oozie's "sharelib" deploys, foundation jars (such as Hive jars) tend
to be stored in HDFS paths, as are any custom user-libraries used in workflows.
An {{ADD JAR|FILE|ARCHIVE}} statement in a Hive script causes the following
steps to occur:
# Files are downloaded from HDFS to a local temp directory.
# UDFs are resolved/validated.
# All jars/files, including those just downloaded from HDFS, are shipped right
back to HDFS-based scratch directories for job submission.
This is wasteful and time-consuming. Step #3 above should skip shipping HDFS-based
resources and instead add them directly to the Tez session.
We have a patch that's being used internally at Yahoo.
> Avoid multiple copies of HDFS-based jars when localizing job-jars
> -----------------------------------------------------------------
>
> Key: HIVE-17574
> URL: https://issues.apache.org/jira/browse/HIVE-17574
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.2.0, 3.0.0, 2.4.0
> Reporter: Mithun Radhakrishnan
> Assignee: Mithun Radhakrishnan
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)