[
https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904758#comment-13904758
]
Brock Noland commented on HIVE-860:
-----------------------------------
bq. Are you proposing to change the contents of the hive-exec.jar as
distributed with Hive or just as pushed to Hadoop for running a job?
Both.
bq. If it's the former won't it mean that any project that includes
hive-exec.jar in its pom.xml will have to change its pom to explicitly include
all of the extra jars now in the fat jar?
Nope. The previously shaded jars are listed as dependencies in the source pom
file, so they will be pulled in transitively by depending on hive-exec. I
have verified this locally. That is, after a mvn install before the patch, all
the currently shaded jars are removed from the published pom file, so they are
not pulled in transitively. After the patch, the only jar which is shaded is
kryo, and it is the only one which is removed from the published pom. That is
to say, the other dependencies remain in the pom for clients. This is in line
with my expectations.
> Persistent distributed cache
> ----------------------------
>
> Key: HIVE-860
> URL: https://issues.apache.org/jira/browse/HIVE-860
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 0.12.0
> Reporter: Zheng Shao
> Assignee: Brock Noland
> Fix For: 0.13.0
>
> Attachments: HIVE-860.patch, HIVE-860.patch, HIVE-860.patch,
> HIVE-860.patch, HIVE-860.patch, HIVE-860.patch
>
>
> DistributedCache is shared across multiple jobs if the HDFS file name is the
> same.
> We need to make sure Hive puts the same file into the same location every time
> and does not overwrite it if the file content is the same.
> We can achieve 2 different results:
> A1. Files added with the same name, timestamp, and md5 in the same session
> will have a single copy in distributed cache.
> A2. Files added with the same name, timestamp, and md5 will have a single
> copy in distributed cache.
> A2 has a bigger benefit in sharing but raises the question of when Hive
> should clean it up in HDFS.
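
To make the difference between A1 and A2 in the description concrete, here is
a minimal sketch of how a cache location could be derived from the name,
timestamp, and md5 of a file. It is not taken from any HIVE-860 patch: the
class name CachePathSketch, the base directory, the path layout, and the
session id parameter are all hypothetical, and only the JDK is used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only; not part of Hive.
final class CachePathSketch {
    // Assumed base directory; a real deployment would make this configurable.
    static final String BASE_DIR = "/tmp/hive-resource-cache";

    // Returns the MD5 digest of the content as lowercase hex.
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present in every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // A1: the location includes the session id, so files are shared
    // only within one session.
    static String sessionPath(String sessionId, String name,
                              long timestamp, byte[] content) {
        return BASE_DIR + "/sessions/" + sessionId + "/"
                + md5Hex(content) + "-" + timestamp + "/" + name;
    }

    // A2: the location depends only on name, timestamp, and md5, so
    // every session that adds the same file resolves to the same path.
    static String sharedPath(String name, long timestamp, byte[] content) {
        return BASE_DIR + "/shared/"
                + md5Hex(content) + "-" + timestamp + "/" + name;
    }

    public static void main(String[] args) {
        byte[] jar = "example".getBytes(StandardCharsets.UTF_8);
        System.out.println(sessionPath("session-1", "udf.jar", 1000L, jar));
        System.out.println(sharedPath("udf.jar", 1000L, jar));
    }
}
```

With a deterministic path like this, an upload can be skipped whenever the
target already exists, which gives the "do not overwrite" behavior. In the A1
layout the whole session directory can be deleted when the session ends; the
A2 layout has no such owner, which is the cleanup question raised above.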
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)