Persistent distributed cache
----------------------------

                 Key: HIVE-860
                 URL: https://issues.apache.org/jira/browse/HIVE-860
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Zheng Shao


DistributedCache is shared across multiple jobs if the HDFS file name is the 
same.

We need to make sure Hive puts the same file into the same location every time 
and does not overwrite it if the file content is the same.

We can aim for one of two results:
A1. Files added with the same name, timestamp, and md5 in the same session will 
have a single copy in distributed cache.
A2. Files added with the same name, timestamp, and md5 will have a single copy 
in distributed cache, regardless of session.

A2 allows more sharing, but raises the question of when Hive should clean the 
files up in HDFS.
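
One possible way to get "same file, same location" is to derive the HDFS path 
from the file's name, timestamp, and md5. The following is only a rough sketch 
of that idea, not Hive code: the class CachePathSketch, the CACHE_ROOT 
directory, and the path layout are all made up for illustration. Passing a 
session id corresponds to A1; passing null corresponds to A2.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class CachePathSketch {
    // Hypothetical base directory; not a real Hive setting.
    public static final String CACHE_ROOT = "/tmp/hive-dcache";

    // Hex-encoded md5 of the file content.
    public static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Deterministic location for a cached file (assumed layout).
    // sessionId != null: scoped to one session (A1).
    // sessionId == null: shared across sessions (A2).
    public static String cachePath(String sessionId, String name,
                                   long timestamp, byte[] content) {
        String scope = (sessionId == null) ? "shared" : "session-" + sessionId;
        return CACHE_ROOT + "/" + scope + "/"
                + md5Hex(content) + "-" + timestamp + "/" + name;
    }

    public static void main(String[] args) {
        byte[] jar = "example contents".getBytes(StandardCharsets.UTF_8);
        // Same name, timestamp, and content always map to the same path.
        System.out.println(cachePath(null, "udf.jar", 1254000000000L, jar));
        System.out.println(cachePath("s1", "udf.jar", 1254000000000L, jar));
    }
}
```

With a layout like this, the uploader could check whether the target path 
already exists and skip the copy if it does. Under A2 the "shared" directory 
is never tied to a session, which is exactly where the cleanup question comes 
from.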


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
