Persistent distributed cache
----------------------------

                 Key: HIVE-860
                 URL: https://issues.apache.org/jira/browse/HIVE-860
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Zheng Shao

DistributedCache shares a file across multiple jobs when the HDFS file name is the same. We need to make sure Hive puts the same file into the same location every time and does not overwrite it if the file content is unchanged.

We can aim for one of two results:

A1. Files added with the same name, timestamp, and md5 in the same session will have a single copy in the distributed cache.
A2. Files added with the same name, timestamp, and md5 will have a single copy in the distributed cache, regardless of session.

A2 offers more sharing, but raises the question of when Hive should clean the files up in HDFS.
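
One possible way to get a stable location is to derive the HDFS path from the file's name, timestamp, and md5, so that identical files always map to the same path. The sketch below is only an illustration, not Hive's implementation: the function names, the `/tmp/hive_cache` root, and the path layout are all assumptions made for the example. Passing a session id scopes the path to one session (A1); omitting it shares the path across sessions (A2).

```python
import hashlib
import posixpath


def file_md5(data: bytes) -> str:
    """Return the hex md5 digest of the file content."""
    return hashlib.md5(data).hexdigest()


def cache_path(name, mtime, md5_hex, session_id=None, root="/tmp/hive_cache"):
    """Build a deterministic cache path (hypothetical layout).

    name       -- original file name; only the base name is kept
    mtime      -- file modification timestamp
    md5_hex    -- hex md5 digest of the file content
    session_id -- if given, the path is per-session (A1);
                  if None, the path is shared by all sessions (A2)
    root       -- assumed cache root directory, not a real Hive setting
    """
    parts = [root]
    if session_id is not None:
        parts.append(str(session_id))
    # Same name + timestamp + md5 always yields the same directory and file.
    parts.append(f"{md5_hex}_{mtime}")
    parts.append(posixpath.basename(name))
    return posixpath.join(*parts)


def needs_upload(path, existing_paths):
    """Skip the upload (and so never overwrite) if the path already exists."""
    return path not in existing_paths
```

With this layout, a second `add file` of an unchanged file resolves to a path that already exists, so the upload can be skipped. Under A2 the files outlive any one session, so the cleanup question raised above still needs a separate policy.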