[ https://issues.apache.org/jira/browse/PIG-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park updated PIG-4409:
-------------------------------
    Attachment: PIG-4409-1.patch

Uploading a patch that fixes the issue.

> fs.defaultFS is overwritten in JobConf by replicated join at runtime
> --------------------------------------------------------------------
>
>                 Key: PIG-4409
>                 URL: https://issues.apache.org/jira/browse/PIG-4409
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.14.0
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>            Priority: Critical
>             Fix For: 0.15.0
>
>         Attachments: PIG-4409-1.patch
>
>
> This is a regression of PIG-4257.
> Pig accidentally overwrites {{fs.defaultFS}} in JobConf during the replicated
> join at runtime. This can cause various side effects because UDFs and
> load/store funcs might depend on the value of {{fs.defaultFS}} at runtime.
> Here is an example. I have a store func that does a 2-phase commit to S3. Each
> reducer writes its output to local disk first and copies it to the final
> destination on S3 during the task commit phase. Once it is done copying, the
> reducer writes a commit log to an HDFS location. During the job commit phase,
> the AM reads all the commit logs and updates the Hive metastore accordingly.
> This store func stopped working in 0.14 when there is a replicated join in the
> reduce phase. This is because the replicated join overwrites {{fs.defaultFS}}
> at runtime, changing it from HDFS to the local FS.
> The root cause is that PIG-4257 changed
> {{ConfigurationUtil.getLocalFSProperties()}} to return a reference to JobConf
> instead of a copy object.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
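The root cause in the issue above is a plain aliasing bug: once a getter returns the live JobConf instead of a copy, any caller that points the returned object at the local FS silently rewrites the job-wide `fs.defaultFS`. The sketch below is a hypothetical illustration, not Pig's code or the attached patch. It assumes `java.util.Properties` as a stand-in for Hadoop's JobConf, and the method names `localPropsByReference`, `localPropsByCopy` and `prepareReplicatedJoin`, along with the URI `hdfs://namenode:8020`, are all invented for the example.

```java
import java.util.Properties;

/**
 * Hypothetical illustration of the PIG-4409 aliasing bug.
 * java.util.Properties stands in for Hadoop's JobConf; none of these
 * method names are Pig's actual API.
 */
final class LocalFsPropertiesDemo {

    static final String DEFAULT_FS = "fs.defaultFS";

    // Regressed behavior: returns the caller's live object, so any
    // mutation by the caller leaks back into the job configuration.
    static Properties localPropsByReference(Properties jobConf) {
        return jobConf;
    }

    // Defensive behavior: returns an independent copy of the settings.
    static Properties localPropsByCopy(Properties jobConf) {
        Properties copy = new Properties();
        copy.putAll(jobConf);
        return copy;
    }

    // Hypothetical stand-in for the replicated join switching the
    // properties it was handed over to the local file system.
    static void prepareReplicatedJoin(Properties localProps) {
        localProps.setProperty(DEFAULT_FS, "file:///");
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties();
        jobConf.setProperty(DEFAULT_FS, "hdfs://namenode:8020"); // hypothetical URI

        // With a copy, the job-wide value survives.
        prepareReplicatedJoin(localPropsByCopy(jobConf));
        System.out.println("after copy:      " + jobConf.getProperty(DEFAULT_FS));
        // prints "after copy:      hdfs://namenode:8020"

        // With a reference, the job-wide value is overwritten.
        prepareReplicatedJoin(localPropsByReference(jobConf));
        System.out.println("after reference: " + jobConf.getProperty(DEFAULT_FS));
        // prints "after reference: file:///"
    }
}
```

With a real Hadoop `Configuration`, the analogous defensive copy would be the copy constructor, e.g. `new Configuration(conf)`; whether the attached patch takes exactly that approach is not stated in the issue.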