[ https://issues.apache.org/jira/browse/PIG-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park updated PIG-4409:
-------------------------------
Attachment: PIG-4409-1.patch
Uploading a patch that fixes the issue.
> fs.defaultFS is overwritten in JobConf by replicated join at runtime
> --------------------------------------------------------------------
>
> Key: PIG-4409
> URL: https://issues.apache.org/jira/browse/PIG-4409
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.14.0
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Priority: Critical
> Fix For: 0.15.0
>
> Attachments: PIG-4409-1.patch
>
>
> This is a regression introduced by PIG-4257.
> Pig accidentally overwrites {{fs.defaultFS}} in JobConf during a replicated
> join at runtime. This can cause various side effects because UDFs and
> store/load funcs might depend on the value of {{fs.defaultFS}} at runtime.
> Here is an example. I have a store func that does a 2-phase commit to S3. Each
> reducer writes its output to local disk first and copies it to the final
> destination on S3 during the task commit phase. Once it's done copying, the
> reducer writes a commit log to an HDFS location. During the job commit phase,
> the AM reads all the commit logs and updates the Hive metastore accordingly.
> This store func stopped working in 0.14 when there is a replicated join in the
> reduce phase. This is because {{fs.defaultFS}} is overwritten at runtime by
> the replicated join, changing it from HDFS to the local FS.
> The root cause is that PIG-4257 changed
> {{ConfigurationUtil.getLocalFSProperties()}} to return a reference to JobConf
> instead of a copy.
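
To illustrate the root cause described above, here is a minimal sketch of the reference-versus-copy problem. It is not Pig's code and not the attached patch: it uses plain java.util.Properties as a stand-in for JobConf, and the class name, method names, and the hdfs://namenode:8020 URI are all hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch: java.util.Properties stands in for JobConf.
class LocalFsAliasingSketch {

    // Assumed shape of the buggy behavior: the caller's object is mutated
    // and the same reference is handed back.
    public static Properties localPropsByReference(Properties jobConf) {
        jobConf.setProperty("fs.defaultFS", "file:///");
        return jobConf;
    }

    // Assumed shape of the safe behavior: copy first, then change only the copy.
    public static Properties localPropsByCopy(Properties jobConf) {
        Properties copy = new Properties();
        copy.putAll(jobConf);
        copy.setProperty("fs.defaultFS", "file:///");
        return copy;
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties();
        jobConf.setProperty("fs.defaultFS", "hdfs://namenode:8020");

        // Working on a copy leaves the job-wide setting untouched.
        Properties local = localPropsByCopy(jobConf);
        System.out.println("copy:         " + local.getProperty("fs.defaultFS"));
        System.out.println("after copy:   " + jobConf.getProperty("fs.defaultFS"));

        // Working on the reference silently rewrites the job-wide setting.
        localPropsByReference(jobConf);
        System.out.println("after alias:  " + jobConf.getProperty("fs.defaultFS"));
    }
}
```

With the by-reference variant, any code that later reads fs.defaultFS from the shared object (a store func's commit step, for example) sees the local file system instead of HDFS, which matches the symptom in the report.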