Cheolsoo Park created PIG-4409:
----------------------------------

             Summary: fs.defaultFS is overwritten in JobConf by replicated join 
at runtime
                 Key: PIG-4409
                 URL: https://issues.apache.org/jira/browse/PIG-4409
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.14.0
            Reporter: Cheolsoo Park
            Assignee: Cheolsoo Park
            Priority: Critical
             Fix For: 0.15.0


This is a regression caused by PIG-4257.

Pig accidentally overwrites {{fs.defaultFS}} in JobConf when a replicated join
runs. This can cause various side effects because UDFs and store/load funcs
might depend on the value of {{fs.defaultFS}} at runtime.

Here is an example. I have a store func that does a 2-phase commit to S3. Each
reducer writes its output to local disk first and copies it to the final
destination on S3 during the task commit phase. Once the copy is done, the
reducer writes a commit log to an HDFS location. During the job commit phase,
the AM reads all the commit logs and updates the Hive metastore accordingly.

This store func stopped working in 0.14 when there is a replicated join in the
reduce phase, because the replicated join overwrites {{fs.defaultFS}} at
runtime, changing it from HDFS to the local FS.

The root cause is that PIG-4257 changed
{{ConfigurationUtil.getLocalFSProperties()}} to return a reference to the
JobConf itself instead of a copy, so the local FS settings applied to the
returned object end up in the job's own configuration.
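
To illustrate the aliasing problem, here is a minimal sketch that uses
{{java.util.Properties}} as a stand-in for JobConf. It is not the actual Pig
code: the class name, both method names, and the {{hdfs://namenode:8020}} URI
are hypothetical and chosen only for the illustration.

```java
import java.util.Properties;

public class DefaultFsAliasDemo {

    // Hypothetical stand-in for the regressed behavior: the caller's object
    // is modified in place and handed back, so the change leaks to the caller.
    static Properties localFsByReference(Properties jobConf) {
        jobConf.setProperty("fs.defaultFS", "file:///");
        return jobConf;
    }

    // Hypothetical stand-in for the safe behavior: copy first, then modify
    // only the copy, leaving the caller's object untouched.
    static Properties localFsByCopy(Properties jobConf) {
        Properties copy = new Properties();
        copy.putAll(jobConf);
        copy.setProperty("fs.defaultFS", "file:///");
        return copy;
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties();
        jobConf.setProperty("fs.defaultFS", "hdfs://namenode:8020");

        // Copy: the job configuration keeps its HDFS default FS.
        localFsByCopy(jobConf);
        System.out.println(jobConf.getProperty("fs.defaultFS"));

        // Reference: the job configuration now points at the local FS.
        localFsByReference(jobConf);
        System.out.println(jobConf.getProperty("fs.defaultFS"));
    }
}
```

With the copy variant, anything that reads {{fs.defaultFS}} from the job
configuration afterwards still sees the HDFS URI. With the reference variant it
sees {{file:///}}, which matches the symptom described above.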



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
