Cheolsoo Park created PIG-4409:
----------------------------------
Summary: fs.defaultFS is overwritten in JobConf by replicated join
at runtime
Key: PIG-4409
URL: https://issues.apache.org/jira/browse/PIG-4409
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.14.0
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Priority: Critical
Fix For: 0.15.0
This is a regression introduced by PIG-4257.
Pig accidentally overwrites {{fs.defaultFS}} in JobConf during the replicated
join at runtime. This can cause various side effects because UDFs and
store/load funcs might depend on the value of {{fs.defaultFS}} at runtime.
Here is an example. I have a store func that does a 2-phase commit to S3. Each
reducer writes its output to local disk first and copies it to the final
destination on S3 during the task commit phase. Once it's done copying, the
reducer writes a commit log to an HDFS location. During the job commit phase,
the AM reads all the commit logs and updates the Hive metastore accordingly.
This store func stopped working in 0.14 when there is a replicated join in the
reduce phase. This is because the replicated join overwrites {{fs.defaultFS}}
at runtime, changing it from HDFS to the local FS.
The root cause is that PIG-4257 changed
{{ConfigurationUtil.getLocalFSProperties()}} to return a reference to JobConf
instead of a copy.
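The aliasing problem can be illustrated with a minimal, self-contained sketch. It is not Pig's actual code: {{java.util.Properties}} stands in for JobConf, and the class name, the method names, and the {{hdfs://namenode:8020}} URI are all hypothetical. It only shows why handing out a reference lets a later write to {{fs.defaultFS}} leak back into the job configuration, while handing out a copy does not.

```java
import java.util.Properties;

public class LocalFsPropertiesSketch {
    static final String KEY = "fs.defaultFS";

    // Stand-in for the job configuration (assumption: the real object is a
    // Hadoop JobConf; the HDFS URI here is made up for illustration).
    static Properties newJobConf() {
        Properties conf = new Properties();
        conf.setProperty(KEY, "hdfs://namenode:8020");
        return conf;
    }

    // Behaviour described as the root cause: the caller receives the very
    // same object that backs the job configuration.
    static Properties getByReference(Properties jobConf) {
        return jobConf;
    }

    // Behaviour described as the previous one: the caller receives an
    // independent copy of the job configuration.
    static Properties getByCopy(Properties jobConf) {
        Properties copy = new Properties();
        copy.putAll(jobConf);
        return copy;
    }

    // Stand-in for the replicated join pointing its properties at the
    // local file system.
    static void simulateReplicatedJoin(Properties localProps) {
        localProps.setProperty(KEY, "file:///");
    }

    public static void main(String[] args) {
        Properties shared = newJobConf();
        simulateReplicatedJoin(getByReference(shared));
        System.out.println("by reference: " + shared.getProperty(KEY));

        Properties isolated = newJobConf();
        simulateReplicatedJoin(getByCopy(isolated));
        System.out.println("by copy:      " + isolated.getProperty(KEY));
    }
}
```

With the by-reference variant, anything that reads {{fs.defaultFS}} from the job configuration afterwards sees the local FS; with the by-copy variant it still sees HDFS.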