Xuzhou Yin created PIG-5360:

             Summary: Pig sets working directory of input file systems causes 
exception thrown
                 Key: PIG-5360
                 URL: https://issues.apache.org/jira/browse/PIG-5360
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.17.0
            Reporter: Xuzhou Yin
             Fix For: 0.18.0

In the getSplits() method of PigInputFormat, Pig tries to set the working 
directory of the input file system to jobContext.getWorkingDirectory(), which 
is always the default working directory of the default file system (e.g. 
hdfs://host:port/user/userId in the case of HDFS) unless 
“mapreduce.job.working.dir” is explicitly set to a non-default value. So if the 
input path uses a non-default file system (e.g. EmrFS), the call fails because 
it tries to set the working directory of EmrFS to an HDFS path.
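
The failure mode can be sketched without Hadoop on the classpath. The class 
below is a hypothetical stand-in, not Pig or Hadoop code: ToyFileSystem and its 
check are assumptions that only mimic a file system refusing a path from a 
different scheme or authority, and the URIs (including the s3:// scheme used 
for the non-default file system) are made up for illustration.

```java
import java.net.URI;
import java.util.Objects;

public class WorkingDirMismatch {
    /** Hypothetical stand-in for a file system bound to one scheme and authority. */
    static final class ToyFileSystem {
        private final URI root;
        private URI workingDir;

        ToyFileSystem(URI root) {
            this.root = root;
            this.workingDir = root;
        }

        /** Rejects a directory whose scheme or authority differs from this file system's. */
        void setWorkingDirectory(URI dir) {
            boolean sameScheme = Objects.equals(root.getScheme(), dir.getScheme());
            boolean sameAuthority = Objects.equals(root.getAuthority(), dir.getAuthority());
            if (!sameScheme || !sameAuthority) {
                throw new IllegalArgumentException("Wrong FS: " + dir + ", expected: " + root);
            }
            workingDir = dir;
        }

        URI getWorkingDirectory() {
            return workingDir;
        }
    }

    public static void main(String[] args) {
        // Assumed default job working directory on the default file system (HDFS).
        URI jobWorkingDir = URI.create("hdfs://host:8020/user/userId");
        // Assumed input file system that is not the default one.
        ToyFileSystem inputFs = new ToyFileSystem(URI.create("s3://bucket/"));
        try {
            inputFs.setWorkingDirectory(jobWorkingDir);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In this model the call is rejected as soon as the input file system differs 
from the one the job working directory belongs to, which is the situation the 
report describes.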

The proposed change is to remove this working-directory logic completely, for 
several reasons.

First, getSplits() is only supposed to return a list of input splits. It 
should not have side effects, especially since this one can change the output 
path.

Second, the working directories of the input and output file systems are 
inconsistent. If "mapreduce.job.working.dir" is set to a non-default value, it 
affects only the output path (and only if that path is relative), because the 
input path has already been qualified before this logic runs.
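
The asymmetry can be illustrated with plain java.net.URI resolution. This is 
only a sketch under stated assumptions: the qualify helper is hypothetical and 
stands in for path qualification in general, not for Pig's or Hadoop's actual 
code, and the host, port, and directory names are invented.

```java
import java.net.URI;

public class RelativePathSketch {
    /** Resolves a path against a working directory; absolute URIs come back unchanged. */
    static URI qualify(URI workingDir, String path) {
        String base = workingDir.toString();
        if (!base.endsWith("/")) {
            base += "/"; // treat the working directory as a directory when resolving
        }
        return URI.create(base).resolve(path);
    }

    public static void main(String[] args) {
        // Assumed non-default value for "mapreduce.job.working.dir".
        URI customWorkingDir = URI.create("hdfs://host:8020/tmp/custom");

        // An input path that was already qualified is not affected.
        System.out.println(qualify(customWorkingDir, "hdfs://host:8020/user/userId/in"));

        // A relative output path moves with the working directory.
        System.out.println(qualify(customWorkingDir, "out"));
    }
}
```

Only the relative path depends on the working directory here, which mirrors 
why the setting ends up affecting the output side alone.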

Third, there is already a "CD" functionality that allows customers to change 
the working directory. However, this logic will overwrite the "CD" 
functionality if input and output paths both use default file 

This message was sent by Atlassian JIRA
