[ https://issues.apache.org/jira/browse/HADOOP-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588719#action_12588719 ]
Hairong Kuang commented on HADOOP-3162:
---------------------------------------
I do not think that we need to add the following two methods to FileInputFormat:
1. setInputPaths(JobConf conf, String commaSeparatedPaths)
2. addInputPaths(JobConf conf, String commaSeparatedPaths).
We have not discussed what a comma means in user-facing interfaces such as
streaming when a user provides a comma-separated path name. In streaming, should
we support commas in a path name? What if a user wants to use a glob that
contains a comma? These questions need to be discussed thoroughly and documented
before we make any code change to support them. We could do that in release 18.
In release 17, the patch only needs to restore the old behavior of addInputPath
and setInputPath in JobConf. Applications like streaming should continue to use
JobConf.setInputPath, while a user who needs a glob or a path containing
commas can use the new APIs in FileInputFormat.
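
To make the ambiguity above concrete, here is a minimal standalone sketch in
plain Java. It does not use Hadoop and is not the code from the patch; the
class CommaPathDemo, its two methods, and the sample paths are all hypothetical
names chosen for illustration. It compares a naive split on every comma with
one possible alternative that ignores commas inside {...} glob alternations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop or of the attached patch.
class CommaPathDemo {

    // Naive interpretation: every comma separates two input paths.
    static List<String> naiveSplit(String commaSeparatedPaths) {
        return Arrays.asList(commaSeparatedPaths.split(","));
    }

    // One possible alternative (an assumption, not an agreed design):
    // a comma separates paths only when it is outside {...}.
    static List<String> braceAwareSplit(String commaSeparatedPaths) {
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // current nesting level of '{'
        for (char c : commaSeparatedPaths.toCharArray()) {
            if (c == '{') {
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
            }
            if (c == ',' && depth == 0) {
                // Top-level comma: finish the current path.
                paths.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        paths.add(current.toString());
        return paths;
    }

    public static void main(String[] args) {
        // Made-up input: one glob containing a comma, plus one plain path.
        String input = "/logs/{2008,2009}/part-*,/extra/part-00000";
        System.out.println("naive:       " + naiveSplit(input).size() + " paths");
        System.out.println("brace-aware: " + braceAwareSplit(input).size() + " paths");
    }
}
```

With the sample input, the naive split yields three fragments and cuts the glob
in half, while the brace-aware split yields two paths. Neither version handles
a literal comma in a file name, which is part of what still needs discussion.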
> Map/reduce stops working with comma separated input paths
> ---------------------------------------------------------
>
> Key: HADOOP-3162
> URL: https://issues.apache.org/jira/browse/HADOOP-3162
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.17.0
> Reporter: Runping Qi
> Assignee: Amareshwari Sriramadasu
> Priority: Blocker
> Fix For: 0.17.0
>
> Attachments: patch-3162.txt, patch-3162.txt, patch-3162.txt,
> patch-3162.txt, patch-3162.txt
>
>
> When a job is given a comma separated input file list, FileInputFormat class
> throws an exception, complaining the input is invalid:
> org.apache.hadoop.mapred.InvalidInputException: Input path doesnt exist : hdfs:/namenode:port/gridmix/data/MonsterQueryBlockCompressed/part-00000,/gridmix/data/MonsterQueryBlockCompressed/part-00001,/gridmix/data/MonsterQueryBlockCompressed/part-00002
>     at org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:213)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:705)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
>     at org.apache.hadoop.mapred.GenericMRLoadGenerator.run(GenericMRLoadGenerator.java:189)