Thanks Hairong,
I've just created https://issues.apache.org/jira/browse/HADOOP-3064 for this.
Tom
On 20/03/2008, Hairong Kuang <[EMAIL PROTECTED]> wrote:
> Yes, this is a bug. It only occurs when a job's input path contains
> curly-brace closures. JobConf.getInputPaths interprets
> mr/input/glob/2008/02/{02,08} as two input paths:
> "mr/input/glob/2008/02/{02" and "08}". Let's see how to fix it.
>
>
> Hairong
>
>
>
> On 3/20/08 9:43 AM, "Tom White" <[EMAIL PROTECTED]> wrote:
>
> > I'm trying to use file globbing to select various input paths, like so:
> >
> > conf.setInputPath(new Path("mr/input/glob/2008/02/{02,08}"));
> >
> > But this gives an exception:
> >
> > Exception in thread "main" java.io.IOException: Illegal file pattern: Expecting set closure character or end of range, or } for glob {02 at 3
> > at org.apache.hadoop.fs.FileSystem$GlobFilter.error(FileSystem.java:1023)
> > at org.apache.hadoop.fs.FileSystem$GlobFilter.setRegex(FileSystem.java:1008)
> > at org.apache.hadoop.fs.FileSystem$GlobFilter.<init>(FileSystem.java:926)
> > at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:826)
> > at org.apache.hadoop.fs.FileSystem.globPaths(FileSystem.java:873)
> > at org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:131)
> > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:541)
> > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:809)
> >
> > Looking at the code for JobConf.getInputPaths, I see it tokenizes using
> > a comma as the delimiter, producing two paths,
> > "mr/input/glob/2008/02/{02" and "08}". This looks like a bug to me.
> > I'm surprised, since this feature has been around for some time. Are
> > folks not using it like this?
> >
> > Tom
>
>
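To illustrate the failure described above: tokenizing the input path string on every comma (as a java.util.StringTokenizer with "," as its delimiter does) cuts the glob in two, whereas a splitter that ignores commas inside {...} keeps it whole. This is a minimal sketch, assuming paths are separated by top-level commas only; the class and method names (GlobPathSplitter, naiveSplit, braceAwareSplit) are hypothetical, and this is not Hadoop's actual JobConf code or the eventual fix for HADOOP-3064.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class GlobPathSplitter {

    // Naive split on every ',' (reproduces the problem: the glob is cut in two).
    static List<String> naiveSplit(String paths) {
        List<String> result = new ArrayList<>();
        StringTokenizer tok = new StringTokenizer(paths, ",");
        while (tok.hasMoreTokens()) {
            result.add(tok.nextToken());
        }
        return result;
    }

    // Hypothetical brace-aware split: a comma separates paths only when it
    // is not inside a {...} glob alternation. Nesting depth is tracked so
    // nested braces are also handled.
    static List<String> braceAwareSplit(String paths) {
        List<String> result = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < paths.length(); i++) {
            char c = paths.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
            } else if (c == ',' && depth == 0) {
                result.add(paths.substring(start, i));
                start = i + 1;
            }
        }
        result.add(paths.substring(start));
        return result;
    }

    public static void main(String[] args) {
        String input = "mr/input/glob/2008/02/{02,08}";
        System.out.println(naiveSplit(input));       // [mr/input/glob/2008/02/{02, 08}]
        System.out.println(braceAwareSplit(input));  // [mr/input/glob/2008/02/{02,08}]
        System.out.println(braceAwareSplit("a/{x,y},b/c")); // [a/{x,y}, b/c]
    }
}
```

The design choice here is to treat braces as grouping for splitting purposes only; the surviving path string is still handed to the glob matcher unchanged, so {02,08} is expanded there rather than by the splitter.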
--
Blog: http://www.lexemetech.com/