[ https://issues.apache.org/jira/browse/CRUNCH-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriel Reid updated CRUNCH-344:
--------------------------------
Attachment: CRUNCH-344.patch
Patch to resolve the issue. URL encoding is used to serialize path information
in the Configuration. I chose URL encoding over base64 because URL-encoded
paths stay mostly human-readable, which makes the Configuration easier to debug
if issues do pop up later.
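The idea behind the fix can be sketched as follows. This is a minimal, hypothetical illustration and not the code in CRUNCH-344.patch: the class name `PathListCodec` and its `encode`/`decode` methods are invented for this example. It URL-encodes each path with the JDK's `java.net.URLEncoder` before joining the paths with a comma, then splits and decodes with `java.net.URLDecoder`. Because `URLEncoder` escapes `,`, `;` and `|` (as `%2C`, `%3B` and `%7C`), those characters can no longer collide with the separator.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not the actual CRUNCH-344 patch): URL-encode each path
 * before joining with ',' so separator characters inside a path are escaped.
 */
public class PathListCodec {

  private static final String SEPARATOR = ",";

  /** Joins paths into one string that is safe to store in a Configuration. */
  public static String encode(List<String> paths) {
    StringBuilder sb = new StringBuilder();
    for (String path : paths) {
      if (sb.length() > 0) {
        sb.append(SEPARATOR);
      }
      try {
        // ',' ';' '|' '/' '{' '}' all become %XX escapes, so ',' is unambiguous.
        sb.append(URLEncoder.encode(path, "UTF-8"));
      } catch (UnsupportedEncodingException e) {
        throw new IllegalStateException(e); // UTF-8 is always available
      }
    }
    return sb.toString();
  }

  /** Reverses encode(): split on the separator, then URL-decode each field. */
  public static List<String> decode(String encoded) {
    List<String> paths = new ArrayList<>();
    if (encoded.isEmpty()) {
      return paths;
    }
    // Limit -1 keeps trailing empty fields instead of silently dropping them.
    for (String field : encoded.split(SEPARATOR, -1)) {
      try {
        paths.add(URLDecoder.decode(field, "UTF-8"));
      } catch (UnsupportedEncodingException e) {
        throw new IllegalStateException(e);
      }
    }
    return paths;
  }

  public static void main(String[] args) {
    List<String> paths = Arrays.asList("/input/file{1,2,3}.txt", "/data/a;b|c");
    String encoded = encode(paths);
    // Prints: %2Finput%2Ffile%7B1%2C2%2C3%7D.txt,%2Fdata%2Fa%3Bb%7Cc
    System.out.println(encoded);
    // Prints: [/input/file{1,2,3}.txt, /data/a;b|c]
    System.out.println(decode(encoded));
  }
}
```

The encoded form still shows the directory and file names between the escapes, which matches the debuggability argument for preferring URL encoding over base64.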
> Full file glob syntax does not work correctly with Crunch
> ---------------------------------------------------------
>
> Key: CRUNCH-344
> URL: https://issues.apache.org/jira/browse/CRUNCH-344
> Project: Crunch
> Issue Type: Bug
> Reporter: Gabriel Reid
> Assignee: Gabriel Reid
> Attachments: CRUNCH-344.patch
>
>
> Input paths that use some variants of Hadoop-supported glob syntax do not
> work. This is specifically a problem when commas are used in a glob path,
> for example in a path like "/input/file{1,2,3}.txt". The same underlying
> cause also makes it impossible to use paths that contain semicolons or pipe
> symbols (admittedly much less common).
>
> The underlying cause is the encoding used in CrunchInputs, which builds a
> string structure using commas, pipes, and semicolons as field separators, so
> a path containing any of these characters is split in the wrong place.
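The failure mode described in the issue can be reproduced in isolation. The sketch below is hypothetical and does not reproduce the real CrunchInputs format; the class `NaiveSeparatorDemo` and its `join`/`split` helpers are invented for illustration. It joins paths with a raw comma and splits on that comma again, which is enough to show how a comma-containing glob falls apart. The same thing would happen with any path containing a semicolon or pipe if those characters were used as raw separators.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration of the CRUNCH-344 failure mode: joining paths with
 * a raw ',' and splitting on ',' breaks a glob that itself contains commas.
 */
public class NaiveSeparatorDemo {

  /** Joins paths with a raw comma and no escaping. */
  static String join(List<String> paths) {
    return String.join(",", paths);
  }

  /** Splits on the raw comma, unable to tell separators from path characters. */
  static List<String> split(String joined) {
    return Arrays.asList(joined.split(","));
  }

  public static void main(String[] args) {
    List<String> original = Arrays.asList("/input/file{1,2,3}.txt");
    List<String> roundTripped = split(join(original));
    // One input path comes back as three bogus fragments:
    // /input/file{1
    // 2
    // 3}.txt
    for (String fragment : roundTripped) {
      System.out.println(fragment);
    }
  }
}
```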
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)