[
https://issues.apache.org/jira/browse/PIG-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603652#comment-13603652
]
Michael Kramer commented on PIG-3223:
-------------------------------------
[~cheolsoo], thanks for getting back to me so quickly!
We're using variable substitution and input path generation via the Oozie
Coordinator. We include hdfs://namenode:8020 at the beginning of our path
templates, which I think is pretty standard (e.g. something like
<uri-template>${nameNode}/data/</uri-template>). When Oozie constructs the
input paths to pass to the Pig script or MapReduce job, it enumerates them as
a comma-separated list, something like
hdfs://namenode:8020/data/1,hdfs://namenode:8020/data/2. This is how we
figured out AvroStorage was breaking in the first place.
A good coordinator/workflow example, representative of the kinds of workflows
we're running, can be found in the Oozie source examples:
https://github.com/apache/oozie/blob/trunk/examples/src/main/apps/aggregator/coordinator.xml
> AvroStorage does not handle comma separated input paths
> -------------------------------------------------------
>
> Key: PIG-3223
> URL: https://issues.apache.org/jira/browse/PIG-3223
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.10.0, 0.11
> Reporter: Michael Kramer
> Assignee: Johnny Zhang
> Attachments: AvroStorage.patch, AvroStorage.patch-2,
> AvroStorageUtils.patch, AvroStorageUtils.patch-2, PIG-3223.patch.txt
>
>
> In Pig 0.11, a patch was applied to AvroStorage to support globs and
> comma-separated input paths (PIG-2492). While this works fine for
> glob-formatted input paths, it fails when given a standard comma-separated
> list of paths. fs.globStatus does not seem to be able to parse such a
> list, and a java.net.URISyntaxException is thrown when toURI is called on the
> path.
> I have a working fix for this, but it's extremely ugly (basically checking if
> the string of input paths is globbed, otherwise splitting on ","). I'm sure
> there's a more elegant solution. I'd be happy to post the relevant methods
> and "fixes" if necessary.
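For illustration, the workaround described in the issue above (treat the
location as a single glob if it contains glob characters, otherwise split on
",") could be sketched roughly as follows. This is not the attached patch: the
class and method names are hypothetical, the set of glob characters is an
assumption, and the sketch works on plain strings rather than Hadoop's
Path/FileSystem API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig or piggybank.
public class InputPathSplitter {

    // Assumption: any of these characters marks the location as a glob pattern.
    private static final String GLOB_CHARS = "*?[{";

    // Returns true if the location contains at least one glob character.
    static boolean isGlob(String location) {
        for (int i = 0; i < location.length(); i++) {
            if (GLOB_CHARS.indexOf(location.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // A globbed location is returned as one entry (it may itself contain
    // commas, e.g. inside {a,b}); anything else is split on ",".
    static List<String> splitInputPaths(String location) {
        List<String> paths = new ArrayList<>();
        if (isGlob(location)) {
            paths.add(location);
            return paths;
        }
        for (String part : location.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // The Oozie-style list from the comment above yields two entries.
        System.out.println(splitInputPaths(
            "hdfs://namenode:8020/data/1,hdfs://namenode:8020/data/2"));
        // A globbed location is left untouched for glob expansion.
        System.out.println(splitInputPaths("hdfs://namenode:8020/data/{1,2}"));
    }
}
```

Each returned entry could then be handed to the existing glob handling one at
a time. The glob check comes first because a pattern such as {1,2} contains a
comma that must not be treated as a path separator.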