[ 
https://issues.apache.org/jira/browse/PIG-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603652#comment-13603652
 ] 

Michael Kramer commented on PIG-3223:
-------------------------------------

[~cheolsoo], thanks for getting back to me so quickly!  

We're using variable substitution and input path generation via the Oozie 
Coordinator.  We include hdfs://namenode:8020 at the beginning of our path 
templates, which I think is pretty standard (e.g. something like 
<uri-template>${nameNode}/data/</uri-template>).  When Oozie constructs the 
input paths to pass to the Pig script or MapReduce job, it enumerates them as 
a comma-separated list, something like 
hdfs://namenode:8020/data/1,hdfs://namenode:8020/data/2.  This is how we 
figured out AvroStorage was breaking in the first place.  
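
To make the input shape concrete, here is a minimal Java sketch of splitting 
such a location string into individual paths while leaving commas inside 
{...} glob groups alone.  The class name CommaSeparatedPaths and the method 
split are hypothetical names used for illustration only; this is not the 
attached patch, and it deliberately uses no Pig or Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class CommaSeparatedPaths {

    /**
     * Splits a location string on top-level commas.
     * Commas inside {...} glob groups are kept as part of the path.
     * Empty entries (e.g. from a trailing comma) are dropped.
     */
    public static List<String> split(String location) {
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int braceDepth = 0; // > 0 while inside a {a,b} glob group

        for (char c : location.toCharArray()) {
            if (c == '{') {
                braceDepth++;
            } else if (c == '}' && braceDepth > 0) {
                braceDepth--;
            }

            if (c == ',' && braceDepth == 0) {
                // Top-level comma: close off the current path.
                if (current.length() > 0) {
                    paths.add(current.toString());
                }
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            paths.add(current.toString());
        }
        return paths;
    }

    public static void main(String[] args) {
        // Oozie-style comma-separated list: yields two separate paths.
        System.out.println(split("hdfs://namenode:8020/data/1,hdfs://namenode:8020/data/2"));
        // Glob with a brace group: stays a single path.
        System.out.println(split("hdfs://namenode:8020/data/{1,2}"));
    }
}
```

Under this approach each resulting entry could then be globbed on its own, 
rather than passing the whole comma-separated string to fs.globStatus at once.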

A good coordinator/workflow example, representative of the workflows we're 
running, can be found in the Oozie source examples: 
https://github.com/apache/oozie/blob/trunk/examples/src/main/apps/aggregator/coordinator.xml
                
> AvroStorage does not handle comma separated input paths
> -------------------------------------------------------
>
>                 Key: PIG-3223
>                 URL: https://issues.apache.org/jira/browse/PIG-3223
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10.0, 0.11
>            Reporter: Michael Kramer
>            Assignee: Johnny Zhang
>         Attachments: AvroStorage.patch, AvroStorage.patch-2, 
> AvroStorageUtils.patch, AvroStorageUtils.patch-2, PIG-3223.patch.txt
>
>
> In Pig 0.11, a patch was issued to AvroStorage to support globs and comma 
> separated input paths (PIG-2492).  While this works fine for 
> glob-formatted input paths, it fails when given a standard comma-separated 
> list of paths.  fs.globStatus does not seem to be able to parse such a 
> list, and a java.net.URISyntaxException is thrown when toURI is called on the 
> path.  
> I have a working fix for this, but it's extremely ugly (basically checking if 
> the string of input paths is globbed, otherwise splitting on ",").  I'm sure 
> there's a more elegant solution.  I'd be happy to post the relevant methods 
> and "fixes" if necessary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
