[ https://issues.apache.org/jira/browse/PIG-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602921#comment-13602921 ]
Michael Kramer commented on PIG-3223:
-------------------------------------
The problem arises with comma-separated paths that aren't enclosed in {}.
Globs also put restrictions on fully qualified HDFS paths. With non-globbed
input paths, which PigStorage does handle, this function breaks; e.g., if I
pass testdir1/,testdir2/ as input, it fails.
It also fails with something like
{hdfs://namenode:8020/testdir1/,hdfs://namenode:8020/testdir2}/*
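
A minimal sketch of the workaround mentioned in the issue description quoted
below (treat the input as a glob where it is one, otherwise split on ","). It
is not the code from any of the attached patches: the class and method names
are hypothetical, and it uses only plain Java string handling, so no Hadoop or
Pig API is assumed. Commas inside {} are left alone so a glob group stays in
one piece.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of AvroStorage or any attached patch. */
final class PathListSplitter {

    private PathListSplitter() {}

    /** Splits on commas outside {...}; commas inside a glob group are kept. */
    static List<String> splitPaths(String input) {
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int braceDepth = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '{') {
                braceDepth++;
            } else if (c == '}' && braceDepth > 0) {
                braceDepth--;
            }
            if (c == ',' && braceDepth == 0) {
                // Top-level comma: the current path is complete.
                addIfNotEmpty(paths, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotEmpty(paths, current);
        return paths;
    }

    /** Adds the trimmed buffer content to the list, skipping empty segments. */
    private static void addIfNotEmpty(List<String> paths, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            paths.add(s);
        }
    }
}
```

With this, testdir1/,testdir2/ becomes two separate paths, while a braced
expression is returned as a single entry. Each entry could then be expanded on
its own. The sketch only covers the splitting step; it does not address the
failure with fully qualified URIs inside {}.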
> AvroStorage does not handle comma separated input paths
> -------------------------------------------------------
>
> Key: PIG-3223
> URL: https://issues.apache.org/jira/browse/PIG-3223
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.10.0, 0.11
> Reporter: Michael Kramer
> Assignee: Johnny Zhang
> Attachments: AvroStorage.patch, AvroStorage.patch-2,
> AvroStorageUtils.patch, AvroStorageUtils.patch-2, PIG-3223.patch.txt
>
>
> In Pig 0.11, a patch was issued to AvroStorage to support globs and
> comma-separated input paths (PIG-2492). While this function works fine for
> glob-formatted input paths, it fails when given a standard comma-separated
> list of paths. fs.globStatus does not seem to be able to parse out such a
> list, and a java.net.URISyntaxException is thrown when toURI is called on the
> path.
> I have a working fix for this, but it's extremely ugly (basically checking if
> the string of input paths is globbed, otherwise splitting on ","). I'm sure
> there's a more elegant solution. I'd be happy to post the relevant methods
> and "fixes" if necessary.