[ https://issues.apache.org/jira/browse/PIG-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419964#comment-13419964 ]
Cheolsoo Park commented on PIG-2492:
------------------------------------
Attached [^PIG-2492-4.patch] is the newest patch.
There is one thing I'd like to mention, although I already discussed it on
Review Board.
I changed the type of the first parameter of AvroStorageUtils.getAllSubDirs()
from URI to org.apache.hadoop.fs.Path. This is needed because '{' and '}' are
not legal characters in a URI, so URI.create() throws an
IllegalArgumentException (wrapping a URISyntaxException) on any glob pattern
that contains them.
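A minimal sketch of the failure described above, using only java.net.URI and no Hadoop classes. The class name GlobUriCheck, the helper rejectedByUriCreate() and the sample paths are hypothetical. The helper reports whether URI.create() rejects a string, which it signals with an IllegalArgumentException whose cause is the checked URISyntaxException.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class GlobUriCheck {
    // Hypothetical helper: true if URI.create() rejects the given string.
    static boolean rejectedByUriCreate(String s) {
        try {
            URI.create(s);
            return false;
        } catch (IllegalArgumentException e) {
            // URI.create() wraps the checked URISyntaxException as the cause.
            return e.getCause() instanceof URISyntaxException;
        }
    }

    public static void main(String[] args) {
        // Hypothetical input path containing a brace glob.
        String glob = "/data/avro/{2012-07-01,2012-07-02}/*.avro";
        System.out.println(rejectedByUriCreate(glob));
        // The same kind of path without braces parses fine.
        System.out.println(rejectedByUriCreate("/data/avro/2012-07-01/part-0.avro"));
    }
}
```

This should print true and then false: '*' is a legal URI character, but '{' is not.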
Path, however, escapes these characters automatically when it is constructed,
so I now construct a Path from the glob pattern string and obtain its URI with
Path.toUri().
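The escaping can also be illustrated without Hadoop. As far as we understand, Path builds its URI with the multi-argument java.net.URI constructor, which quotes illegal characters instead of rejecting them; the sketch below assumes that and calls the constructor directly. The class GlobPathToUri and its toUri() helper are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class GlobPathToUri {
    // Sketch only: builds a URI the way we assume Hadoop's Path does, via the
    // multi-argument constructor, which quotes '{' and '}' instead of failing.
    static URI toUri(String pathString) {
        try {
            return new URI(null, null, pathString, null, null);
        } catch (URISyntaxException e) {
            // Mirror URI.create(): surface the checked exception as unchecked.
            throw new IllegalArgumentException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        URI uri = toUri("/data/avro/{2012-07-01,2012-07-02}/*.avro");
        // Escaped form: /data/avro/%7B2012-07-01,2012-07-02%7D/*.avro
        System.out.println(uri.getRawPath());
        // Decoded form: the original glob string
        System.out.println(uri.getPath());
    }
}
```

getRawPath() keeps the escaped %7B/%7D form, while getPath() decodes it back to the original glob string.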
In fact, this partly reverts the changes made in PIG-2540
(https://issues.apache.org/jira/browse/PIG-2540). However, it does not break
S3 support, because inside AvroStorageUtils.getAllSubDirs() the file system is
still obtained from the URI of the given path, and globStatus() is called on
that file system:
{code}
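// The FileSystem is chosen by the scheme of the path's URI (e.g. s3n:// or hdfs://).
FileSystem fs = FileSystem.get(path.toUri(), job.getConfiguration());
// Expand the glob against that same file system.
FileStatus[] matchedFiles = fs.globStatus(path);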
{code}
So if the path is an S3 URI, the S3 file system is used.
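To illustrate why S3 support survives, the sketch below splits a glob string into scheme, authority and path (a simplified, hypothetical version of what Path does internally, not its actual code) and builds a URI with the multi-argument constructor. The scheme stays intact even though the path contains braces, and FileSystem.get(URI, Configuration) selects the file system implementation from that scheme. The class SchemeFromGlob, the bucket name and the paths are made up.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SchemeFromGlob {
    // Hypothetical, simplified split of "scheme://authority/path" before
    // building a URI; Hadoop's Path does something similar internally.
    static URI toUri(String s) {
        String scheme = null;
        String authority = null;
        int start = 0;

        // A scheme is a prefix ending in ':' that appears before the first '/'.
        int colon = s.indexOf(':');
        int slash = s.indexOf('/');
        if (colon != -1 && (slash == -1 || colon < slash)) {
            scheme = s.substring(0, colon);
            start = colon + 1;
        }

        // An authority follows "//" and runs up to the next '/'.
        if (s.startsWith("//", start) && s.length() - start > 2) {
            int next = s.indexOf('/', start + 2);
            int end = (next == -1) ? s.length() : next;
            authority = s.substring(start + 2, end);
            start = end;
        }

        try {
            // The multi-argument constructor quotes '{' and '}' in the path.
            return new URI(scheme, authority, s.substring(start), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        URI uri = toUri("s3n://my-bucket/logs/{2012-07-01,2012-07-02}/*.avro");
        // FileSystem.get(uri, conf) would pick the implementation from this scheme.
        System.out.println(uri.getScheme());    // s3n
        System.out.println(uri.getAuthority()); // my-bucket
        System.out.println(uri.getPath());      // /logs/{2012-07-01,2012-07-02}/*.avro
    }
}
```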
Please let me know if I am wrong. Thanks!
> AvroStorage should recognize globs and commas
> ---------------------------------------------
>
> Key: PIG-2492
> URL: https://issues.apache.org/jira/browse/PIG-2492
> Project: Pig
> Issue Type: Improvement
> Components: piggybank
> Affects Versions: 0.9.1, 0.10.0
> Reporter: Stan Rosenberg
> Assignee: Cheolsoo Park
> Attachments: AvroStorage.patch, AvroStorageUtils.patch,
> PIG-2492-2.patch, PIG-2492-3.patch, PIG-2492-4.patch, PIG-2492.patch,
> avro_test_files-2.tar.gz, avro_test_files.tar.gz
>
>
> I've patched AvroStorage and AvroStorageUtils to support the same file input
> syntax as currently supported by Hadoop's FileInputFormat. Specifically,
> globs and commas are supported.
> Somebody should write some unit tests for these changes; I am currently
> pressed for time.
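The comma handling mentioned in the description is slightly subtle, because brace globs contain commas too. A minimal sketch, assuming the FileInputFormat-style rule (as we understand it) that commas inside {...} belong to the glob and only top-level commas separate inputs; CommaSeparatedInputs, splitInputPaths() and the sample paths are hypothetical, not the patch's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class CommaSeparatedInputs {
    // Hypothetical helper: split a comma-separated list of input paths while
    // keeping commas inside {...} alternations, so "/a/{x,y}" stays one entry.
    static List<String> splitInputPaths(String commaSeparated) {
        List<String> paths = new ArrayList<>();
        int depth = 0; // current nesting level of '{'
        int start = 0; // start index of the current entry
        for (int i = 0; i < commaSeparated.length(); i++) {
            char c = commaSeparated.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
            } else if (c == ',' && depth == 0) {
                // Top-level comma: end of one input path.
                paths.add(commaSeparated.substring(start, i));
                start = i + 1;
            }
        }
        paths.add(commaSeparated.substring(start));
        return paths;
    }

    public static void main(String[] args) {
        String input = "/data/a.avro,/data/{2012-07-01,2012-07-02}/*.avro";
        for (String p : splitInputPaths(input)) {
            System.out.println(p);
        }
    }
}
```

This prints two entries, /data/a.avro and /data/{2012-07-01,2012-07-02}/*.avro; each could then be turned into a Path and expanded with globStatus().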