[
https://issues.apache.org/jira/browse/PIG-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648078#comment-13648078
]
Johnny Zhang commented on PIG-3223:
-----------------------------------
[~rohini], could you please review the latest patch
https://issues.apache.org/jira/secure/attachment/12581645/PIG-3223.patch.txt ?
The newly added test cases in TestAvroStorage also run clean. Please let me know of any
concerns regarding the implementation, and I will revise it as soon as possible.
Let me know if you would like me to post another patch for the 0.11 branch too.
I also tried Viraj's patch 'PIG-3223.viraj.txt'; it does not apply cleanly on trunk:
{noformat}
patching file contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
patching file contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
Hunk #2 succeeded at 45 (offset 1 line).
Hunk #3 succeeded at 72 (offset 1 line).
Hunk #4 succeeded at 91 with fuzz 1 (offset 1 line).
Hunk #5 succeeded at 1005 (offset 19 lines).
{noformat}
The TestAvroStorage test also failed:
{noformat}
<error message="Error during parsing. java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:" type="org.apache.pig.impl.logicalLayer.FrontendException">org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1670)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1608)
at org.apache.pig.PigServer.registerQuery(PigServer.java:565)
at org.apache.pig.PigServer.registerQuery(PigServer.java:578)
at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testAvroStorage(TestAvroStorage.java:1058)
at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testAvroStorage(TestAvroStorage.java:1051)
at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testComma1(TestAvroStorage.java:1020)
Caused by: Failed to parse: java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1661)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:
at org.apache.hadoop.fs.Path.initialize(Path.java:148)
at org.apache.hadoop.fs.Path.<init>(Path.java:126)
at org.apache.hadoop.fs.Path.<init>(Path.java:50)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1084)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1023)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:987)
at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.getAllSubDirs(AvroStorageUtils.java:120)
at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:387)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:174)
at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:88)
at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:856)
at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3256)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1335)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:819)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:537)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:412)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:181)
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:
at java.net.URI$Parser.fail(URI.java:2809)
at java.net.URI$Parser.checkChars(URI.java:2982)
at java.net.URI$Parser.parse(URI.java:3009)
at java.net.URI.<init>(URI.java:736)
at org.apache.hadoop.fs.Path.initialize(Path.java:145)
{noformat}
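The innermost cause in the trace above is java.net.URI rejecting the glob component "test_glob1.avro,file:". Everything before the colon is read as a scheme name, and '_' (index 4) is not a legal scheme character. Below is a minimal sketch that reproduces just that parse failure with the JDK alone. The class and method names are made up for illustration, and using the single-argument URI constructor is an assumption for brevity; it is not necessarily the exact call Hadoop's Path makes.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class CommaPathUriDemo {
    /** Returns the index at which URI parsing fails, or -1 if the string parses. */
    public static int failureIndex(String s) {
        try {
            new URI(s);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // The component from the trace: the text before ':' is taken as a scheme.
        System.out.println(failureIndex("test_glob1.avro,file:"));
        // A plain file name has no colon, so no scheme is parsed at all.
        System.out.println(failureIndex("test_glob1.avro"));
    }
}
```

This suggests the comma-joined string has to be split into separate paths before any of it reaches the glob code.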
> AvroStorage does not handle comma separated input paths
> -------------------------------------------------------
>
> Key: PIG-3223
> URL: https://issues.apache.org/jira/browse/PIG-3223
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.10.0, 0.11
> Reporter: Michael Kramer
> Assignee: Johnny Zhang
> Attachments: AvroStorage.patch, AvroStorage.patch-2,
> AvroStorageUtils.patch, AvroStorageUtils.patch-2, PIG-3223.patch.txt,
> PIG-3223.patch.txt, PIG-3223.patch.txt, PIG-3223.patch.txt, PIG-3223.viraj.txt
>
>
> In pig 0.11, a patch was issued to AvroStorage to support globs and comma
> separated input paths (PIG-2492). While this function works fine for
> glob-formatted input paths, it fails when issued a standard comma separated
> list of paths. fs.globStatus does not seem to be able to parse out such a
> list, and a java.net.URISyntaxException is thrown when toURI is called on the
> path.
> I have a working fix for this, but it's extremely ugly (basically checking if
> the string of input paths is globbed, otherwise splitting on ","). I'm sure
> there's a more elegant solution. I'd be happy to post the relevant methods
> and "fixes" if necessary.
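For illustration, here is one way the "split on commas" idea from the description could look. It is only a sketch under stated assumptions: it is neither the reporter's fix nor the code in any attached patch, and the class and method names are hypothetical. Instead of first checking whether the string is globbed, it splits on every comma that is not inside a {...} glob alternation, so each resulting piece could then be passed to fs.globStatus on its own.

```java
import java.util.ArrayList;
import java.util.List;

public class InputPathSplitter {
    /** Splits on commas that are not inside a {...} glob alternation. */
    public static List<String> split(String commaSeparated) {
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // current nesting level of '{'
        for (char c : commaSeparated.toCharArray()) {
            if (c == '{') {
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
            }
            if (c == ',' && depth == 0) {
                // Top-level comma: finish the current path.
                paths.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        paths.add(current.toString());
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(split("/data/a.avro,/data/b.avro"));
        System.out.println(split("/data/{2012,2013}/*.avro,/data/b.avro"));
    }
}
```

Commas inside braces stay part of the glob, so a pattern such as /data/{2012,2013}/*.avro is kept as one path.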