[ 
https://issues.apache.org/jira/browse/PIG-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949583#comment-13949583
 ] 

Cheolsoo Park commented on PIG-3830:
------------------------------------

[~jarcec], is this ready for review? If so, please mark it as patch available.

Regarding the formatting, feel free to clean up the whitespace. As long as 
the patch is uploaded to RB, it's not hard to review (at least for me).

> HiveColumnarLoader throwing FileNotFoundException on Hadoop 2
> -------------------------------------------------------------
>
>                 Key: PIG-3830
>                 URL: https://issues.apache.org/jira/browse/PIG-3830
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>             Fix For: 0.13.0
>
>         Attachments: PIG-3830.patch
>
>
> I've noticed that {{HiveColumnarLoader}} throws 
> {{java.io.FileNotFoundException}} when used with a glob path on Hadoop 2.0. It 
> runs just fine on Hadoop 1.0:
> {code}
> Failed to parse: java.io.FileNotFoundException: File /home/jarcec/cloudera/repos/pig/contrib/piggybank/java/simpleDataDir1395623312698/*.txt does not exist
>       at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
>       at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1676)
>       at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1623)
>       at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
>       at org.apache.pig.PigServer.registerQuery(PigServer.java:588)
>       at org.apache.pig.piggybank.test.storage.TestHiveColumnarLoader.testHdfdsGlobbing(TestHiveColumnarLoader.java:220)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:601)
>       at junit.framework.TestCase.runTest(TestCase.java:176)
>       at junit.framework.TestCase.runBare(TestCase.java:141)
>       at junit.framework.TestResult$1.protect(TestResult.java:122)
>       at junit.framework.TestResult.runProtected(TestResult.java:142)
>       at junit.framework.TestResult.run(TestResult.java:125)
>       at junit.framework.TestCase.run(TestCase.java:129)
>       at junit.framework.TestSuite.runTest(TestSuite.java:255)
>       at junit.framework.TestSuite.run(TestSuite.java:250)
>       at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
>       at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
>       at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File /home/jarcec/cloudera/repos/pig/contrib/piggybank/java/simpleDataDir1395623312698/*.txt does not exist
>       at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:362)
>       at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1484)
>       at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1524)
>       at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:564)
>       at org.apache.pig.piggybank.storage.partition.PathPartitioner.getPartitionKeys(PathPartitioner.java:105)
>       at org.apache.pig.piggybank.storage.partition.PathPartitionHelper.getPartitionKeys(PathPartitionHelper.java:101)
>       at org.apache.pig.piggybank.storage.HiveColumnarLoader.getPartitionColumns(HiveColumnarLoader.java:576)
>       at org.apache.pig.piggybank.storage.HiveColumnarLoader.getSchema(HiveColumnarLoader.java:646)
>       at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
>       at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
>       at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:853)
>       at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3479)
>       at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1536)
>       at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1013)
>       at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:553)
>       at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
>       at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
>       ... 20 more
> Caused by: java.io.FileNotFoundException: File /home/jarcec/cloudera/repos/pig/contrib/piggybank/java/simpleDataDir1395623312698/*.txt does not exist
>       ... 37 more
> {code}
> I dug into the problem and found a difference between the Hadoop 
> implementations of {{DistributedFileSystem}}. For a nonexistent directory, the 
> {{listStatus}} method returns {{null}} in 
> [Hadoop 1|https://github.com/apache/hadoop-common/blob/branch-1/src/hdfs/org/apache/hadoop/hdfs/DistributedFileSystem.java#L316]:
> {code}
>     if (thisListing == null) { // the directory does not exist
>       return null;
>     }
> {code}
> But it throws an exception in 
> [Hadoop 2|https://github.com/apache/hadoop-common/blob/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L653]:
> {code}
>     if (thisListing == null) { // the directory does not exist
>       throw new FileNotFoundException("File " + p + " does not exist.");
>     }
> {code}
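
The behavioral difference quoted above can be illustrated with a small, self-contained Java sketch. It is not the attached patch and does not use the Hadoop API: the {{Lister}} interface and the {{listOrEmpty}} helper are hypothetical stand-ins for a {{listStatus}}-style call, used only to show one way a caller might treat "returns null" and "throws FileNotFoundException" as the same "nothing there" outcome.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class ListStatusCompat {

    /** Hypothetical stand-in for a listStatus-style call (not a Hadoop type). */
    interface Lister {
        String[] list(String path) throws IOException;
    }

    /**
     * Normalizes both behaviors described in the issue:
     * a null return (Hadoop 1 style) and a FileNotFoundException
     * (Hadoop 2 style) are both mapped to an empty array.
     */
    static String[] listOrEmpty(Lister lister, String path) throws IOException {
        try {
            String[] entries = lister.list(path);
            return entries == null ? new String[0] : entries;
        } catch (FileNotFoundException e) {
            // The path does not exist; treat it like an empty listing.
            return new String[0];
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated Hadoop 1 behavior: missing directory yields null.
        Lister hadoop1Style = p -> null;
        // Simulated Hadoop 2 behavior: missing directory throws.
        Lister hadoop2Style = p -> {
            throw new FileNotFoundException("File " + p + " does not exist.");
        };

        System.out.println(listOrEmpty(hadoop1Style, "/data/*.txt").length);
        System.out.println(listOrEmpty(hadoop2Style, "/data/*.txt").length);
    }
}
```

Whether swallowing the exception is the right fix for {{HiveColumnarLoader}} is a separate question; expanding the glob before listing may be the better approach, and the sketch only shows the compatibility concern.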



--
This message was sent by Atlassian JIRA
(v6.2#6252)
