[jira] [Created] (DRILL-1906) Parquet reader error when reading a subdirectory

Aman Sinha (JIRA) Sat, 20 Dec 2014 12:16:22 -0800

Aman Sinha created DRILL-1906:
---------------------------------

             Summary: Parquet reader error when reading a subdirectory
                 Key: DRILL-1906
                 URL: https://issues.apache.org/jira/browse/DRILL-1906
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Aman Sinha



I am not sure whether this is a regression, but on the current master branch 
Drill is unable to read a directory when both the directory and one of its 
subdirectories contain Parquet files.  It tries to read a Parquet footer from 
the subdirectory itself instead of recursing into it.  JSON works fine.  

For example, here's my directory structure: 

{code}
 ls -lR /tmp/foo1
-rw-r--r--  1 asinha  wheel  132 Dec 20 11:10 0_0_0.parquet
drwxr-xr-x  3 asinha  wheel  102 Dec 20 09:54 foo2

/tmp/foo1/foo2:
-rw-r--r--  1 asinha  wheel  132 Dec 16 16:14 0_0_0.parquet
{code}

Here's the failure and stack trace: 
{code}
0: jdbc:drill:zk=local> select * from foo1;
Query failed: Query failed: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillTableRule, args [rel#660:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[dfs, tmp, foo1])]

<skip>
Caused by: java.io.IOException: Could not read footer: java.io.IOException: Could not read footer for file DeprecatedRawLocalFileStatus{path=file:/tmp/foo1/foo2; isDirectory=true; modification_time=1419098040000; access_time=0; owner=; group=; permission=rwxrwxrwx; isSymlink=false}
        at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:195) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
        at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:208) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
        at parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:224) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.ParquetGroupScan.readFooter(ParquetGroupScan.java:208) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
{code}
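
The trace shows the footer reader being handed the directory /tmp/foo1/foo2 
as if it were a file.  As a rough illustration of the expected behavior only 
(this is not Drill's code and not a proposed patch), the sketch below walks a 
directory tree and collects only regular files ending in .parquet, so that no 
directory is ever passed on for footer reading.  It uses plain java.nio rather 
than the Hadoop FileSystem API, and the class name ParquetFileLister and the 
method listParquetFiles are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParquetFileLister {

    // Hypothetical helper: recurse below root and return only regular
    // files whose name ends in ".parquet". Directories such as foo2 are
    // descended into, never returned.
    static List<Path> listParquetFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".parquet"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Recreate the layout from the report in a temporary directory.
        Path root = Files.createTempDirectory("foo1");
        Files.createFile(root.resolve("0_0_0.parquet"));
        Path sub = Files.createDirectory(root.resolve("foo2"));
        Files.createFile(sub.resolve("0_0_0.parquet"));

        // Print each file that would be handed to the footer reader.
        for (Path p : listParquetFiles(root)) {
            System.out.println(root.relativize(p));
        }
    }
}
```

With the layout above, both 0_0_0.parquet files are listed and foo2 itself is 
not, which is the input the footer reader would need in order to avoid the 
"Could not read footer" error on a directory.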




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
