[ https://issues.apache.org/jira/browse/DRILL-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacques Nadeau updated DRILL-1906:
----------------------------------
Fix Version/s: 1.2.0 (was: 1.0.0)
> Parquet reader error when reading a subdirectory
> ------------------------------------------------
>
> Key: DRILL-1906
> URL: https://issues.apache.org/jira/browse/DRILL-1906
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Reporter: Aman Sinha
> Assignee: Steven Phillips
> Fix For: 1.2.0
>
>
> I am not sure whether this is a regression, but on the current master branch
> Drill is unable to read a directory when Parquet files exist in both that
> directory and one of its subdirectories. It tries to read a Parquet footer
> from the subdirectory itself instead of recursing into it. JSON works fine.
> For example, here's my directory structure:
> {code}
> ls -lR /tmp/foo1
> -rw-r--r-- 1 asinha wheel 132 Dec 20 11:10 0_0_0.parquet
> drwxr-xr-x 3 asinha wheel 102 Dec 20 09:54 foo2
> /tmp/foo1/foo2:
> -rw-r--r-- 1 asinha wheel 132 Dec 16 16:14 0_0_0.parquet
> {code}
> Here's the failure and stack trace:
> {code}
> 0: jdbc:drill:zk=local> select * from foo1;
> Query failed: Query failed: Unexpected exception during fragment
> initialization: Internal error: Error while applying rule DrillTableRule,
> args [rel#660:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[dfs, tmp,
> foo1])]
> <skip>
> Caused by: java.io.IOException: Could not read footer: java.io.IOException:
> Could not read footer for file
> DeprecatedRawLocalFileStatus{path=file:/tmp/foo1/foo2; isDirectory=true;
> modification_time=1419098040000; access_time=0; owner=; group=;
> permission=rwxrwxrwx; isSymlink=false}
> at
> parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:195)
> ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
> at
> parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:208)
> ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
> at
> parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:224)
> ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
> at
> org.apache.drill.exec.store.parquet.ParquetGroupScan.readFooter(ParquetGroupScan.java:208)
> ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> {code}
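
The behavior the report expects (descend into foo2 rather than treat it as a Parquet file) can be illustrated with a minimal sketch. This is not Drill's code or the eventual fix: it uses java.nio.file on a local filesystem instead of the Hadoop FileSystem API that appears in the stack trace, and the class name ParquetFileLister and method listParquetFiles are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Drill: gathers the files whose footers
// would need to be read, never handing a directory to the footer reader.
public class ParquetFileLister {

    /** Returns all regular files ending in ".parquet" under root, recursing into subdirectories. */
    public static List<Path> listParquetFiles(Path root) throws IOException {
        List<Path> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private static void collect(Path dir, List<Path> out) throws IOException {
        List<Path> entries;
        // Files.list returns a lazy stream that must be closed; copy it to a
        // list first so the directory handle is released before recursing.
        try (Stream<Path> stream = Files.list(dir)) {
            entries = stream.sorted().collect(Collectors.toList());
        }
        for (Path entry : entries) {
            if (Files.isDirectory(entry)) {
                // A directory has no Parquet footer: descend instead of reading it.
                collect(entry, out);
            } else if (entry.getFileName().toString().endsWith(".parquet")) {
                out.add(entry);
            }
        }
    }
}
```

For the layout in the report, calling listParquetFiles on /tmp/foo1 would return both 0_0_0.parquet files and would never return /tmp/foo1/foo2 itself.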