Aman Sinha created DRILL-1906:
---------------------------------
Summary: Parquet reader error when reading a subdirectory
Key: DRILL-1906
URL: https://issues.apache.org/jira/browse/DRILL-1906
Project: Apache Drill
Issue Type: Bug
Reporter: Aman Sinha
I am not sure whether this is a regression, but on the current master branch
Drill cannot read a directory when there are Parquet files in both the parent
directory and a subdirectory. It tries to read a Parquet footer for the
subdirectory itself instead of recursing into it. The same layout works fine
with JSON files.
For example, here's my directory structure:
{code}
ls -lR /tmp/foo1
-rw-r--r-- 1 asinha wheel 132 Dec 20 11:10 0_0_0.parquet
drwxr-xr-x 3 asinha wheel 102 Dec 20 09:54 foo2
/tmp/foo1/foo2:
-rw-r--r-- 1 asinha wheel 132 Dec 16 16:14 0_0_0.parquet
{code}
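The subdirectory foo2 should be recursed into rather than handed to the footer reader as if it were a Parquet file. As a minimal sketch of that expected expansion step (not Drill's actual code or fix), the Java below walks the tree with java.nio.file.Files.walk and keeps only regular files ending in .parquet, so a directory can never reach a footer reader. Drill itself lists files through the Hadoop FileSystem API; the class ListParquetFiles and the method collectParquetFiles are hypothetical names, the .parquet suffix filter is an assumption, and main only creates empty placeholder files that mirror the foo1/foo2 layout above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListParquetFiles {

    // Hypothetical helper: recursively collect regular files ending in
    // ".parquet" under root. Directories are descended into but never
    // returned, so only real files would be passed on for footer reading.
    static List<Path> collectParquetFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".parquet"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Recreate the reported layout with empty placeholder files
        // (these are not valid Parquet files; only the names matter here).
        Path root = Files.createTempDirectory("foo1");
        Path sub = Files.createDirectory(root.resolve("foo2"));
        Files.createFile(root.resolve("0_0_0.parquet"));
        Files.createFile(sub.resolve("0_0_0.parquet"));

        // Print each collected file relative to the root directory.
        for (Path p : collectParquetFiles(root)) {
            System.out.println(root.relativize(p));
        }
    }
}
```

Run against that layout, the sketch prints two relative paths, 0_0_0.parquet and foo2/0_0_0.parquet (using the platform's path separator), and never the directory foo2 itself.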
Here's the failure and stack trace:
{code}
0: jdbc:drill:zk=local> select * from foo1;
Query failed: Query failed: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillTableRule, args [rel#660:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[dfs, tmp, foo1])]
<skip>
Caused by: java.io.IOException: Could not read footer: java.io.IOException: Could not read footer for file DeprecatedRawLocalFileStatus{path=file:/tmp/foo1/foo2; isDirectory=true; modification_time=1419098040000; access_time=0; owner=; group=; permission=rwxrwxrwx; isSymlink=false}
	at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:195) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
	at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:208) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
	at parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:224) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
	at org.apache.drill.exec.store.parquet.ParquetGroupScan.readFooter(ParquetGroupScan.java:208) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)