[jira] [Updated] (DRILL-1906) Parquet reader error when reading a subdirectory

Steven Phillips (JIRA) Mon, 23 Feb 2015 23:42:58 -0800

     [ https://issues.apache.org/jira/browse/DRILL-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Steven Phillips updated DRILL-1906:
-----------------------------------
    Fix Version/s:     (was: 0.8.0)

> Parquet reader error when reading a subdirectory
> ------------------------------------------------
>
>                 Key: DRILL-1906
>                 URL: https://issues.apache.org/jira/browse/DRILL-1906
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Aman Sinha
>            Assignee: Steven Phillips
>             Fix For: 0.9.0
>
>
> I am not sure whether this is a regression, but on the current master branch 
> Drill cannot read a directory tree when both the parent directory and a 
> subdirectory contain Parquet files. It tries to read the Parquet footer of 
> the subdirectory itself instead of recursing into it. The same layout works 
> fine with JSON files.
> For example, here's my directory structure: 
> {code}
>  ls -lR /tmp/foo1
> -rw-r--r--  1 asinha  wheel  132 Dec 20 11:10 0_0_0.parquet
> drwxr-xr-x  3 asinha  wheel  102 Dec 20 09:54 foo2
> /tmp/foo1/foo2:
> -rw-r--r--  1 asinha  wheel  132 Dec 16 16:14 0_0_0.parquet
> {code}
> Here's the failure and stack trace: 
> {code}
> 0: jdbc:drill:zk=local> select * from foo1;
> Query failed: Query failed: Unexpected exception during fragment 
> initialization: Internal error: Error while applying rule DrillTableRule, 
> args [rel#660:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[dfs, tmp, 
> foo1])]
> <skip>
> Caused by: java.io.IOException: Could not read footer: java.io.IOException: 
> Could not read footer for file 
> DeprecatedRawLocalFileStatus{path=file:/tmp/foo1/foo2; isDirectory=true; 
> modification_time=1419098040000; access_time=0; owner=; group=; 
> permission=rwxrwxrwx; isSymlink=false}
>         at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:195) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
>         at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:208) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
>         at parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:224) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.store.parquet.ParquetGroupScan.readFooter(ParquetGroupScan.java:208) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffered.jar:0.8.0-SNAPSHOT]
> {code}
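
The description above says the reader is handed the subdirectory entry itself and tries to read a footer from it. As an illustration only, the sketch below shows the expected behaviour: expand a directory tree into regular Parquet files before any footer is read. It assumes plain java.nio on a local file system rather than the Hadoop FileSystem API that appears in the stack trace, and the names ParquetFileLister and listParquetFiles are hypothetical. It is not Drill's code and not the fix for this issue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Drill or parquet-hadoop.
public class ParquetFileLister {

    // Walks the tree under root and returns only regular files whose name
    // ends in ".parquet". Directories such as /tmp/foo1/foo2 are descended
    // into but never returned, so no footer read is attempted on them.
    public static List<Path> listParquetFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".parquet"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the directory from the report; pass another path as args[0].
        Path root = Paths.get(args.length > 0 ? args[0] : "/tmp/foo1");
        for (Path p : listParquetFiles(root)) {
            System.out.println(p);
        }
    }
}
```

Run against the directory layout in the report, this would list both 0_0_0.parquet files and skip the foo2 directory entry.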



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
