[jira] [Commented] (DRILL-5009) Query with a simple join fails on Hive generated parquet

Parth Chandra (JIRA) Tue, 08 Nov 2016 16:31:26 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649335#comment-15649335
 ]


Parth Chandra commented on DRILL-5009:
--------------------------------------

After discussing with Jinfeng, we concluded that we should filter out empty 
row groups. I've fixed this for the Parquet metadata path, but I still need to 
fix it for the Hive native reader path.
Additionally, I'll add a fix in the PageReader to handle this condition better.
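
To illustrate the filtering idea, here is a minimal sketch, not the actual 
Drill patch. It assumes "empty" means a row group whose row count is zero; 
RowGroupInfo and dropEmpty are hypothetical names introduced for this example 
and do not correspond to Drill or Parquet classes.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyRowGroupFilter {

    /** Hypothetical stand-in for per-row-group metadata (not a Drill/Parquet class). */
    static final class RowGroupInfo {
        final String path;
        final long rowCount;

        RowGroupInfo(String path, long rowCount) {
            this.path = path;
            this.rowCount = rowCount;
        }
    }

    /**
     * Returns a new list containing only the row groups that hold at least one row.
     * Assumption: an "empty" row group is one with rowCount == 0.
     */
    static List<RowGroupInfo> dropEmpty(List<RowGroupInfo> rowGroups) {
        List<RowGroupInfo> kept = new ArrayList<>();
        for (RowGroupInfo rg : rowGroups) {
            if (rg.rowCount > 0) {
                kept.add(rg);
            }
        }
        return kept;
    }
}
```

With a filter of this kind applied before readers are set up, a reader would 
never be constructed for a row group that has no pages to read.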

> Query with a simple join fails on Hive generated parquet
> --------------------------------------------------------
>
>                 Key: DRILL-5009
>                 URL: https://issues.apache.org/jira/browse/DRILL-5009
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.9.0
>         Environment: Commit ID: 5a439424594eb10d113163eaa1fdf8034f387235c
> 1.9.0 SNAPSHOT - Nov 5 2016
>            Reporter: Abhishek Girish
>            Assignee: Parth Chandra
>            Priority: Blocker
>             Fix For: 1.9.0
>
>         Attachments: DRILL-5009.log.txt
>
>
> Query: 
> {code}
> SELECT *
> FROM store_sales ss, customer c
> WHERE  ss.ss_customer_sk = c.c_customer_sk 
> LIMIT 1; 
> {code}
> Error:
> {code}
> Error: SYSTEM ERROR: IOException: End of stream reached while initializing buffered reader.
> Fragment 2:0
> [Error Id: 93726aea-1d62-4e7c-a2bf-1d7cc1e834e4 on abhi1:31010]
>   (org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet record reader.
> ...
> ...
>  Caused By (org.apache.drill.common.exceptions.ExecutionSetupException) Error opening or reading metadata for parquet file at location: customer.parquet
>     org.apache.drill.exec.store.parquet.columnreaders.PageReader.<init>():145
>     org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.<init>():59
>     org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.<init>():96
>     org.apache.drill.exec.store.parquet.columnreaders.NullableColumnReader.<init>():39
>     org.apache.drill.exec.store.parquet.columnreaders.NullableFixedByteAlignedReaders$NullableFixedByteAlignedReader.<init>():58
>     org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory.getNullableColumnReader():252
>     org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory.createFixedColumnReader():186
>     org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup():402
>     org.apache.drill.exec.physical.impl.ScanBatch.next():212
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.broadcastsender.BroadcastSenderRootExec.innerNext():95
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745
> ...
> {code}
> Log attached. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
