[
https://issues.apache.org/jira/browse/DRILL-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268874#comment-14268874
]
Adam Gilmore commented on DRILL-1948:
-------------------------------------
I seem to have worked out the cause. This line is the ultimate culprit:
CompatibilityUtil.getBuf(input, directBuffer, pageLength);
which ends up doing an input.read(directBuffer). (I couldn't work out where the
source for CompatibilityUtil lives.)
The fatal mistake CompatibilityUtil makes is assuming that
input.read(ByteBuffer) will always read all the remaining bytes in the buffer.
For HDFS, that is not always the case. In my instance, each read returns only a
64 KB chunk (65,535 bytes), so for large Parquet files Drill requests pages of
128 KB or so but reads only the first 64 KB of each.
This compounds: the first page read advances the stream position only to
65,535, so the next read lands in the middle of a page while expecting a page
header, hence the error.
There is probably a setting that would force HDFS to return larger chunks, but
I'm not quite sure which one. The real fix is to loop input.read() until the
buffer has no bytes remaining.
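For illustration, here is a minimal sketch of the read-fully loop I have in
mind. To be clear, readFully is my own naming and the ReadableByteChannel
parameter is just a generic stand-in; this is not the actual CompatibilityUtil
source, which I couldn't locate:

import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public class ReadFully {
  // Hypothetical replacement for the single input.read(directBuffer) call:
  // keep reading until the buffer is full instead of assuming one read suffices.
  public static void readFully(ReadableByteChannel input, ByteBuffer buf) throws IOException {
    while (buf.hasRemaining()) {
      // read() may legitimately return fewer bytes than requested
      // (HDFS hands back ~64 KB chunks in my case)
      int n = input.read(buf);
      if (n < 0) {
        throw new EOFException(buf.remaining() + " bytes short of the expected page length");
      }
    }
  }
}

With a loop like that in place, the stream position after each page read lines
up with the next page header regardless of how HDFS chunks the reads.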
> Reading large parquet files via HDFS fails
> ------------------------------------------
>
> Key: DRILL-1948
> URL: https://issues.apache.org/jira/browse/DRILL-1948
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 0.7.0
> Environment: Hadoop 2.4.0 on Amazon EMR
> Reporter: Adam Gilmore
> Assignee: Parth Chandra
> Priority: Critical
>
> There appears to be an issue with reading medium to large Parquet files via
> HDFS. We have created a basic Parquet file with a schema like so:
> sellprice DOUBLE
> When filled with 10,000 double values, the following query in Drill works
> fine:
> select sum(sellprice) from hdfs.`/saleparquet`;
> When filled with 50,000 double values, the following error occurs:
> Query failed: Query stopped.[ 9aece851-48bc-4664-831e-d35bbfbcd1d5 on ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.
> The full stack trace is:
> 2015-01-07 05:48:57,809 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR o.a.drill.exec.ops.FragmentContext - Fragment Context received failure.
> java.lang.ArrayIndexOutOfBoundsException: null
> 2015-01-07 05:48:57,809 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR o.a.d.e.p.i.ScreenCreator$ScreenRoot - Error 88fe95c3-b088-4674-8b65-967a7f4c3cdf: Query stopped.
> java.lang.ArrayIndexOutOfBoundsException: null
> 2015-01-07 05:48:57,809 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error cd4123e4-7b9d-451d-90f0-3cc1ecf461e4: Failure while running fragment.
> java.lang.ArrayIndexOutOfBoundsException: null
> 2015-01-07 05:48:57,813 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR o.a.drill.exec.work.foreman.Foreman - Error 5db2c65b-cd10-4970-ba2b-f29b51fda923: Query failed: Failure while running fragment.[ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
> [ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
> org.apache.drill.exec.rpc.RemoteRpcException: Failure while running fragment.[ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
> [ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
> at org.apache.drill.exec.work.foreman.QueryManager.statusUpdate(QueryManager.java:93) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
> at org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:151) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
> at org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:113) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
> at org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:109) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.internalFail(FragmentExecutor.java:166) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:116) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
> at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> 2015-01-07 05:48:57,814 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] WARN o.a.d.e.p.impl.SendingAccountor - Failure while waiting for send complete.
> java.lang.InterruptedException: null
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1301) ~[na:1.7.0_71]
> at java.util.concurrent.Semaphore.acquire(Semaphore.java:472) ~[na:1.7.0_71]
> at org.apache.drill.exec.physical.impl.SendingAccountor.waitForSendComplete(SendingAccountor.java:44) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
> at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.stop(ScreenCreator.java:186) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:144) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:117) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
> at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> If I fill it with even more values (e.g. 100,000 or 1,000,000), I get a
> variety of other errors, such as:
> "Query failed: Query stopped., don't know what type: 14"
> coming from the Parquet engine.
> I am able to consistently replicate this in my environment with a basic
> Parquet file. I can attach that file if necessary.