[ 
https://issues.apache.org/jira/browse/DRILL-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063113#comment-16063113
 ] 

Roman commented on DRILL-4595:
------------------------------

I tried the steps from the previous comment on Drill with the DRILL-5599 fix 
applied, and the problem appears to be solved.

After step 2) I got the following error:
{code:sql}
Error: DATA_READ ERROR: Error reading page data

File:  /drill/testdata/tpcds_sf100/parquet/web_sales/0_0_7.parquet
Column:  ws_ship_hdemo_sk
Row Group Start:  7836969
Fragment 1:1

[Error Id: a8ca60e9-5ef7-42b7-b93a-fd7c69c06aef on node1:31010] (state=,code=0)
{code}

The Drillbit kept running (it did not go down) and the table files were 
cleaned up. The Web UI also shows that the query finished correctly in the 
FAILED state. Information from the logs:

{code:sql}
2017-06-26 13:35:30,518 [26aef278-acd0-2649-502d-636b78c58f66:frag:1:1] INFO  o.a.d.e.s.p.c.AsyncPageReader - User Error Occurred: Error reading page data (Failure allocating buffer.)
org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Error reading page data

File:  /drill/testdata/tpcds_sf100/parquet/web_sales/0_0_7.parquet
Column:  ws_ship_hdemo_sk
Row Group Start:  7836969

[Error Id: a8ca60e9-5ef7-42b7-b93a-fd7c69c06aef ]
        at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550) ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.handleAndThrowException(AsyncPageReader.java:185) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.nextInternal(AsyncPageReader.java:273) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.PageReader.next(PageReader.java:307) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.NullableColumnReader.processPages(NullableColumnReader.java:69) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readAllFixedFieldsSerial(BatchReader.java:63) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readAllFixedFields(BatchReader.java:56) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.BatchReader$FixedWidthReader.readRecords(BatchReader.java:141) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readBatch(BatchReader.java:42) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:297) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:180) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:133) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:133) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:91) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_131]
        at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_131]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) [hadoop-common-2.7.0-mapr-1607.jar:na]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating buffer.
        at io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:64) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:4.0.27.Final]
        at org.apache.drill.exec.memory.AllocationManager.<init>(AllocationManager.java:80) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation(BaseAllocator.java:254) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:236) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:206) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.PageReader.allocateTemporaryBuffer(PageReader.java:376) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.decompress(AsyncPageReader.java:195) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.getDecompressedPageData(AsyncPageReader.java:146) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.nextInternal(AsyncPageReader.java:268) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        ... 35 common frames omitted
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:694) ~[na:1.8.0_131]
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_131]
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_131]
        at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:437) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:98) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL(PooledByteBufAllocatorL.java:165) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:4.0.27.Final]
        at io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer(PooledByteBufAllocatorL.java:195) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:4.0.27.Final]
        at io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:62) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:4.0.27.Final]
        ... 43 common frames omitted
{code}

So in this case we ran out of direct memory and the Drillbit cancelled the 
query correctly. It seems DRILL-5599 fixes all similar issues with CTAS.

> FragmentExecutor.fail() should interrupt the fragment thread to avoid 
> possible query hangs
> ------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4595
>                 URL: https://issues.apache.org/jira/browse/DRILL-4595
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>             Fix For: Future
>
>
> When a fragment fails, it's assumed it will be able to close itself and send 
> its FAILED state to the foreman, which will cancel any running fragments. 
> FragmentExecutor.cancel() will interrupt the thread, making sure those 
> fragments don't stay blocked.
> However, if a fragment is already blocked when its fail method is called, the 
> foreman may never be notified about this and the query will hang forever. One 
> such scenario is the following:
> - generally it's a CTAS running on a large cluster (lots of writers running 
> in parallel)
> - logs show that the user channel was closed and UserServer caused the root 
> fragment to move to a FAILED state
> - jstack shows that the root fragment is blocked in its receiver waiting for 
> data
> - jstack also shows that ALL other fragments are no longer running, and the 
> logs show that all of them succeeded
> - the foreman waits *forever* for the root fragment to finish
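
For illustration only, here is a minimal sketch of the hang described above 
and of what interrupting the fragment thread on failure changes. It is not 
Drill's FragmentExecutor code: the class FragmentInterruptSketch, the nested 
Fragment class, the interruptOnFail flag and the terminatesAfterFail helper 
are all hypothetical names, and a LinkedBlockingQueue stands in for the 
receiver the root fragment is blocked on.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

public class FragmentInterruptSketch {

    enum State { RUNNING, FAILED }

    /** Hypothetical stand-in for a fragment whose thread blocks waiting for data. */
    static final class Fragment implements Runnable {
        private final BlockingQueue<String> incoming = new LinkedBlockingQueue<>();
        private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);
        private final boolean interruptOnFail;
        private volatile Thread worker;

        Fragment(boolean interruptOnFail) {
            this.interruptOnFail = interruptOnFail;
        }

        @Override
        public void run() {
            worker = Thread.currentThread();
            try {
                while (state.get() == State.RUNNING) {
                    // Blocks until data arrives or the thread is interrupted.
                    incoming.take();
                }
            } catch (InterruptedException e) {
                // Woken up by fail(); fall through so the thread can finish.
            }
        }

        /** Marks the fragment FAILED and, optionally, interrupts its thread. */
        void fail() {
            state.set(State.FAILED);
            Thread t = worker;
            if (interruptOnFail && t != null) {
                t.interrupt();
            }
        }
    }

    /** Returns true if the fragment thread exits shortly after fail() is called. */
    static boolean terminatesAfterFail(boolean interruptOnFail) {
        Fragment fragment = new Fragment(interruptOnFail);
        Thread thread = new Thread(fragment, "fragment-sketch");
        thread.setDaemon(true);
        thread.start();
        try {
            // Wait until the thread is parked inside take().
            while (thread.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
            fragment.fail();
            thread.join(500);
            return !thread.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Release the thread in case it is still blocked.
            thread.interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("interrupt on fail: " + terminatesAfterFail(true));
        System.out.println("no interrupt:      " + terminatesAfterFail(false));
    }
}
```

With interruptOnFail set, the blocked thread should wake up and exit; without 
it, the thread stays parked in take() even though its state is already 
FAILED, which mirrors the root fragment waiting in its receiver.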



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
