[ https://issues.apache.org/jira/browse/DRILL-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478840#comment-15478840 ]
ASF GitHub Bot commented on DRILL-3898:
---------------------------------------

GitHub user Ben-Zvi opened a pull request:

    https://github.com/apache/drill/pull/585

    DRILL-3898 : Sort spill was modified to catch all errors, ignore repeated errors while closing the new group and issue a more detailed error message.

    It seems that the spilling IO can run into various kinds of errors (no space, failure to create a file, etc.), which are thrown as different exception classes. Hence the catch() statement was changed to catch the more general Throwable and to add the exception's message for more detail (e.g., no disk space). Before the change the "no disk space" Throwable was not caught, and thus execution continued.
    Closing the newGroup could also hit IO errors (e.g., when flushing), so a try/catch was added to ignore those.

    Note that this change should also fix DRILL-4542 ("if external sort fails to spill to disk, memory is leaked and wrong error message is displayed").

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Ben-Zvi/drill DRILL-3898

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/585.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #585

----
commit e988f1644be1d9fde24a489d94c7dbc54f8e82d8
Author: Boaz Ben-Zvi <b...@mapr.com>
Date:   2016-09-09T23:36:03Z

    DRILL-3898 : Sort spill was modified to catch all errors, ignore repeated errors while closing the new group and issue a more detailed error message.
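The pattern described in the pull request (catch everything during the spill, ignore follow-on errors while closing the new group, then report the original cause) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the patch: `SpillSketch`, `SpillAction` and `spill` are hypothetical names, a plain `java.io.Closeable` stands in for the new batch group, and `IllegalStateException` stands in for whatever exception type Drill actually raises.

```java
import java.io.Closeable;

/** Hypothetical stand-in for the sort's spill step; not Drill's actual code. */
final class SpillSketch {

    /** Hypothetical unit of work that writes the merged batches to disk. */
    interface SpillAction {
        void run() throws Exception;
    }

    /**
     * Runs the spill. On any failure, closes the half-written group while
     * ignoring secondary errors, then rethrows with the original message.
     */
    static void spill(SpillAction action, Closeable newGroup) {
        try {
            action.run();
        } catch (Throwable t) {
            // Throwable rather than IOException: a failure surfaced as a
            // java.lang.Error subclass would slip past a narrower catch.
            try {
                newGroup.close();
            } catch (Throwable ignored) {
                // Closing may hit the same I/O problem (e.g. while flushing);
                // the first failure is the one worth reporting.
            }
            throw new IllegalStateException(
                "External sort failed to spill to disk: " + t.getMessage(), t);
        }
    }
}
```

A caller would pass the write step and the group to close, for example `SpillSketch.spill(() -> writeBatches(), group)`, where `writeBatches` and `group` are again placeholders. Wrapping the original Throwable as the cause keeps the full "No space left on device" trace available in the log.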
----

> No space error during external sort does not cancel the query
> -------------------------------------------------------------
>
>                 Key: DRILL-3898
>                 URL: https://issues.apache.org/jira/browse/DRILL-3898
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.2.0, 1.8.0
>            Reporter: Victoria Markman
>            Assignee: Boaz Ben-Zvi
>             Fix For: Future
>
>         Attachments: drillbit.log, sqlline_3898.ver_1_8.log
>
>
> While verifying DRILL-3732 I ran into a new problem.
> I think drill somehow loses track of out of disk exception and does not cancel rest of the query, which results in NPE:
> Reproduction is the same as in DRILL-3732:
> {code}
> 0: jdbc:drill:schema=dfs> create table store_sales_20(ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, s_sold_date_sk, ss_promo_sk) partition by (ss_promo_sk) as
> . . . . . . . . . . . . > select
> . . . . . . . . . . . . >      case when columns[2] = '' then cast(null as varchar(100)) else cast(columns[2] as varchar(100)) end,
> . . . . . . . . . . . . >      case when columns[3] = '' then cast(null as varchar(100)) else cast(columns[3] as varchar(100)) end,
> . . . . . . . . . . . . >      case when columns[4] = '' then cast(null as varchar(100)) else cast(columns[4] as varchar(100)) end,
> . . . . . . . . . . . . >      case when columns[5] = '' then cast(null as varchar(100)) else cast(columns[5] as varchar(100)) end,
> . . . . . . . . . . . . >      case when columns[0] = '' then cast(null as varchar(100)) else cast(columns[0] as varchar(100)) end,
> . . . . . . . . . . . . >      case when columns[8] = '' then cast(null as varchar(100)) else cast(columns[8] as varchar(100)) end
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >      `store_sales.dat` ss
> . . . . . . . . . . . . > ;
> Error: SYSTEM ERROR: NullPointerException
> Fragment 1:16
> [Error Id: 0ae9338d-d04f-4b4a-93aa-a80d13cedb29 on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> This exception in drillbit.log should have triggered query cancellation:
> {code}
> 2015-10-06 17:01:34,463 [WorkManager-2] ERROR o.apache.drill.exec.work.WorkManager - org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
> org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:226) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.7.0_71]
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.7.0_71]
>         at java.io.FilterOutputStream.close(FilterOutputStream.java:157) ~[na:1.7.0_71]
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:400) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.drill.exec.physical.impl.xsort.BatchGroup.close(BatchGroup.java:152) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:44) ~[drill-common-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:553) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:362) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:91) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_71]
>         at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_71]
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:252) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) ~[drill-common-1.2.0.jar:1.2.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method) ~[na:1.7.0_71]
>         at java.io.FileOutputStream.write(FileOutputStream.java:345) ~[na:1.7.0_71]
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:224) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         ... 45 common frames omitted
> {code}
> I'm attaching full drillbit.log

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)