[ 
https://issues.apache.org/jira/browse/DRILL-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497450#comment-15497450
 ] 

ASF GitHub Bot commented on DRILL-3898:
---------------------------------------

Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/585#discussion_r79255636
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ---
    @@ -592,11 +592,14 @@ public BatchGroup 
mergeAndSpill(LinkedList<BatchGroup> batchGroups) throws Schem
           }
           injector.injectChecked(context.getExecutionControls(), 
INTERRUPTION_WHILE_SPILLING, IOException.class);
           newGroup.closeOutputStream();
    -    } catch (Exception e) {
    +    } catch (Throwable e) {
           // we only need to cleanup newGroup if spill failed
    -      AutoCloseables.close(e, newGroup);
    +      try {
    +        AutoCloseables.close(e, newGroup);
    +      } catch (Throwable t) { /* close() may hit the same IO issue; just 
ignore */ }
    --- End diff --
    
    In the case of no disk space to spill, close() tries to cleanup by calling 
flushBuffer() which eventually throws the same exception as there's still no 
space:
    
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
          at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:246)
          at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
          at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
          - locked <0x24e5> (a java.io.BufferedOutputStream)
          at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
          at java.io.DataOutputStream.write(DataOutputStream.java:107)
          - locked <0x24e7> (a org.apache.hadoop.fs.FSDataOutputStream)
          at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:419)
          at 
org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:206)
          at 
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:163)
          - locked <0x24e8> (a 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer)
          at 
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:144)
          at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:407)
          at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
          at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
          at 
org.apache.drill.exec.physical.impl.xsort.BatchGroup.close(BatchGroup.java:169)
          at 
org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76)
          at 
org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:53)
          at 
org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:43)
          at 
org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:598)



> No space error during external sort does not cancel the query
> -------------------------------------------------------------
>
>                 Key: DRILL-3898
>                 URL: https://issues.apache.org/jira/browse/DRILL-3898
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.2.0, 1.8.0
>            Reporter: Victoria Markman
>            Assignee: Boaz Ben-Zvi
>             Fix For: Future
>
>         Attachments: drillbit.log, sqlline_3898.ver_1_8.log
>
>
> While verifying DRILL-3732 I ran into a new problem.
> I think drill somehow loses track of out of disk exception and does not 
> cancel rest of the query, which results in NPE:
> Reproduction is the same as in DRILL-3732:
> {code}
> 0: jdbc:drill:schema=dfs> create table store_sales_20(ss_item_sk, 
> ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, s_sold_date_sk, ss_promo_sk) 
> partition by (ss_promo_sk) as
> . . . . . . . . . . . . >  select 
> . . . . . . . . . . . . >      case when columns[2] = '' then cast(null as 
> varchar(100)) else cast(columns[2] as varchar(100)) end,
> . . . . . . . . . . . . >      case when columns[3] = '' then cast(null as 
> varchar(100)) else cast(columns[3] as varchar(100)) end,
> . . . . . . . . . . . . >      case when columns[4] = '' then cast(null as 
> varchar(100)) else cast(columns[4] as varchar(100)) end, 
> . . . . . . . . . . . . >      case when columns[5] = '' then cast(null as 
> varchar(100)) else cast(columns[5] as varchar(100)) end, 
> . . . . . . . . . . . . >      case when columns[0] = '' then cast(null as 
> varchar(100)) else cast(columns[0] as varchar(100)) end, 
> . . . . . . . . . . . . >      case when columns[8] = '' then cast(null as 
> varchar(100)) else cast(columns[8] as varchar(100)) end
> . . . . . . . . . . . . >  from 
> . . . . . . . . . . . . >           `store_sales.dat` ss     
> . . . . . . . . . . . . > ;
> Error: SYSTEM ERROR: NullPointerException
> Fragment 1:16
> [Error Id: 0ae9338d-d04f-4b4a-93aa-a80d13cedb29 on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> This exception in drillbit.log should have triggered query cancellation:
> {code}
> 2015-10-06 17:01:34,463 [WorkManager-2] ERROR 
> o.apache.drill.exec.work.WorkManager - 
> org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
> org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>         at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:226)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) 
> ~[na:1.7.0_71]
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) 
> ~[na:1.7.0_71]
>         at java.io.FilterOutputStream.close(FilterOutputStream.java:157) 
> ~[na:1.7.0_71]
>         at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
> ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:400)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
> ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at 
> org.apache.drill.exec.physical.impl.xsort.BatchGroup.close(BatchGroup.java:152)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:44) 
> ~[drill-common-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:553)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:362)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:94)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:91)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) 
> ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) 
> ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at java.security.AccessController.doPrivileged(Native Method) 
> ~[na:1.7.0_71]
>         at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_71]
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:252)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  ~[drill-common-1.2.0.jar:1.2.0]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_71]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method) ~[na:1.7.0_71]
>         at java.io.FileOutputStream.write(FileOutputStream.java:345) 
> ~[na:1.7.0_71]
>         at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:224)
>  ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         ... 45 common frames omitted
> {code}
> I'm attaching full drillbit.log



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to