[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162253#comment-16162253
 ] 

Paul Rogers commented on DRILL-5670:
------------------------------------

The above exception comes from the merging receiver which is in the major 
fragment above the sort. This *probably* means that the sort completed and 
started sending batches downstream to the next major fragment.

The error occurs because the merging receiver is attempting to allocate a 
single vector greater than 2 GB in size (the physical maximum for any vector.) 
Since this query has a varchar array of 3500 elements per row, it is probably 
this repeated var char vector that is causing the issue.

Sort itself tries to create output batches of 16 MB in size. (The log will 
indicate the target output batch size and resulting number of rows.) It is 
unclear why the merging receiver has decided to combine these to produce huge 
batches. (The call to `copyFromSafe` suggests that the operator is combining 
batches.)

This is a new bug for which a Jira should be filed since it is a bug in the 
exchanges, rather than the sort.

Note that, even if we could allocate a 2 GB vector, we would not want to as 
doing so will heavily fragment Drill's direct memory and will eventually cause 
an OOM due to memory fragmentation.

> Varchar vector throws an assertion error when allocating a new vector
> ---------------------------------------------------------------------
>
>                 Key: DRILL-5670
>                 URL: https://issues.apache.org/jira/browse/DRILL-5670
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.11.0
>            Reporter: Rahul Challapalli
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>         Attachments: 26498995-bbad-83bc-618f-914c37a84e1f.sys.drill, 
> 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, 
> 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, 
> 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, 
> drillbit.log, drillbit.log, drillbit.log, drillbit.out, drill-override.conf
>
>
> I am running this test on a private branch of [paul's 
> repository|https://github.com/paul-rogers/drill]. Below is the commit info
> {code}
> git.commit.id.abbrev=d86e16c
> [email protected]
> git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an 
> improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the 
> merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- 
> DRILL-5522\: OOM during the merge and spill process of the managed external 
> sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of 
> external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable 
> vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to 
> initialize the offset vector\n\nAll of the bugs have to do with handling 
> low-memory conditions, and with\ncorrectly estimating the sizes of vectors, 
> even when those vectors come\nfrom the spill file or from an exchange. Hence, 
> the changes for all of\nthe above issues are interrelated.\n
> git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an 
> improvements
> git.commit.user.name=Paul Rogers
> git.build.user.name=Rahul Challapalli
> git.commit.id.describe=0.9.0-1078-gd86e16c
> [email protected]
> git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.time=05.07.2017 @ 20\:34\:39 PDT
> git.build.time=12.07.2017 @ 14\:27\:03 PDT
> [email protected]\:paul-rogers/drill.git
> {code}
> Below query fails with an Assertion Error
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET 
> `exec.sort.disable_managed` = false;
> +-------+-------------------------------------+
> |  ok   |               summary               |
> +-------+-------------------------------------+
> | true  | exec.sort.disable_managed updated.  |
> +-------+-------------------------------------+
> 1 row selected (1.044 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.memory.max_query_memory_per_node` = 482344960;
> +-------+----------------------------------------------------+
> |  ok   |                      summary                       |
> +-------+----------------------------------------------------+
> | true  | planner.memory.max_query_memory_per_node updated.  |
> +-------+----------------------------------------------------+
> 1 row selected (0.372 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.width.max_per_node` = 1;
> +-------+--------------------------------------+
> |  ok   |               summary                |
> +-------+--------------------------------------+
> | true  | planner.width.max_per_node updated.  |
> +-------+--------------------------------------+
> 1 row selected (0.292 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.width.max_per_query` = 1;
> +-------+---------------------------------------+
> |  ok   |                summary                |
> +-------+---------------------------------------+
> | true  | planner.width.max_per_query updated.  |
> +-------+---------------------------------------+
> 1 row selected (0.25 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from 
> dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by 
> columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50],
>  
> columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[2222],columns[30],columns[2420],columns[1520],
>  columns[1410], 
> columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350],
>  
> columns[3333],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530],
>  columns[3210] ) d where d.col433 = 'sjka skjf';
> Error: RESOURCE ERROR: External Sort encountered an error while spilling to 
> disk
> Fragment 2:0
> [Error Id: 26b55576-1a5c-4756-96d6-fbec25eecf03 on qa-node190.qa.lab:31010]
>   (java.lang.AssertionError) null
>     org.apache.drill.exec.vector.VarCharVector.allocateNew():400
>     org.apache.drill.exec.vector.RepeatedVarCharVector.allocateNew():272
>     
> org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():37
>     org.apache.drill.exec.vector.AllocationHelper.allocate():44
>     org.apache.drill.exec.record.SmartAllocationHelper.allocateVector():117
>     org.apache.drill.exec.record.SmartAllocationHelper.allocateVector():97
>     org.apache.drill.exec.record.SmartAllocationHelper.allocateBatch():87
>     
> org.apache.drill.exec.physical.impl.xsort.managed.PriorityQueueCopierWrapper$BatchMerger.next():262
>     
> org.apache.drill.exec.physical.impl.xsort.managed.SpilledRuns.doMergeAndSpill():171
>     
> org.apache.drill.exec.physical.impl.xsort.managed.SpilledRuns.safeMergeAndSpill():138
>     
> org.apache.drill.exec.physical.impl.xsort.managed.SpilledRuns.mergeAndSpill():91
>     
> org.apache.drill.exec.physical.impl.xsort.managed.SortImpl.spillFromMemory():298
>     org.apache.drill.exec.physical.impl.xsort.managed.SortImpl.addBatch():191
>     
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch():402
>     
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():338
>     
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():290
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():225
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():225
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745 (state=,code=0)
> {code}
> The data set is too large to attach here. Logs and profiles are attached



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to