[
https://issues.apache.org/jira/browse/DRILL-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208632#comment-16208632
]
Paul Rogers commented on DRILL-5778:
------------------------------------
The problem is that Drill is a concurrent system: the memory level is not
deterministic the way it would be if only one query ran at a time. The fact that
we are near memory exhaustion at time t1 tells us nothing about what happens at
t2. Another query might have finished, so free memory at t2 might be higher. Or
another query might be aggressive, so free memory at t2 might be lower even if
our query used no additional memory.
Instead, we should use a tool to monitor Drill free memory (or, equivalently,
total memory used) to see the trends over the life of the query.
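As a rough illustration of watching the trend rather than a single reading, here is a minimal sketch of a sampler that polls a memory reading at fixed intervals and summarizes the series. The `read_used_bytes` callable is hypothetical: in practice it would wrap whatever source of Drill memory figures is available (a monitoring tool, JMX, or a system table), and the readings replayed below are made-up numbers for demonstration only, not measurements from this query.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    elapsed_s: float  # seconds since sampling started
    used_bytes: int   # memory in use at that instant


def sample_memory(read_used_bytes: Callable[[], int],
                  samples: int,
                  interval_s: float = 1.0) -> List[Sample]:
    """Call read_used_bytes() `samples` times, `interval_s` seconds apart."""
    start = time.monotonic()
    series = []
    for i in range(samples):
        series.append(Sample(time.monotonic() - start, read_used_bytes()))
        if i < samples - 1:
            time.sleep(interval_s)
    return series


def summarize(series: List[Sample]) -> Dict[str, int]:
    """Reduce a series to the values that describe its trend."""
    used = [s.used_bytes for s in series]
    return {"first": used[0], "last": used[-1],
            "min": min(used), "peak": max(used)}


# Hypothetical reader: replays made-up readings instead of querying Drill.
fake_readings = iter([1_200, 1_900, 2_100, 1_400])
series = sample_memory(lambda: next(fake_readings), samples=4, interval_s=0.0)
summary = summarize(series)
print(summary)
```

Comparing the peak against the first and last readings shows whether memory pressure was sustained or momentary, which a single reading at t1 cannot.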
The core problem here is that the up-front calculations suggest we have a
problem, but the query does not actually have one. Is the message too
conservative? Or did the query have so little data that the potential problem
never became an actual problem?
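One way to make that question concrete is to redo the arithmetic with the figures in the quoted drillbit.log. This sketch assumes the check behind "Insufficient memory to merge two batches" compares twice the incoming batch size with the available memory; that is an inference from the wording of the message, not something confirmed from the Drill source.

```python
# Figures copied from the quoted drillbit.log.
incoming_batch_bytes = 1_073_819_648  # "Incoming batch size" / "Total size"
available_bytes = 2_147_483_648       # "available memory"
data_bytes = 585_224                  # "Data size" of the same batch

# Assumption: merging needs room for two batches of the incoming size.
needed_bytes = 2 * incoming_batch_bytes
shortfall_bytes = needed_bytes - available_bytes

# Fraction of the allocated batch that holds actual data.
fill_ratio = data_bytes / incoming_batch_bytes

print(f"needed:    {needed_bytes} bytes")
print(f"shortfall: {shortfall_bytes} bytes")
print(f"fill:      {fill_ratio:.4%}")
```

Under that assumption the estimate misses the limit by 155648 bytes, while the batch holds well under 0.1% real data. Both explanations therefore remain plausible: the estimate is driven by allocated size, and the data that actually has to be merged is tiny.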
> Drill seems to run out of memory but completes execution
> --------------------------------------------------------
>
> Key: DRILL-5778
> URL: https://issues.apache.org/jira/browse/DRILL-5778
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.11.0
> Reporter: Robert Hou
> Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0.sys.drill,
> drillbit.log
>
>
> Query is:
> {noformat}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.disable_exchanges` = true;
> alter session set `planner.width.max_per_query` = 1;
> alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
> select count(*) from (select * from (select id, flatten(str_list) str from
> dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by
> d.str) d1 where d1.id=0;
> {noformat}
> Plan is:
> {noformat}
> 00-00 Screen
> 00-01 Project(EXPR$0=[$0])
> 00-02 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
> 00-03 UnionExchange
> 01-01 StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 01-02 Project($f0=[0])
> 01-03 SelectionVectorRemover
> 01-04 Filter(condition=[=($0, 0)])
> 01-05 SingleMergeExchange(sort0=[1 ASC])
> 02-01 SelectionVectorRemover
> 02-02 Sort(sort0=[$1], dir0=[ASC])
> 02-03 Project(id=[$0], str=[$1])
> 02-04 HashToRandomExchange(dist0=[[$1]])
> 03-01 UnorderedMuxExchange
> 04-01 Project(id=[$0], str=[$1],
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1, 1301011)])
> 04-02 Flatten(flattenField=[$1])
> 04-03 Project(id=[$0], str=[$1])
> 04-04 Scan(groupscan=[EasyGroupScan
> [selectionRoot=maprfs:/drill/testdata/resource-manager/flatten-large-small.json,
> numFiles=1, columns=[`id`, `str_list`],
> files=[maprfs:///drill/testdata/resource-manager/flatten-large-small.json]]])
> {noformat}
> From drillbit.log:
> {noformat}
> 2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG
> o.a.d.e.p.i.x.m.ExternalSortBatch - Actual batch schema & sizes {
> str(type: REQUIRED VARCHAR, count: 4096, std size: 54, actual size: 134,
> data size: 548360)
> id(type: OPTIONAL BIGINT, count: 4096, std size: 8, actual size: 9, data
> size: 36864)
> Records: 4096, Total size: 1073819648, Data size: 585224, Gross row width:
> 262163, Net row width: 143, Density: 1}
> 2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] ERROR
> o.a.d.e.p.i.x.m.ExternalSortBatch - Insufficient memory to merge two batches.
> Incoming batch size: 1073819648, available memory: 2147483648
> 2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] INFO
> o.a.d.e.c.ClassCompilerSelector - Java compiler policy: DEFAULT, Debug
> option: true
> 2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG
> o.a.d.e.compile.JaninoClassCompiler - Compiling (source size=3.3 KiB):
> ...
> 2017-09-08 05:07:21,536 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG
> o.a.d.exec.compile.ClassTransformer - Compiled and merged
> SingleBatchSorterGen2677: bytecode size = 3.6 KiB, time = 19 ms.
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG
> o.a.d.e.t.g.SingleBatchSorterGen2677 - Took 5608 us to sort 4096 records
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG
> o.a.d.e.p.i.x.m.ExternalSortBatch - Input Batch Estimates: record size = 143
> bytes; net = 1073819648 bytes, gross = 1610729472, records = 4096
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG
> o.a.d.e.p.i.x.m.ExternalSortBatch - Spill batch size: net = 1048476 bytes,
> gross = 1572714 bytes, records = 7332; spill file = 268435456 bytes
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG
> o.a.d.e.p.i.x.m.ExternalSortBatch - Output batch size: net = 9371505 bytes,
> gross = 14057257 bytes, records = 65535
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG
> o.a.d.e.p.i.x.m.ExternalSortBatch - Available memory: 2147483648, buffer
> memory = 2143289744, merge memory = 2128740638
> 2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG
> o.a.d.e.t.g.SingleBatchSorterGen2677 - Took 4303 us to sort 4096 records
> 2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG
> o.a.d.e.p.i.x.m.ExternalSortBatch - Input Batch Estimates: record size = 266
> bytes; net = 1073819648 bytes, gross = 1610729472, records = 4096
> 2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG
> o.a.d.e.p.i.x.m.ExternalSortBatch - Spill batch size: net = 1048572 bytes,
> gross = 1572858 bytes, records = 3942; spill file = 268435456 bytes
> 2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG
> o.a.d.e.p.i.x.m.ExternalSortBatch - Output batch size: net = 16777152 bytes,
> gross = 25165728 bytes, records = 63072
> 2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG
> o.a.d.e.p.i.x.m.ExternalSortBatch - Available memory: 2147483648, buffer
> memory = 2143289360, merge memory = 2113929344
> {noformat}