[
https://issues.apache.org/jira/browse/DRILL-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879685#comment-15879685
]
Paul Rogers commented on DRILL-5294:
------------------------------------
Structure of the input batches:
{code}
Actual batch schema & sizes {
col433(std col. size: 54, actual col. size: 9, total size: 13303, data size: 9207, row capacity: 1023, density: 70)
EXPR$1(std col. size: 54, actual col. size: 14, total size: 18418, data size: 14322, row capacity: 1023, density: 78)
EXPR$2(std col. size: 54, actual col. size: 14, total size: 18418, data size: 14322, row capacity: 1023, density: 78)
EXPR$3(std col. size: 54, actual col. size: 3, total size: 7165, data size: 3069, row capacity: 1023, density: 43)
EXPR$4(std col. size: 54, actual col. size: 6, total size: 10234, data size: 6138, row capacity: 1023, density: 60)
EXPR$5(std col. size: 54, actual col. size: 10, total size: 14326, data size: 10230, row capacity: 1023, density: 72)
EXPR$6(std col. size: 54, actual col. size: 20, total size: 24556, data size: 20460, row capacity: 1023, density: 84)
EXPR$7(std col. size: 54, actual col. size: 11, total size: 15349, data size: 11253, row capacity: 1023, density: 74)
EXPR$8(std col. size: 54, actual col. size: 13, total size: 17395, data size: 13299, row capacity: 1023, density: 77)
EXPR$9(std col. size: 54, actual col. size: 5, total size: 9211, data size: 5115, row capacity: 1023, density: 56)
EXPR$10(std col. size: 54, actual col. size: 10, total size: 14326, data size: 10230, row capacity: 1023, density: 72)
EXPR$11(std col. size: 54, actual col. size: 13, total size: 17395, data size: 13299, row capacity: 1023, density: 77)
EXPR$12(std col. size: 54, actual col. size: 14, total size: 18418, data size: 14322, row capacity: 1023, density: 78)
EXPR$13(std col. size: 54, actual col. size: 6, total size: 10234, data size: 6138, row capacity: 1023, density: 60)
EXPR$14(std col. size: 54, actual col. size: 9, total size: 13303, data size: 9207, row capacity: 1023, density: 70)
EXPR$15(std col. size: 54, actual col. size: 10, total size: 14326, data size: 10230, row capacity: 1023, density: 72)
EXPR$16(std col. size: 54, actual col. size: 9, total size: 13303, data size: 9207, row capacity: 1023, density: 70)
EXPR$17(std col. size: 54, actual col. size: 7, total size: 11257, data size: 7161, row capacity: 1023, density: 64)
EXPR$18(std col. size: 54, actual col. size: 20, total size: 24556, data size: 20460, row capacity: 1023, density: 84)
EXPR$19(std col. size: 54, actual col. size: 14, total size: 18418, data size: 14322, row capacity: 1023, density: 78)
EXPR$20(std col. size: 54, actual col. size: 7, total size: 11257, data size: 7161, row capacity: 1023, density: 64)
EXPR$21(std col. size: 54, actual col. size: 4, total size: 8188, data size: 4092, row capacity: 1023, density: 50)
EXPR$22(std col. size: 54, actual col. size: 9, total size: 13303, data size: 9207, row capacity: 1023, density: 70)
EXPR$23(std col. size: 54, actual col. size: 7, total size: 11257, data size: 7161, row capacity: 1023, density: 64)
EXPR$24(std col. size: 54, actual col. size: 11, total size: 15349, data size: 11253, row capacity: 1023, density: 74)
Records: 1023, Total size: 365313, Row width:359, Density:70}
{code}
All columns are of reasonable size, the vectors are small, and each batch is
only about 360 KB (365,313 bytes). However, when taking ownership of the
batches, memory goes up by 500 KB, as shown earlier. This discrepancy needs an
explanation.
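
As a rough cross-check of the figures in the log (a sketch in Python, not Drill code), the per-column numbers can be re-derived from the "actual col. size" values. In every column the data size equals the actual column size times 1023 rows, and the total size is the data size plus 4096 bytes. The name `PER_COLUMN_OVERHEAD` and the reading of 4096 as 1024 four-byte entries are assumptions made here for illustration; the log itself does not say what the overhead is.

```python
# Values copied from the batch dump above; names are illustrative only.
ROW_CAPACITY = 1023
REPORTED_BATCH_TOTAL = 365313

# "actual col. size" for col433, EXPR$1 ... EXPR$24, in log order
actual_col_sizes = [9, 14, 14, 3, 6, 10, 20, 11, 13, 5, 10, 13, 14,
                    6, 9, 10, 9, 7, 20, 14, 7, 4, 9, 7, 11]

# Observed in the log: total size - data size == 4096 for every column.
# Assumption: this is 1024 entries of 4 bytes each; the log does not confirm it.
PER_COLUMN_OVERHEAD = 4096

data_sizes = [width * ROW_CAPACITY for width in actual_col_sizes]
total_sizes = [data + PER_COLUMN_OVERHEAD for data in data_sizes]

sum_data = sum(data_sizes)      # bytes of column data across the batch
sum_total = sum(total_sizes)    # data plus the per-column overhead
remainder = REPORTED_BATCH_TOTAL - sum_total  # not attributed to any column

print("columns:", len(actual_col_sizes))
print("sum of data sizes:", sum_data)
print("sum of total sizes:", sum_total)
print("unattributed bytes:", remainder)
```

The per-column totals add up to 363,265 bytes, which leaves 2,048 bytes of the reported 365,313 not attributed to any column in the dump. Either way the batch accounts for roughly 360 KB, so this arithmetic does not explain the 500 KB increase seen when ownership is taken.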
> Managed External Sort throws an OOM during the merge and spill phase
> --------------------------------------------------------------------
>
> Key: DRILL-5294
> URL: https://issues.apache.org/jira/browse/DRILL-5294
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Reporter: Rahul Challapalli
> Assignee: Paul Rogers
> Fix For: 1.10.0
>
> Attachments: 2751ce6d-67e6-ae08-3b68-e33b29f9d2a3.sys.drill,
> drillbit.log
>
>
> commit # : 38f816a45924654efd085bf7f1da7d97a4a51e38
> The query below fails with the managed sort but succeeds with the old sort:
> {code}
> select * from (select columns[433] col433, columns[0],
> columns[1],columns[2],columns[3],columns[4],columns[5],columns[6],columns[7],columns[8],columns[9],columns[10],columns[11]
> from dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by
> columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50])
> d where d.col433 = 'sjka skjf';
> Error: RESOURCE ERROR: External Sort encountered an error while spilling to disk
> Fragment 1:11
> [Error Id: 0aa20284-cfcc-450f-89b3-645c280f33a4 on qa-node190.qa.lab:31010]
> (state=,code=0)
> {code}
> Env :
> {code}
> No of Drillbits : 1
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> {code}
> The logs and profile are attached. The data is too large to attach to a JIRA.