[
https://issues.apache.org/jira/browse/DRILL-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers resolved DRILL-5594.
--------------------------------
Resolution: Fixed
> Excessive buffer reallocations during merge phase of external sort
> ------------------------------------------------------------------
>
> Key: DRILL-5594
> URL: https://issues.apache.org/jira/browse/DRILL-5594
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.11.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
> Fix For: 1.12.0
>
>
> Consider the log file attached to DRILL-5513. The log shows an excessive
> number of buffer reallocations while assembling the merged output of the sort:
> {code}
> 2017-05-15 12:58:46,319 [26e5f7b8-71e8-afca-e72e-fad7be2b2416:frag:5:13]
> DEBUG o.a.drill.exec.vector.BigIntVector - Reallocating vector
> [$data$(BIGINT:REQUIRED)]. # of bytes: [32768] -> [65536]
> 2017-05-15 12:58:46,321 [26e5f7b8-71e8-afca-e72e-fad7be2b2416:frag:5:13]
> DEBUG o.a.drill.exec.vector.UInt4Vector - Reallocating vector
> [$offsets$(UINT4:REQUIRED)]. # of bytes: [16384] -> [32768]
> ...:5:13] DEBUG o.a.drill.exec.vector.UInt4Vector - Reallocating vector
> [$offsets$(UINT4:REQUIRED)]. # of bytes: [16384] -> [32768]
> ...:5:13] DEBUG o.a.drill.exec.vector.UInt1Vector - Reallocating vector
> [$bits$(UINT1:REQUIRED)]. # of bytes: [4096] -> [8192]
> ...:5:13] DEBUG o.a.drill.exec.vector.UInt1Vector - Reallocating vector
> [$bits$(UINT1:REQUIRED)]. # of bytes: [4096] -> [8192]
> ...5:13] DEBUG o.a.drill.exec.vector.UInt1Vector - Reallocating vector
> [$bits$(UINT1:REQUIRED)]. # of bytes: [4096] -> [8192]
> ...5:13] DEBUG o.a.drill.exec.vector.BigIntVector - Reallocating vector
> [c(BIGINT:OPTIONAL)]. # of bytes: [32768] -> [65536]
> ..5:13] DEBUG o.a.drill.exec.vector.UInt1Vector - Reallocating vector
> [$bits$(UINT1:REQUIRED)]. # of bytes: [4096] -> [8192]
> ...frag:5:13] DEBUG o.a.drill.exec.vector.Float8Vector - Reallocating vector
> [d(FLOAT8:OPTIONAL)]. #
> ...
> {code}
> Hundreds of these lines appear, which indicates that the initial buffer
> allocation is too small.
> The merge phase knows the number of rows it will put into the batch and has
> access to the "sizer" information to estimate Varchar widths, so it should
> predict and allocate the required buffer sizes up front to avoid repeated
> reallocation. Each reallocation requires:
> * Allocating a new buffer (which puts memory pressure on the sort's memory)
> * Copying data from the old buffer to the new one
> * Zero-filling the new half (oddly, zero-fill is not done on the first
> allocation)
> * Freeing the unneeded original buffer
> Since the sort is already slow, this extra work only makes the problem
> worse.
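As a rough illustration of the sizing idea described above, the following Java sketch estimates buffer sizes from a known row count and counts the doublings (and bytes copied) that an undersized initial allocation would incur. It is not Drill code: the class name, method names, and parameters are all hypothetical, and it assumes that buffers grow by doubling to power-of-two sizes (as the log excerpt suggests) and that an offsets buffer holds one more 4-byte entry than the row count.

```java
// Hypothetical sketch: none of these names are part of Apache Drill.
final class BufferSizingSketch {

    /** Rounds a size up to the next power of two (assumed allocation granularity). */
    static long roundUpToPowerOfTwo(long bytes) {
        if (bytes <= 1) {
            return 1;
        }
        return Long.highestOneBit(bytes - 1) << 1;
    }

    /** Estimated buffer size for a fixed-width column, e.g. 8 bytes per BIGINT value. */
    static long fixedWidthBytes(int rowCount, int valueWidth) {
        return roundUpToPowerOfTwo((long) rowCount * valueWidth);
    }

    /** Estimated offsets buffer size: assumes (rowCount + 1) four-byte entries. */
    static long offsetsBytes(int rowCount) {
        return roundUpToPowerOfTwo(((long) rowCount + 1) * 4);
    }

    /** Estimated data buffer size for a variable-width column, given an average width. */
    static long varWidthDataBytes(int rowCount, int estimatedAvgWidth) {
        return roundUpToPowerOfTwo((long) rowCount * estimatedAvgWidth);
    }

    /** Number of doublings needed to grow from initialBytes to at least requiredBytes. */
    static int doublingsNeeded(long initialBytes, long requiredBytes) {
        if (initialBytes <= 0) {
            throw new IllegalArgumentException("initialBytes must be positive");
        }
        int count = 0;
        for (long size = initialBytes; size < requiredBytes; size *= 2) {
            count++;
        }
        return count;
    }

    /** Total bytes copied by those doublings: each one copies the whole old buffer. */
    static long bytesCopied(long initialBytes, long requiredBytes) {
        if (initialBytes <= 0) {
            throw new IllegalArgumentException("initialBytes must be positive");
        }
        long copied = 0;
        for (long size = initialBytes; size < requiredBytes; size *= 2) {
            copied += size;
        }
        return copied;
    }

    public static void main(String[] args) {
        int rowCount = 8192;                        // illustrative batch size
        long required = fixedWidthBytes(rowCount, 8);
        long initial = 32768;                       // starting size seen in the log
        System.out.println("required bytes:  " + required);
        System.out.println("doublings saved: " + doublingsNeeded(initial, required));
        System.out.println("bytes not copied: " + bytesCopied(initial, required));
    }
}
```

Allocating the estimated size once replaces the whole chain of allocate, copy, zero-fill, and free steps for that vector. A Varchar width estimate can still be too low, so an approach like this would reduce reallocations rather than guarantee none.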