[ https://issues.apache.org/jira/browse/DRILL-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers resolved DRILL-5594.
--------------------------------
    Resolution: Fixed

> Excessive buffer reallocations during merge phase of external sort
> ------------------------------------------------------------------
>
>                 Key: DRILL-5594
>                 URL: https://issues.apache.org/jira/browse/DRILL-5594
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.12.0
>
>
> Consider the log file attached to DRILL-5513. The log shows an excessive 
> number of buffer reallocations while assembling the merged output of the sort:
> {code}
> 2017-05-15 12:58:46,319 [26e5f7b8-71e8-afca-e72e-fad7be2b2416:frag:5:13] 
> DEBUG o.a.drill.exec.vector.BigIntVector - Reallocating vector 
> [$data$(BIGINT:REQUIRED)]. # of bytes: [32768] -> [65536]
> 2017-05-15 12:58:46,321 [26e5f7b8-71e8-afca-e72e-fad7be2b2416:frag:5:13] 
> DEBUG o.a.drill.exec.vector.UInt4Vector - Reallocating vector 
> [$offsets$(UINT4:REQUIRED)]. # of bytes: [16384] -> [32768]
> ...:5:13] DEBUG o.a.drill.exec.vector.UInt4Vector - Reallocating vector 
> [$offsets$(UINT4:REQUIRED)]. # of bytes: [16384] -> [32768]
> ...:5:13] DEBUG o.a.drill.exec.vector.UInt1Vector - Reallocating vector 
> [$bits$(UINT1:REQUIRED)]. # of bytes: [4096] -> [8192]
> ...:5:13] DEBUG o.a.drill.exec.vector.UInt1Vector - Reallocating vector 
> [$bits$(UINT1:REQUIRED)]. # of bytes: [4096] -> [8192]
> ...5:13] DEBUG o.a.drill.exec.vector.UInt1Vector - Reallocating vector 
> [$bits$(UINT1:REQUIRED)]. # of bytes: [4096] -> [8192]
> ...5:13] DEBUG o.a.drill.exec.vector.BigIntVector - Reallocating vector 
> [c(BIGINT:OPTIONAL)]. # of bytes: [32768] -> [65536]
> ..5:13] DEBUG o.a.drill.exec.vector.UInt1Vector - Reallocating vector 
> [$bits$(UINT1:REQUIRED)]. # of bytes: [4096] -> [8192]
> ...frag:5:13] DEBUG o.a.drill.exec.vector.Float8Vector - Reallocating vector 
> [d(FLOAT8:OPTIONAL)]. # 
> ...
> {code}
> Hundreds of these lines appear. This means that the initial buffer allocation 
> is too small.
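
Working backward from the byte counts in the log: dividing each vector's old size by its per-value width gives the same capacity, 4096 entries, for the BIGINT `$data$` vector, the UINT4 `$offsets$` vector and the UINT1 `$bits$` vector. That suggests (an inference from the log, not something it states) that the output vectors start with room for about 4096 values and are doubled to 8192 as rows arrive. The widths used below (8 bytes for BIGINT, 4 for UINT4, 1 for UINT1) follow from the type names; the class and method names are illustrative only.

```java
// Illustrative arithmetic only: recovers per-vector capacity from the byte counts in the log.
public class ImpliedCapacity {

    // Number of fixed-width entries that fit in a buffer of the given size.
    public static int entries(int bufferBytes, int bytesPerEntry) {
        return bufferBytes / bytesPerEntry;
    }

    public static void main(String[] args) {
        // Old sizes taken from the "Reallocating vector" lines; each line prints 4096 entries.
        System.out.println("BIGINT $data$:   " + entries(32768, 8) + " entries");
        System.out.println("UINT4 $offsets$: " + entries(16384, 4) + " entries");
        System.out.println("UINT1 $bits$:    " + entries(4096, 1) + " entries");
    }
}
```
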
> Given that the merge phase knows the number of rows it will put into the 
> batch, and has access to the "sizer" information to estimate Varchar widths, 
> it should predict the required buffer sizes and allocate them up front to 
> avoid repeated reallocation. Each reallocation requires:
> * Allocate a new buffer (which puts pressure on the sort's memory)
> * Copy data from the old buffer to the new one
> * Zero-fill the new half (strangely, zero-fill is not done on the first 
> allocation)
> * Free the original buffer, which is no longer needed
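
The four steps above are the familiar grow-by-doubling pattern. The following is a simplified, hypothetical model, not Drill's vector code: a Java `byte[]` stands in for a direct-memory buffer, and `setLong` plays the role of a `setSafe`-style write that grows the buffer on demand. It counts reallocations and bytes copied, so a buffer that starts with room for 4096 BIGINT values (as in the log) can be compared with one sized up front. All class and method names are illustrative.

```java
import java.util.Arrays;

// Hypothetical model of a fixed-width vector that doubles its buffer when a write overflows.
public class DoublingBuffer {
    private byte[] buf;
    private int reallocations = 0;
    private long bytesCopied = 0;

    public DoublingBuffer(int initialBytes) {
        if (initialBytes <= 0) {
            throw new IllegalArgumentException("initialBytes must be positive");
        }
        buf = new byte[initialBytes];
    }

    // Write an 8-byte value at the given index, growing the buffer first if needed.
    public void setLong(int index, long value) {
        int offset = index * 8;
        while (offset + 8 > buf.length) {
            realloc();
        }
        for (int b = 0; b < 8; b++) {
            buf[offset + b] = (byte) (value >>> (8 * b)); // little-endian
        }
    }

    public long getLong(int index) {
        int offset = index * 8;
        long v = 0;
        for (int b = 0; b < 8; b++) {
            v |= (buf[offset + b] & 0xFFL) << (8 * b);
        }
        return v;
    }

    private void realloc() {
        int oldLen = buf.length;
        byte[] bigger = new byte[oldLen * 2];                 // 1. allocate a buffer twice the size
        System.arraycopy(buf, 0, bigger, 0, oldLen);          // 2. copy the old contents
        Arrays.fill(bigger, oldLen, bigger.length, (byte) 0); // 3. zero-fill the new half (redundant in Java)
        buf = bigger;                                         // 4. drop the old buffer (here the GC reclaims it)
        bytesCopied += oldLen;
        reallocations++;
    }

    public int reallocations() { return reallocations; }
    public long bytesCopied() { return bytesCopied; }

    // Write `values` consecutive slots into a buffer that starts at `initialBytes`.
    public static DoublingBuffer fill(int initialBytes, int values) {
        DoublingBuffer v = new DoublingBuffer(initialBytes);
        for (int i = 0; i < values; i++) {
            v.setLong(i, i);
        }
        return v;
    }

    public static void main(String[] args) {
        DoublingBuffer small = fill(32768, 65536);     // room for 4096 values at first
        DoublingBuffer sized = fill(65536 * 8, 65536); // sized for all values up front
        // prints "small: 4 reallocations, 491520 bytes copied"
        System.out.println("small: " + small.reallocations() + " reallocations, "
                + small.bytesCopied() + " bytes copied");
        // prints "sized: 0 reallocations, 0 bytes copied"
        System.out.println("sized: " + sized.reallocations() + " reallocations, "
                + sized.bytesCopied() + " bytes copied");
    }
}
```

In this model, filling a hypothetical 65,536-row BIGINT column from a 32 KiB start doubles four times and copies 491,520 bytes, close to the 512 KiB final buffer, while the pre-sized buffer does no extra work. Each doubling also briefly holds both the old and new buffers, which is the memory pressure the first step refers to.
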
> Since the sort is already slow, the above extra work just makes the problem 
> worse.
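
The proposal in the description, predicting buffer sizes from the known row count and the sizer's width estimates, could be sketched as below. This is a hypothetical illustration, not Drill's API: `ColumnEstimate` stands in for whatever the sizer reports per column, Varchar columns use an assumed average width, nullable columns get a one-byte-per-value `$bits$` vector and variable-width columns a 4-byte `$offsets$` vector with one more entry than rows (matching the vector types in the log), and each buffer is assumed to be rounded up to a power of two. The column `v` is invented for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical up-front buffer sizing for one output batch of the merge phase.
public class BufferPlanner {

    // Stand-in for per-column information the sizer might provide (illustrative only).
    public static final class ColumnEstimate {
        final String name;
        final int dataBytesPerValue; // fixed width, or estimated average width for Varchar
        final boolean nullable;
        final boolean variableWidth;

        public ColumnEstimate(String name, int dataBytesPerValue,
                              boolean nullable, boolean variableWidth) {
            this.name = name;
            this.dataBytesPerValue = dataBytesPerValue;
            this.nullable = nullable;
            this.variableWidth = variableWidth;
        }
    }

    // Smallest power of two >= n (assumes the allocator rounds requests this way).
    public static long roundUpPow2(long n) {
        long m = Math.max(1L, n);
        long hi = Long.highestOneBit(m);
        return hi == m ? m : hi << 1;
    }

    // Bytes needed so that rowCount values fit without any reallocation.
    public static long bytesForColumn(ColumnEstimate c, int rowCount) {
        long total = roundUpPow2((long) rowCount * c.dataBytesPerValue); // $data$
        if (c.nullable) {
            total += roundUpPow2(rowCount);                              // $bits$: 1 byte per value
        }
        if (c.variableWidth) {
            total += roundUpPow2(((long) rowCount + 1) * 4);             // $offsets$: rowCount + 1 entries
        }
        return total;
    }

    public static long bytesForBatch(List<ColumnEstimate> columns, int rowCount) {
        long total = 0;
        for (ColumnEstimate c : columns) {
            total += bytesForColumn(c, rowCount);
        }
        return total;
    }

    public static void main(String[] args) {
        List<ColumnEstimate> schema = Arrays.asList(
                new ColumnEstimate("c", 8, true, false),  // BIGINT:OPTIONAL
                new ColumnEstimate("d", 8, true, false),  // FLOAT8:OPTIONAL
                new ColumnEstimate("v", 20, true, true)); // hypothetical VARCHAR, 20-byte average
        // prints "483328 bytes for 8192 rows"
        System.out.println(bytesForBatch(schema, 8192) + " bytes for 8192 rows");
    }
}
```

One consequence of the `rowCount + 1` offsets entries: a power-of-two row count pushes the offsets buffer to the next power of two (32,772 bytes rounds to 65,536 here). Whether that matters depends on how the real allocator rounds requests, which this sketch only assumes.
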



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
