Paul Rogers created DRILL-5760:
----------------------------------

             Summary: Performance hit: SVR causes repeated vector reallocations
                 Key: DRILL-5760
                 URL: https://issues.apache.org/jira/browse/DRILL-5760
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Priority: Minor


Run the query in DRILL-5753 with DEBUG logging enabled. The log will show a set 
of vector reallocations from the JSON reader, as described in DRILL-5759.

Later, the sort in the query completes in memory and sends its data downstream 
to a selection vector remover (SVR). The SVR then triggers a very large number 
of additional vector reallocations:

{code}
VarCharVector - Reallocating VarChar, new size 65536
VarCharVector - Reallocating VarChar, new size 65536
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[131072] -> [262144]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[131072] -> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] 
-> [65536]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] 
-> [65536]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] 
-> [65536]
BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [262144] 
-> [524288]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] 
-> [65536]
Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [262144] 
-> [524288]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[65536] -> [131072]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[65536] -> [131072]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[65536] -> [131072]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[65536] -> [131072]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [16384] 
-> [32768]
VarCharVector - Reallocating VarChar, new size 65536
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [16384] 
-> [32768]
BigIntVector - Reallocating vector [col1(BIGINT:OPTIONAL)]. # of bytes: 
[131072] -> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [16384] 
-> [32768]
BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: 
[524288] -> [1048576]
VarCharVector - Reallocating VarChar, new size 65536
VarCharVector - Reallocating VarChar, new size 131072
VarCharVector - Reallocating VarChar, new size 131072
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[262144] -> [524288]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[262144] -> [524288]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [65536] 
-> [131072]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [65536] 
-> [131072]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [65536] 
-> [131072]
BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [524288] 
-> [1048576]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [65536] 
-> [131072]
Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [524288] 
-> [1048576]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[131072] -> [262144]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[131072] -> [262144]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[131072] -> [262144]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[131072] -> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] 
-> [65536]
VarCharVector - Reallocating VarChar, new size 131072
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] 
-> [65536]
BigIntVector - Reallocating vector [col1(BIGINT:OPTIONAL)]. # of bytes: 
[262144] -> [524288]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] 
-> [65536]
BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: 
[1048576] -> [2097152]
VarCharVector - Reallocating VarChar, new size 262144
VarCharVector - Reallocating VarChar, new size 131072
VarCharVector - Reallocating VarChar, new size 262144
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[524288] -> [1048576]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[524288] -> [1048576]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [131072] 
-> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [131072] 
-> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [131072] 
-> [262144]
BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [1048576] 
-> [2097152]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [131072] 
-> [262144]
Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [1048576] 
-> [2097152]
{code}

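The likely cause is that the input data has repeated elements and the SVR does 
not account for repetition when it allocates its output vectors. Each vector 
then grows by repeated doubling (in the log above, c(BIGINT:OPTIONAL) goes from 
262144 to 2097152 bytes in three steps), so the SVR goes through multiple 
allocate-copy-reallocate cycles that thrash memory and waste time.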
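
To make the cost of those cycles concrete, here is a minimal Java sketch of a 
buffer that doubles whenever it fills up. It is not Drill code: the class 
ReallocSketch and its methods fill and presize are hypothetical names made up 
for illustration, and the value count is an assumption chosen so that the final 
size matches the 2097152 bytes seen for c(BIGINT:OPTIONAL) in the log.

```java
// Hypothetical model of doubling growth; this is NOT Drill's vector code.
final class ReallocSketch {

    /** Outcome of filling one buffer: how often it grew and how much was copied. */
    static final class Cost {
        final int reallocations;
        final long bytesCopied;

        Cost(int reallocations, long bytesCopied) {
            this.reallocations = reallocations;
            this.bytesCopied = bytesCopied;
        }
    }

    /**
     * Simulates writing valueCount fixed-width values into a buffer that starts
     * at initialBytes and doubles each time it is full. Every doubling copies
     * the full old buffer into the new one.
     */
    static Cost fill(long initialBytes, int valueWidth, long valueCount) {
        if (initialBytes <= 0 || valueWidth <= 0 || valueCount < 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long needed = valueCount * valueWidth;
        long capacity = initialBytes;
        int reallocations = 0;
        long bytesCopied = 0;
        while (capacity < needed) {
            bytesCopied += capacity; // old contents move to the new buffer
            capacity *= 2;
            reallocations++;
        }
        return new Cost(reallocations, bytesCopied);
    }

    /** Smallest power-of-two byte count that holds every value up front. */
    static long presize(int valueWidth, long valueCount) {
        long needed = Math.max(1, valueCount * valueWidth);
        long highest = Long.highestOneBit(needed);
        return highest == needed ? needed : highest << 1;
    }

    public static void main(String[] args) {
        int width = 8;          // a BIGINT value is 8 bytes
        long count = 262_144;   // assumed value count, for illustration only

        Cost grown = fill(262_144, width, count);
        System.out.println("doubling:  " + grown.reallocations
                + " reallocations, " + grown.bytesCopied + " bytes copied");

        Cost sized = fill(presize(width, count), width, count);
        System.out.println("pre-sized: " + sized.reallocations
                + " reallocations, " + sized.bytesCopied + " bytes copied");
    }
}
```

Under these assumptions the doubling path needs three reallocations and copies 
1835008 bytes for a single vector, while the pre-sized path needs none. Whether 
the SVR can compute such a size in advance for repeated data is the open 
question in this ticket; the sketch only shows why it would be worth doing.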



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
