Paul Rogers created DRILL-5760:
----------------------------------
Summary: Performance hit: SVR causes repeated vector reallocations
Key: DRILL-5760
URL: https://issues.apache.org/jira/browse/DRILL-5760
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.10.0
Reporter: Paul Rogers
Priority: Minor
Run the query in DRILL-5753 with DEBUG logging enabled. You will see a set of
vector reallocations from the JSON reader, as described in DRILL-5759.
Later, the sort in the query completes as an in-memory sort and sends its data
downstream to a selection vector remover (SVR). The SVR then triggers a very
large number of additional vector reallocations:
{code}
VarCharVector - Reallocating VarChar, new size 65536
VarCharVector - Reallocating VarChar, new size 65536
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] -> [65536]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] -> [65536]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] -> [65536]
BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [262144] -> [524288]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] -> [65536]
Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [262144] -> [524288]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [65536] -> [131072]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [65536] -> [131072]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [65536] -> [131072]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [65536] -> [131072]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [16384] -> [32768]
VarCharVector - Reallocating VarChar, new size 65536
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [16384] -> [32768]
BigIntVector - Reallocating vector [col1(BIGINT:OPTIONAL)]. # of bytes: [131072] -> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [16384] -> [32768]
BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [524288] -> [1048576]
VarCharVector - Reallocating VarChar, new size 65536
VarCharVector - Reallocating VarChar, new size 131072
VarCharVector - Reallocating VarChar, new size 131072
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [262144] -> [524288]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [262144] -> [524288]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [65536] -> [131072]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [65536] -> [131072]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [65536] -> [131072]
BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [524288] -> [1048576]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [65536] -> [131072]
Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [524288] -> [1048576]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] -> [65536]
VarCharVector - Reallocating VarChar, new size 131072
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] -> [65536]
BigIntVector - Reallocating vector [col1(BIGINT:OPTIONAL)]. # of bytes: [262144] -> [524288]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [32768] -> [65536]
BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [1048576] -> [2097152]
VarCharVector - Reallocating VarChar, new size 262144
VarCharVector - Reallocating VarChar, new size 131072
VarCharVector - Reallocating VarChar, new size 262144
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [524288] -> [1048576]
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [524288] -> [1048576]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [131072] -> [262144]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [131072] -> [262144]
BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [1048576] -> [2097152]
UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [131072] -> [262144]
Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [1048576] -> [2097152]
{code}
The likely cause is that the input data has repeated elements and the SVR does
not account for repetition when allocating its output vectors. Each vector is
then too small for the data copied into it, so it goes through several
allocate-copy-reallocate cycles that thrash memory and waste time.
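
To illustrate the suspected effect, here is a minimal sketch of a buffer that doubles when it fills, as the log above suggests the vectors do. It is not Drill code: GrowableBuffer, copyRows, and the row and element counts are hypothetical names and numbers chosen for the example. It compares sizing the output by row count alone against sizing it by row count times elements per row.

{code:java}
import java.util.Arrays;

public class ReallocSketch {

    /** Hypothetical stand-in for a vector's data buffer; not Drill's API. */
    static final class GrowableBuffer {
        private long[] data;
        private int size;
        int reallocations;   // number of times the buffer was doubled
        long valuesCopied;   // values moved into a new buffer during doubling

        GrowableBuffer(int initialCapacity) {
            data = new long[Math.max(1, initialCapacity)];
        }

        void append(long value) {
            if (size == data.length) {
                // Buffer is full: allocate one twice as large and copy everything.
                valuesCopied += size;
                data = Arrays.copyOf(data, data.length * 2);
                reallocations++;
            }
            data[size++] = value;
        }
    }

    /** Copies rowCount rows of elementsPerRow values into a buffer of the given capacity. */
    static GrowableBuffer copyRows(int rowCount, int elementsPerRow, int initialCapacity) {
        GrowableBuffer out = new GrowableBuffer(initialCapacity);
        for (int row = 0; row < rowCount; row++) {
            for (int i = 0; i < elementsPerRow; i++) {
                out.append((long) row * elementsPerRow + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int rows = 4096;          // assumed batch size, for illustration only
        int elementsPerRow = 8;   // assumed repetition per row

        // Sized by row count only, ignoring repetition.
        GrowableBuffer byRows = copyRows(rows, elementsPerRow, rows);
        // Sized by the total number of repeated elements.
        GrowableBuffer byElements = copyRows(rows, elementsPerRow, rows * elementsPerRow);

        System.out.println("sized by rows:     " + byRows.reallocations
                + " reallocations, " + byRows.valuesCopied + " values copied");
        System.out.println("sized by elements: " + byElements.reallocations
                + " reallocations, " + byElements.valuesCopied + " values copied");
    }
}
{code}

With these assumed numbers, the buffer sized by row count doubles three times and copies 28672 values along the way, while the buffer sized by total element count never reallocates. Whether the SVR actually sizes its vectors this way is the open question in this ticket.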