[ https://issues.apache.org/jira/browse/DRILL-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Boaz Ben-Zvi updated DRILL-5728:
--------------------------------
Description:
When aggregating a non-nullable column (like *sum(l_partkey)* below), the code
generation creates an extra value vector (in addition to the actual "sum"
vector) which is used as a "nonNullCount".
This extra vector is useless (as the underlying column is non-nullable), and it wastes
considerable memory (8 bytes * 64K entries = 512 KB for each aggregated value in a batch).
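The arithmetic behind that figure can be sketched as follows. This is only an illustration: the class, constant, and method names are hypothetical (not Drill code), and it assumes an 8-byte bigint entry and a 64K-row batch, as in the figure above.

```java
// Hypothetical helper, not part of Drill: estimates the memory taken by
// the redundant "nonNullCount" vectors in one values batch.
final class NonNullCountOverhead {
    // Assumption: each bigint entry occupies 8 bytes.
    static final int BIGINT_WIDTH_BYTES = 8;
    // Assumption: a batch holds up to 64K entries.
    static final int MAX_BATCH_ROWS = 64 * 1024;

    // One redundant bigint vector is allocated per aggregated value.
    static long wastedBytesPerBatch(int aggregatedValues) {
        return (long) BIGINT_WIDTH_BYTES * MAX_BATCH_ROWS * aggregatedValues;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 4, 16}) {
            System.out.println(n + " aggregated value(s): "
                + wastedBytesPerBatch(n) / 1024 + " KB per batch");
        }
    }
}
```

With a single aggregated value this gives the 512 KB mentioned above; the overhead grows linearly with the number of aggregated values.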
Example query:
{{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by
l_orderkey;}}
And as can be seen in the generated code below, the bigint value vector *vv5*
is only used to hold a *1* flag to note "not null":
bq. public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
bq. throws SchemaChangeException
bq. {
bq. {
bq. IntHolder out11 = new IntHolder();
bq. {
bq. out11 .value = vv8 .getAccessor().get((incomingRowIdx));
bq. }
bq. IntHolder in = out11;
bq. work0 .value = vv1 .getAccessor().get((htRowIdx));
bq. BigIntHolder value = work0;
bq. work4 .value = vv5 .getAccessor().get((htRowIdx));
bq. BigIntHolder nonNullCount = work4;
bq.
bq. SumFunctions$IntSum_add: {
bq. nonNullCount.value = 1;
bq. value.value += in.value;
bq. }
bq.
bq. work0 = value;
bq. vv1 .getMutator().set((htRowIdx), work0 .value);
bq. work4 = nonNullCount;
bq. vv5 .getMutator().set((htRowIdx), work4 .value);
bq. }
bq. }
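For comparison, here is a rough sketch of the same update step with the flag vector dropped. It is not Drill's generated code or a proposed patch: plain Java arrays stand in for the value vectors, and the class and method names are hypothetical. It only illustrates that, for a non-nullable input, the sum vector alone carries all the state needed.

```java
import java.util.Arrays;

// Hypothetical stand-in for the generated aggregation code.
final class SumWithoutNonNullCount {
    // incoming : the non-nullable input column (role of vv8 above)
    // htRowIdx : for each incoming row, the hash-table row of its group
    // returns  : one running sum per group (role of vv1 above)
    static long[] sumNonNullable(int[] incoming, int[] htRowIdx, int groupCount) {
        long[] sums = new long[groupCount];
        for (int row = 0; row < incoming.length; row++) {
            // No "nonNullCount" bookkeeping: every input value is known
            // to be non-null, so only the sum needs to be updated.
            sums[htRowIdx[row]] += incoming[row];
        }
        return sums;
    }

    public static void main(String[] args) {
        int[] partKeys = {10, 20, 30, 40};
        int[] groups = {0, 1, 0, 1};
        System.out.println(Arrays.toString(sumNonNullable(partKeys, groups, 2)));
    }
}
```

A nullable input would still need some per-group marker to tell "no non-null values seen" apart from a sum of zero, which is presumably why the flag exists in the general case.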
was:
When aggregating a non-nullable column (like *sum(l_partkey)* below), the code
generation creates an extra value vector (in addition to the actual "sum"
vector) which is used as a "nonNullCount".
This is useless (as the underlying column is non-nullable), and wastes
considerable memory ( 8 * 64K = 512K per each value in a batch !!)
Example query:
{{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by
l_orderkey;}}
And as can be seen in the generated code below, the bigint value vector *vv5*
is only used to hold a *1* flag to note "not null":
{quote}public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
throws SchemaChangeException
{
{
IntHolder out11 = new IntHolder();
{
out11 .value = vv8 .getAccessor().get((incomingRowIdx));
}
IntHolder in = out11;
work0 .value = vv1 .getAccessor().get((htRowIdx));
BigIntHolder value = work0;
work4 .value = vv5 .getAccessor().get((htRowIdx));
BigIntHolder nonNullCount = work4;
SumFunctions$IntSum_add: {
nonNullCount.value = 1;
value.value += in.value;
}
work0 = value;
vv1 .getMutator().set((htRowIdx), work0 .value);
work4 = nonNullCount;
vv5 .getMutator().set((htRowIdx), work4 .value);
}
}{quote}
> Hash Aggregate: Useless bigint value vector in the values batch
> ---------------------------------------------------------------
>
> Key: DRILL-5728
> URL: https://issues.apache.org/jira/browse/DRILL-5728
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Codegen
> Affects Versions: 1.11.0
> Reporter: Boaz Ben-Zvi
> Priority: Minor
>