mingmwang commented on issue #4973:
URL:
https://github.com/apache/arrow-datafusion/issues/4973#issuecomment-1610843919
Some finding when reading DuckDB source code:
1) DuckDB uses generics to manage the Accumulator state, it uses function
pointers to do the dispatch, there is some C++ static cast.
2) There is no magic to the HashMap, the PartitionableHashTable is not used
in the single thread model. The two level hash tables' idea is the same as
ClickHouse, when the total groups in the single hash table exceeds a
threshold(10000), it turns to the partitioned version.
```c++
// two level hash tables
vector<unique_ptr<PartitionableHashTable>> intermediate_hts;
vector<shared_ptr<GroupedAggregateHashTable>> finalized_hts;
// radix_limit is hard coded to 10000
if (llstate.total_groups >= radix_limit) {
gstate.partitioned = true;
}
```
3) DuckDB uses `Row Layout`, the Row Layout keeps both the group keys and
agg stats together.
```c++
class RowLayout {
.......
private:
//! The types of the data columns
vector<LogicalType> types;
//! The aggregate functions
Aggregates aggregates;
//! The width of the validity header
idx_t flag_width;
//! The width of the data portion
idx_t data_width;
//! The width of the aggregate state portion
idx_t aggr_width;
//! The width of the entire row
idx_t row_width;
//! The offsets to the columns and aggregate data in each row
vector<idx_t> offsets;
//! Whether all columns in this layout are constant size
bool all_constant;
//! Offset to the pointer to the heap for each row
idx_t heap_pointer_offset;
};
}
```
4) Due to the RowLayout and the uses of generics, the memory size of the
group state in DuckDB is much smaller than DataFusion's, maybe more than `5
times `smaller than ours. I guess this is the major reason why they are faster
and have a more efficient memory access pattern.
DuckDB test result for tpch-q17
```c++
std::cout << "layout column count:" << layout.ColumnCount() <<
std::endl;
for (auto it = layout.GetTypes().cbegin(); it !=
layout.GetTypes().cend() ; ++it) std::cout << (*it).ToString() << " ";
std::cout << "layout agg count:" << layout.AggregateCount() <<
std::endl;
std::cout << "layout DataWidth:" << layout.GetDataWidth() << std::endl;
std::cout << "layout RowWidth:" << layout.GetRowWidth() << std::endl;
```
```
layout column count:2
INTEGER UBIGINT
layout agg count:1
layout DataWidth:15
layout RowWidth:40
```
DataFusion test result
```rust
let acc = AvgAccumulator::try_new(
&DataType::Decimal128(18, 0),
&DataType::Decimal128(24, 0),
)?;
println!("AvgAccumulator size: {:?}",
std::mem::size_of::<AvgAccumulator>());
struct NormalAVGStruct {
count: u64,
sum: i128,
}
println!("NormalAVGStruct size: {:?}",
std::mem::size_of::<NormalAVGStruct>());
println!("GroupState size: {:?}", std::mem::size_of::<GroupState>());
println!("Arrow OwnedRow size: {:?}", std::mem::size_of::<OwnedRow>());
println!("Accumulator item vec size: {:?}",
std::mem::size_of::<Vec<AccumulatorItem>>());
println!("usize vec size: {:?}", std::mem::size_of::<Vec<usize>>());
println!("boxed u8 size: {:?}", std::mem::size_of::<Box<[u8]>>());
```
```
AvgAccumulator size: 112
NormalAVGStruct size: 32
GroupState size: 112
Arrow OwnedRow size: 40
Accumulator item vec size: 24
usize vec size: 24
boxed u8 size: 16
```
Learn from me is that I think we should avoid using `Arc`, `Box`, `Vec`,
`dyn Trait` in the Group State, those are all pointers, pointers will consume
at least 8 bytes and pointers point to other memory address will lead to random
access.
We should also avoid using `ScalarValue` in Accumulator , `ScalarValue` is
the enum type and the memory size is padding to the most large member. That
means we have to make the Accumulator generic.
https://duckdb.org/2022/03/07/aggregate-hashtable.html
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]