mapleFU opened a new issue, #36548:
URL: https://github.com/apache/arrow/issues/36548
### Describe the enhancement requested
While reading the code for `Statistics` and `PageIndex`, I noticed that the
builder functions take a `::arrow::MemoryPool` argument, but most callers do
not pass one. For example:
Builder:
```C++
/// \brief Typed version of Statistics::Make
template <typename DType>
std::shared_ptr<TypedStatistics<DType>> MakeStatistics(
    const ColumnDescriptor* descr,
    ::arrow::MemoryPool* pool = ::arrow::default_memory_pool()) {
  return std::static_pointer_cast<TypedStatistics<DType>>(
      Statistics::Make(descr, pool));
}
```
Caller:
```c++
template <typename DType>
static std::shared_ptr<Statistics> MakeTypedColumnStats(
    const format::ColumnMetaData& metadata, const ColumnDescriptor* descr) {
  // If ColumnOrder is defined, return max_value and min_value
  if (descr->column_order().get_order() == ColumnOrder::TYPE_DEFINED_ORDER) {
    return MakeStatistics<DType>(
        descr, metadata.statistics.min_value, metadata.statistics.max_value,
        metadata.num_values - metadata.statistics.null_count,
        metadata.statistics.null_count, metadata.statistics.distinct_count,
        metadata.statistics.__isset.max_value ||
            metadata.statistics.__isset.min_value,
        metadata.statistics.__isset.null_count,
        metadata.statistics.__isset.distinct_count);
  }
  // Default behavior
  return MakeStatistics<DType>(
      descr, metadata.statistics.min, metadata.statistics.max,
      metadata.num_values - metadata.statistics.null_count,
      metadata.statistics.null_count, metadata.statistics.distinct_count,
      metadata.statistics.__isset.max || metadata.statistics.__isset.min,
      metadata.statistics.__isset.null_count,
      metadata.statistics.__isset.distinct_count);
}
```
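To illustrate what "not passing it" means in practice, here is a minimal, self-contained sketch of how a defaulted pool argument behaves. It does not use the real Arrow API: `Pool`, `DefaultPool`, `MakeStats`, and both caller functions are hypothetical stand-ins, chosen only to mirror the shape of the builder above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for ::arrow::MemoryPool; it only counts requests.
struct Pool {
  std::size_t requests = 0;
};

// Hypothetical analogue of ::arrow::default_memory_pool(): one process-wide pool.
inline Pool* DefaultPool() {
  static Pool pool;
  return &pool;
}

// Hypothetical builder with the same shape as MakeStatistics: the pool is
// optional and falls back to the default when the caller omits it.
inline Pool* MakeStats(int /*descr*/, Pool* pool = DefaultPool()) {
  ++pool->requests;
  return pool;  // report which pool was actually used
}

// A caller that owns a pool but does not forward it: the default pool is used.
inline Pool* CallerWithoutForwarding(Pool* /*caller_pool*/) {
  return MakeStats(0);
}

// A caller that forwards its pool explicitly.
inline Pool* CallerWithForwarding(Pool* caller_pool) {
  return MakeStats(0, caller_pool);
}
```

In this sketch, `CallerWithoutForwarding(&mine)` returns `DefaultPool()` and leaves `mine.requests` at zero, so any accounting attached to the caller's pool never sees the allocation.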
However, `::arrow::MemoryPool` does not seem useful here. It is only used to
build the `PlainEncoder` and encode the min/max values, which takes little
time and allocates only a small chunk of memory. So, do we need to pass a
`MemoryPool` to `Statistics`, or is using the default one OK?
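As a rough illustration of how little memory is involved, the sketch below plain-encodes a min/max pair through a counting pool. It assumes a fixed-width type and is not the Parquet `PlainEncoder`: `CountingPool`, `PlainEncodeValue`, and `EncodeMinMax` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memory pool; NOT the ::arrow::MemoryPool API.
// It only records how many bytes and allocations were requested through it.
class CountingPool {
 public:
  std::vector<uint8_t> Allocate(std::size_t nbytes) {
    bytes_allocated_ += nbytes;
    ++num_allocations_;
    return std::vector<uint8_t>(nbytes);
  }
  std::size_t bytes_allocated() const { return bytes_allocated_; }
  std::size_t num_allocations() const { return num_allocations_; }

 private:
  std::size_t bytes_allocated_ = 0;
  std::size_t num_allocations_ = 0;
};

// Hypothetical helper: copy one fixed-width value into pool-backed scratch
// space and return the raw bytes, similar in spirit to plain encoding.
template <typename T>
std::string PlainEncodeValue(const T& value, CountingPool* pool) {
  assert(pool != nullptr);
  std::vector<uint8_t> scratch = pool->Allocate(sizeof(T));
  std::memcpy(scratch.data(), &value, sizeof(T));
  return std::string(reinterpret_cast<const char*>(scratch.data()),
                     scratch.size());
}

// Encode a min/max pair; each value costs one small, short-lived allocation.
template <typename T>
std::pair<std::string, std::string> EncodeMinMax(const T& min, const T& max,
                                                 CountingPool* pool) {
  return {PlainEncodeValue(min, pool), PlainEncodeValue(max, pool)};
}
```

For an `int64_t` column, `EncodeMinMax<int64_t>(-5, 42, &pool)` requests 16 bytes in two allocations from the sketch's pool. Variable-length types such as byte arrays would need space proportional to the value length, so the real cost depends on the column type.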
### Component(s)
C++, Parquet