mapleFU opened a new issue, #36548:
URL: https://github.com/apache/arrow/issues/36548

   ### Describe the enhancement requested
   
   While reading the code in `Statistics` and `PageIndex`, I found that the builder takes a `::arrow::MemoryPool` argument, but most callers do not pass it. For example:
   
   Builder: 
   
   ```C++
   /// \brief Typed version of Statistics::Make
   template <typename DType>
   std::shared_ptr<TypedStatistics<DType>> MakeStatistics(
       const ColumnDescriptor* descr,
       ::arrow::MemoryPool* pool = ::arrow::default_memory_pool()) {
     return std::static_pointer_cast<TypedStatistics<DType>>(Statistics::Make(descr, pool));
   }
   ```
   
   Caller:
   
   ```c++
   template <typename DType>
   static std::shared_ptr<Statistics> MakeTypedColumnStats(
       const format::ColumnMetaData& metadata, const ColumnDescriptor* descr) {
     // If ColumnOrder is defined, return max_value and min_value
     if (descr->column_order().get_order() == ColumnOrder::TYPE_DEFINED_ORDER) {
       return MakeStatistics<DType>(
           descr, metadata.statistics.min_value, metadata.statistics.max_value,
           metadata.num_values - metadata.statistics.null_count,
           metadata.statistics.null_count, metadata.statistics.distinct_count,
           metadata.statistics.__isset.max_value || metadata.statistics.__isset.min_value,
           metadata.statistics.__isset.null_count,
           metadata.statistics.__isset.distinct_count);
     }
     // Default behavior
     return MakeStatistics<DType>(
         descr, metadata.statistics.min, metadata.statistics.max,
         metadata.num_values - metadata.statistics.null_count,
         metadata.statistics.null_count, metadata.statistics.distinct_count,
         metadata.statistics.__isset.max || metadata.statistics.__isset.min,
         metadata.statistics.__isset.null_count,
         metadata.statistics.__isset.distinct_count);
   }
   ```
   
   However, `::arrow::MemoryPool` does not seem very useful here: it is only used to build the `PlainEncoder` that encodes the min/max values, which is short-lived and allocates only a small chunk of memory. So do we need to pass a `MemoryPool` to `Statistics`, or is it fine to just use the default one?
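   
   To make the concern concrete, here is a minimal, self-contained sketch of what happens when a pool parameter has a default value and callers omit it. It does not use Arrow at all: `ToyPool`, `toy_default_pool()` and `EncodeMinMax()` are hypothetical stand-ins invented for illustration, and the real `Statistics` / `PlainEncoder` code may behave differently.
   
   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <cstring>
   #include <new>
   #include <string>
   
   // Hypothetical stand-in for a memory pool: it only counts requested bytes.
   class ToyPool {
    public:
     void* Allocate(std::size_t n) {
       bytes_allocated_ += n;
       return ::operator new(n);
     }
     void Free(void* p) { ::operator delete(p); }
     std::size_t bytes_allocated() const { return bytes_allocated_; }
   
    private:
     std::size_t bytes_allocated_ = 0;
   };
   
   // Hypothetical process-wide default pool (plays the role of a default pool).
   inline ToyPool* toy_default_pool() {
     static ToyPool pool;
     return &pool;
   }
   
   // Hypothetical helper: encodes min and max through a small, short-lived
   // scratch buffer taken from `pool`. Callers that omit `pool` silently use
   // the default pool.
   inline std::string EncodeMinMax(std::int32_t min, std::int32_t max,
                                   ToyPool* pool = toy_default_pool()) {
     assert(pool != nullptr);
     const std::size_t n = 2 * sizeof(std::int32_t);
     char* scratch = static_cast<char*>(pool->Allocate(n));
     std::memcpy(scratch, &min, sizeof(min));
     std::memcpy(scratch + sizeof(min), &max, sizeof(max));
     std::string out(scratch, n);  // copy out, then release the scratch buffer
     pool->Free(scratch);
     return out;
   }
   ```
   
   In this sketch, calling `EncodeMinMax(1, 9)` charges 8 bytes to the default pool and nothing to a caller-owned `ToyPool`, while `EncodeMinMax(1, 9, &custom)` charges them to `custom`. The allocation is tiny and short-lived either way, which is why always using the default pool might be acceptable.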
   
   ### Component(s)
   
   C++, Parquet


