Hi,

OK. I'll propose an arrow::ArrayStatistics API that can be used
as a starting point.
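
For illustration only, here is a rough sketch of the kind of shape such a starting point might take. Everything in it is an assumption: the sketch namespace, the fields of ArrayStatistics, the Int64Array stand-in for arrow::Array, and the CanSkipGreaterThan() helper are all hypothetical names, and none of them exists in Apache Arrow. It only shows the idea from the original proposal: statistics travel with the array that was read, so a consumer does not need the source reader.

```cpp
// Hypothetical sketch only: none of these names exist in Apache Arrow.
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <variant>
#include <vector>

namespace sketch {

// Statistics copied from a source such as Parquet column chunk metadata.
// Every field is optional because a source may not record it.
struct ArrayStatistics {
  using Value = std::variant<bool, int64_t, uint64_t, double, std::string>;

  std::optional<int64_t> null_count;
  std::optional<int64_t> distinct_count;
  std::optional<Value> min;
  std::optional<Value> max;
};

// Stand-in for arrow::Array: values plus optionally attached statistics.
struct Int64Array {
  std::vector<int64_t> values;
  // Null when the source provided no statistics.
  std::shared_ptr<const ArrayStatistics> statistics;
};

// Example consumer: decide from the attached statistics alone whether a
// "value > threshold" filter can skip this array entirely.
inline bool CanSkipGreaterThan(const Int64Array& array, int64_t threshold) {
  if (!array.statistics || !array.statistics->max) return false;
  const auto* max = std::get_if<int64_t>(&*array.statistics->max);
  return max != nullptr && *max <= threshold;
}

}  // namespace sketch
```

In this sketch the helper answers conservatively: when statistics are missing, or the recorded maximum has an unexpected type, it returns false and the caller scans the values as usual.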


Thanks,
-- 
kou

In <cak7z5t8qsk0qnnwez4tbpj9x1p2oqy-t6karh-jh4zbjhe9...@mail.gmail.com>
  "Re: [DISCUSS][C++] How about adding arrow::ArrayStatistics?" on Wed, 5 Jun 2024 22:55:25 -0700,
  Micah Kornfield <emkornfi...@gmail.com> wrote:

> Generally I think this is a good idea. It has been proposed before, but I
> don't think we were ever able to make progress on a design.
> 
> On Sun, Jun 2, 2024 at 7:17 PM Sutou Kouhei <k...@clear-code.com> wrote:
> 
>> Hi,
>>
>> Related GitHub issue:
>> https://github.com/apache/arrow/issues/41909
>>
>> How about adding arrow::ArrayStatistics?
>>
>> Motivation:
>>
>> Apache Arrow format data doesn't carry statistics. (We can
>> add statistics as metadata, but there isn't a standard way
>> to do so.)
>>
>> But the source of Apache Arrow format data, such as Apache
>> Parquet format data, may have statistics. We can get the
>> source statistics via the source reader, for example
>> parquet::ColumnChunkMetaData::statistics(), but we can't get
>> them from the Apache Arrow format data that was read. If we
>> want to use the source statistics, we need to keep the source
>> reader around.
>>
>> Proposal:
>>
>> How about adding arrow::ArrayStatistics or something similar
>> and attaching source statistics to the arrow::Array that was
>> read? If source statistics are attached to the arrow::Array,
>> we don't need to keep the source reader to get them.
>>
>> What do you think about this idea?
>>
>>
>> NOTE: I haven't thought about the arrow::ArrayStatistics
>> details yet. We'll be able to use parquet::Statistics and
>> its family as a reference.
>> https://github.com/apache/arrow/blob/main/cpp/src/parquet/statistics.h
>>
>>
>> Thanks,
>> --
>> kou
>>
