Hi,

OK. I'll propose an arrow::ArrayStatistics API that can be used as a starting point.

Thanks,
--
kou

In <cak7z5t8qsk0qnnwez4tbpj9x1p2oqy-t6karh-jh4zbjhe9...@mail.gmail.com>
  "Re: [DISCUSS][C++] How about adding arrow::ArrayStatistics?" on Wed, 5 Jun 2024 22:55:25 -0700,
  Micah Kornfield <emkornfi...@gmail.com> wrote:

> Generally I think this is a good idea that has been proposed before but I
> don't think we could ever make progress on design.
>
> On Sun, Jun 2, 2024 at 7:17 PM Sutou Kouhei <k...@clear-code.com> wrote:
>
>> Hi,
>>
>> Related GitHub issue:
>> https://github.com/apache/arrow/issues/41909
>>
>> How about adding arrow::ArrayStatistics?
>>
>> Motivation:
>>
>> An Apache Arrow format data doesn't have statistics. (We can
>> add statistics as metadata but there isn't any standard way
>> for it.)
>>
>> But a source of an Apache Arrow format data such as Apache
>> Parquet format data may have statistics. We can get the
>> source statistics via source reader such as
>> parquet::ColumnChunkMetaData::statistics() but can't get
>> them from read Apache Arrow format data. If we want to use
>> the source statistics, we need to keep the source reader.
>>
>> Proposal:
>>
>> How about adding arrow::ArrayStatistics or something and
>> attaching source statistics to read arrow::Array? If source
>> statistics are attached to read arrow::Array, we don't need
>> to keep a source reader to get source statistics.
>>
>> What do you think about this idea?
>>
>>
>> NOTE: I haven't thought about the arrow::ArrayStatistics
>> details yet. We'll be able to use parquet::Statistics and
>> its family as a reference.
>> https://github.com/apache/arrow/blob/main/cpp/src/parquet/statistics.h
>>
>>
>> Thanks,
>> --
>> kou
>>