Hello,

Let me share our work on supporting min/max statistics per record batch:
https://github.com/heterodb/pg-strom/wiki/806:-Apache-Arrow-Min-Max-Statistics-Hint
The latest pg2arrow supports a --stat option that specifies the columns for which
min/max statistics are collected for each record batch. The statistics are embedded
in the custom_metadata[] of the Field (in the Footer area). The example below dumps
a database table with statistics on the lo_orderdate column.

$ pg2arrow -d postgres -o /dev/shm/flineorder_sort.arrow -t lineorder_sort --stat=lo_orderdate --progress
RecordBatch[0]: offset=1640 length=268437080 (meta=920, body=268436160) nitems=1303085
RecordBatch[1]: offset=268438720 length=268437080 (meta=920, body=268436160) nitems=1303085
    :
RecordBatch[9]: offset=2415935360 length=55668888 (meta=920, body=55667968) nitems=270231

You can then find the custom metadata (min_values and max_values) on the
lo_orderdate field. These are comma-separated integer lists with 10 elements,
one per record batch, so the first items of min_values and max_values are the
min/max datum of record-batch[0].

$ python3
Python 3.6.8 (default, Aug 24 2020, 17:57:11)
[GCC 8.3.1 20191121 (Red Hat 8.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> X = pa.RecordBatchFileReader('/dev/shm/flineorder_sort.arrow')
>>> X.schema
lo_orderkey: decimal(30, 8)
lo_linenumber: int32
lo_custkey: decimal(30, 8)
lo_partkey: int32
lo_suppkey: decimal(30, 8)
lo_orderdate: int32
  -- field metadata --
  min_values: '19920101,19920919,19930608,19940223,19941111,19950730,1996' + 31
  max_values: '19920919,19930608,19940223,19941111,19950730,19960417,1997' + 31
lo_orderpriority: fixed_size_binary[15]
lo_shippriority: fixed_size_binary[1]
lo_quantity: decimal(30, 8)
lo_extendedprice: decimal(30, 8)
lo_ordertotalprice: decimal(30, 8)
    :

When we scan an arrow_fdw foreign table that maps an Apache Arrow file with the
above min/max statistics, it automatically checks the WHERE-clause and skips
record-batches that cannot contain any surviving rows.
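As a side note, the embedded statistics are easy to consume from plain Python
as well. Here is a minimal sketch (not part of pg2arrow or arrow_fdw) that
decodes the comma-separated min_values/max_values strings into per-record-batch
integer pairs; the sample byte strings below are hypothetical, shortened from
the real 10-element lists:

```python
def parse_stat_list(raw: bytes) -> list:
    """Decode a min_values/max_values metadata string into a list of ints."""
    return [int(v) for v in raw.decode("utf-8").split(",")]

# Hypothetical, shortened metadata values (a real file carries one entry
# per record batch, e.g. 10 entries for the dump above).
min_values = parse_stat_list(b"19920101,19920919,19930608")
max_values = parse_stat_list(b"19920919,19930608,19940223")

# Per-record-batch (min, max) pairs, index-aligned with RecordBatch[i].
ranges = list(zip(min_values, max_values))
print(ranges)  # [(19920101, 19920919), (19920919, 19930608), (19930608, 19940223)]
```

With pyarrow, the raw bytes would come from something like
X.schema.field('lo_orderdate').metadata[b'min_values'].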
postgres=# EXPLAIN ANALYZE
           SELECT count(*) FROM flineorder_sort
            WHERE lo_orderpriority = '2-HIGH'
              AND lo_orderdate BETWEEN 19940101 AND 19940630;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=33143.09..33143.10 rows=1 width=8) (actual time=115.591..115.593 rows=1 loops=1)
   ->  Custom Scan (GpuPreAgg) on flineorder_sort  (cost=33139.52..33142.07 rows=204 width=8) (actual time=115.580..115.585 rows=1 loops=1)
         Reduction: NoGroup
         Outer Scan: flineorder_sort  (cost=4000.00..33139.42 rows=300 width=0) (actual time=10.682..21.555 rows=2606170 loops=1)
         Outer Scan Filter: ((lo_orderdate >= 19940101) AND (lo_orderdate <= 19940630) AND (lo_orderpriority = '2-HIGH'::bpchar))
         Rows Removed by Outer Scan Filter: 2425885
         referenced: lo_orderdate, lo_orderpriority
         Stats-Hint: (lo_orderdate >= 19940101), (lo_orderdate <= 19940630)  [loaded: 2, skipped: 8]
         files0: /dev/shm/flineorder_sort.arrow (read: 217.52MB, size: 2357.11MB)
 Planning Time: 0.210 ms
 Execution Time: 153.508 ms
(11 rows)

This EXPLAIN ANALYZE output shows the Stats-Hint line. It says Arrow_Fdw could use
(lo_orderdate >= 19940101) and (lo_orderdate <= 19940630) to check the min/max
statistics; it actually skipped 8 record-batches and loaded only 2.

We expect this to benefit IoT/M2M-grade time-series log processing, because those
workloads always carry a timestamp value in each entry, and physically close rows
tend to have similar values.

This is not limited to Apache Arrow files generated by pg2arrow: the min/max
statistics can be appended by rewriting only the Footer portion, without relocating
the record-batches. So we plan to provide a standalone command that attaches
min/max statistics to existing Apache Arrow files generated by other tools.

Best regards,

On Thu, Feb 18, 2021 at 13:33 Kohei KaiGai <kai...@heterodb.com> wrote:
>
> Thanks for the clarification.
>
> > There is key-value metadata available on Message which might be able to
> > work in the short term (some sort of encoded message). I think
> > standardizing how we store statistics per batch does make sense.
>
> For example, a JSON array of min/max values as key-value metadata
> in Footer->Schema->Fields[]->custom_metadata?
> Even though the metadata field must be less than INT_MAX, I think it
> is portable enough and not an invasive way.
>
> > We unfortunately can't add anything to field-node without breaking
> > compatibility. But another option would be to add a new structure as a
> > parallel list on RecordBatch itself.
> >
> > If we do add a new structure or arbitrary key-value pair we should not use
> > KeyValue but should have something where the values can be bytes.
>
> What does the parallel list mean?
> If we had a standardized binary structure, like DictionaryBatch, to store
> statistics including min/max values, that would of course make more sense
> than text-encoded key-value metadata.
>
> Best regards,
>
> On Thu, Feb 18, 2021 at 12:37 Micah Kornfield <emkornfi...@gmail.com> wrote:
> >
> > There is key-value metadata available on Message which might be able to
> > work in the short term (some sort of encoded message). I think
> > standardizing how we store statistics per batch does make sense.
> >
> > We unfortunately can't add anything to field-node without breaking
> > compatibility. But another option would be to add a new structure as a
> > parallel list on RecordBatch itself.
> >
> > If we do add a new structure or arbitrary key-value pair we should not use
> > KeyValue but should have something where the values can be bytes.
> >
> > On Wed, Feb 17, 2021 at 7:17 PM Kohei KaiGai <kai...@heterodb.com> wrote:
> >
> > > Hello,
> > >
> > > Does Apache Arrow have any standard way to embed min/max values of the
> > > fields on a per-record-batch basis?
> > > It looks like FieldNode supports neither a dedicated min/max attribute
> > > nor custom-metadata.
> > > https://github.com/apache/arrow/blob/master/format/Message.fbs#L28
> > >
> > > If we embed an array of min/max values into the custom-metadata of the
> > > Field-node, we may be able to implement it.
> > > https://github.com/apache/arrow/blob/master/format/Schema.fbs#L344
> > >
> > > What I would like to implement is something like the BRIN index in
> > > PostgreSQL.
> > > http://heterodb.github.io/pg-strom/brin/
> > >
> > > This index contains only min/max values for particular block ranges,
> > > and the query executor can skip blocks that obviously don't contain
> > > the target data. If we can skip 9990 of 10000 record batches by
> > > checking the metadata on a query that fetches items in a very narrow
> > > timestamp range, that is a far greater acceleration than a full file
> > > scan.
> > >
> > > Best regards,
> > > --
> > > HeteroDB, Inc / The PG-Strom Project
> > > KaiGai Kohei <kai...@heterodb.com>
>
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei <kai...@heterodb.com>

--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kai...@heterodb.com>
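P.S. The gist of the Stats-Hint pruning in the EXPLAIN ANALYZE example earlier
can be sketched in a few lines. This is an illustrative, self-contained model
(the names batch_ranges/batch_may_match are made up, not Arrow_Fdw internals):
a record batch is skipped when its [min, max] interval cannot intersect the
WHERE-clause range.

```python
def batch_may_match(bmin, bmax, lo, hi):
    """True if some value in [bmin, bmax] can satisfy lo <= x <= hi."""
    return bmax >= lo and bmin <= hi

# Hypothetical per-batch (min, max) of lo_orderdate, as decoded from
# the min_values/max_values field metadata.
batch_ranges = [
    (19920101, 19920919), (19930608, 19940223),
    (19940223, 19941111), (19950730, 19960417),
]

lo, hi = 19940101, 19940630   # lo_orderdate BETWEEN 19940101 AND 19940630
loaded = [i for i, (mn, mx) in enumerate(batch_ranges)
          if batch_may_match(mn, mx, lo, hi)]
print(loaded)  # [1, 2] -- the other batches are skipped without being read
```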
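To make the key-value approach discussed in the thread concrete, here is a small
pyarrow sketch that stores per-batch min/max lists in a Field's custom metadata
and reads them back from the file Footer. The key names mirror pg2arrow's
min_values/max_values, while the data and the path are made up for illustration:

```python
import pyarrow as pa

# Hypothetical per-batch column values.
batch_data = [[19920101, 19920301], [19930101, 19930301]]

# Encode per-batch min/max as comma-separated strings.
mins = ",".join(str(min(vals)) for vals in batch_data)
maxs = ",".join(str(max(vals)) for vals in batch_data)

# Attach the statistics as custom metadata on the Field; the IPC file
# format stores this schema (metadata included) in the Footer area.
schema = pa.schema([
    pa.field("lo_orderdate", pa.int32(),
             metadata={"min_values": mins, "max_values": maxs}),
])

with pa.OSFile("/tmp/stats_demo.arrow", "wb") as sink:
    with pa.RecordBatchFileWriter(sink, schema) as writer:
        for vals in batch_data:
            writer.write_batch(
                pa.record_batch([pa.array(vals, pa.int32())], schema=schema))

# Round-trip: the metadata comes back as a bytes-to-bytes mapping.
meta = pa.RecordBatchFileReader("/tmp/stats_demo.arrow") \
         .schema.field("lo_orderdate").metadata
print(meta[b"min_values"], meta[b"max_values"])
```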