Ted-Jiang commented on code in PR #4131:
URL: https://github.com/apache/arrow-datafusion/pull/4131#discussion_r1016554574
##########
benchmarks/src/bin/parquet_filter_pushdown.rs:
##########
@@ -73,7 +74,19 @@ async fn main() -> Result<()> {
let path = opt.path.join("logs.parquet");
- let test_file = gen_data(path, opt.scale_factor, opt.page_size, opt.row_group_size)?;
+ let mut props_builder = WriterProperties::builder();
+
+ if let Some(s) = opt.page_size {
+ props_builder = props_builder
+ .set_data_pagesize_limit(s)
+ .set_write_batch_size(s);
Review Comment:
A test shows the following.
First, I set this in `single_file_small_data_pages`:
```
let props = WriterProperties::builder()
.set_data_page_row_count_limit(100)
.build();
```
Then I ran `parquet-tools column-index -c service /Users/yangjiang/data_8311.parquet`:
```
offset index for column service:
           offset   compressed size   first row index
page-0         29                48                 0
page-1         77                48              1024
page-2        125                48              2048
page-3        173                48              3072
page-4        221                48              4096
page-5        269                48              5120
page-6        317                48              6144
page-7        365                48              7168
page-8        413                48              8192
page-9        461                48              9216
page-10       509                48             10240
page-11       557                48             11264
page-12       605                48             12288
page-13       653                48             13312
page-14       701                48             14336
page-15       749                48             15360
page-16       797                48             16384
page-17       845                48             17408
page-18       893                48             18432
page-19       941                48
```
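In the output above, consecutive first row indices differ by 1024, so each page holds 1024 rows even though the row count limit was set to 100. The sketch below makes that arithmetic explicit. `modeled_page_rows` is a hypothetical helper, not part of the parquet crate: it encodes my assumption that the writer only checks the row limit after each write batch (default batch size 1024), which would explain why setting `set_write_batch_size` alongside the page limit matters.

```rust
/// Rows in each page, derived from consecutive "first row index" values.
fn rows_per_page(first_row_indices: &[u64]) -> Vec<u64> {
    first_row_indices.windows(2).map(|w| w[1] - w[0]).collect()
}

/// ASSUMPTION (simplified model, not the crate's code): the writer appends
/// `write_batch_size` rows at a time and checks the row limit only after
/// each batch, so a page ends at the first batch boundary at or past the limit.
/// Requires `write_batch_size > 0`.
fn modeled_page_rows(row_count_limit: u64, write_batch_size: u64) -> u64 {
    let batches = (row_count_limit + write_batch_size - 1) / write_batch_size;
    batches.max(1) * write_batch_size
}

fn main() {
    // First row indices of page-0 through page-18, copied from the output above.
    let first_rows: [u64; 19] = [
        0, 1024, 2048, 3072, 4096, 5120, 6144, 7168, 8192, 9216, 10240, 11264,
        12288, 13312, 14336, 15360, 16384, 17408, 18432,
    ];

    let rows = rows_per_page(&first_rows);
    println!("rows per page: {:?}", rows);

    // Under the assumed model, a limit of 100 with a batch size of 1024
    // yields 1024-row pages; a batch size of 100 yields 100-row pages.
    println!("limit 100, batch 1024: {}", modeled_page_rows(100, 1024));
    println!("limit 100, batch 100:  {}", modeled_page_rows(100, 100));
}
```

If that model holds, the row count limit alone is not enough to get small pages; the write batch size has to be lowered to match.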