lidavidm commented on pull request #11189:
URL: https://github.com/apache/arrow/pull/11189#issuecomment-923201576
Though, for what it's worth, running arrowbench locally with `as.data.frame(run_benchmark(dataset_taxi_parquet, n_iter=5, cpu_count=c(4)))` gives rather similar results before and after:

Before:

```
Total run time: 9.956319 secs
   iteration      process        real start_mem_bytes end_mem_bytes
1          1  2.979338342 0.785497665      2493300736    3698921472
2          2  2.886093707 0.766516447      3699429376    3714109440
3          3  2.920944454 0.776514053      3714109440    3714109440
4          4  2.877046089 0.766922712      3714109440    3714109440
5          5  2.896827784 0.770490646      3714109440    3714109440
6          1 18.316171634 4.632888794      2493300736    3777204224
7          2 17.657202597 4.614435434      3731472384    3815968768
8          3 17.561911295 4.556154490      3744194560    3779891200
9          4 17.775040099 4.648452759      3744194560    3815968768
10         5 17.586604237 4.563611269      3744194560    3779891200
11         1  0.964031791 0.582021236      2493296640    3450097664
12         2  0.771914299 0.414866209      3342176256    3463372800
13         3  0.770634774 0.410928965      3347357696    3527225344
14         4  0.798272907 0.439264059      3352600576    3523477504
15         5  0.795549228 0.436603546      3357843456    3523641344
16         1  0.488568257 0.068626642      2493296640    3346874368
17         2  0.001445519 0.001448631      3346874368    3346874368
18         3  0.001467639 0.001470804      3346874368    3346874368
19         4  0.001425503 0.001428366      3346874368    3346874368
20         5  0.001460951 0.001464367      3346874368    3346874368
   max_mem_bytes gc_level0 gc_level1 gc_level2          query cpu_count
1     3720196096         1         0         0       vignette         4
2     3720196096         0         0         0       vignette         4
3     3720196096         0         0         0       vignette         4
4     3720196096         0         0         0       vignette         4
5     3720196096         0         0         0       vignette         4
6     3796340736         2         0         1 payment_type_3         4
7     3815968768         1         0         1 payment_type_3         4
8     3815968768         1         0         0 payment_type_3         4
9     3815968768         0         0         1 payment_type_3         4
10    3815968768         1         0         0 payment_type_3         4
11    3481989120         2         1         3 small_no_files         4
12    3513794560         2         0         1 small_no_files         4
13    3527225344         1         0         1 small_no_files         4
14    3527225344         2         0         1 small_no_files         4
15    3527225344         0         1         1 small_no_files         4
16    3346874368         0         0         0     count_rows         4
17    3346874368         0         0         0     count_rows         4
18    3346874368         0         0         0     count_rows         4
19    3346874368         0         0         0     count_rows         4
20    3346874368         0         0         0     count_rows         4
```

After:

```
Total run time: 9.995613 secs
   iteration      process        real start_mem_bytes end_mem_bytes
1          1  3.097148086 0.810223818      2493313024    3759751168
2          2  2.987976486 0.778924227      3760332800    3779207168
3          3  2.969162397 0.776125669      3779207168    3779207168
4          4  2.969802084 0.777535439      3779207168    3779207168
5          5  2.967313487 0.775563955      3779207168    3779207168
6          1 18.042212503 4.552006245      2493313024    3805003776
7          2 17.391559404 4.544502735      3759271936    3832758272
8          3 17.281636477 4.486674547      3760984064    3796680704
9          4 17.363466488 4.545516968      3760984064    3832758272
10         5 17.310964707 4.487994194      3760984064    3796680704
11         1  1.036147931 0.602855206      2493313024    3547631616
12         2  0.797990419 0.419062376      3439710208    3601801216
13         3  0.793205374 0.414812803      3485786112    3683479552
14         4  0.801588050 0.428945065      3508854784    3674488832
15         5  0.834174869 0.443217278      3508854784    3674656768
16         1  0.500276311 0.069972277      2493321216    3346898944
17         2  0.001478722 0.001481056      3346898944    3346898944
18         3  0.001478401 0.001481295      3346898944    3346898944
19         4  0.001472831 0.001475811      3346898944    3346898944
20         5  0.001461316 0.001464128      3346898944    3346898944
   max_mem_bytes gc_level0 gc_level1 gc_level2          query cpu_count
1     3759751168         1         0         0       vignette         4
2     3779207168         0         0         0       vignette         4
3     3779207168         0         0         0       vignette         4
4     3779207168         0         0         0       vignette         4
5     3779207168         0         0         0       vignette         4
6     3824140288         2         0         1 payment_type_3         4
7     3832758272         1         0         1 payment_type_3         4
8     3832758272         1         0         0 payment_type_3         4
9     3832758272         0         0         1 payment_type_3         4
10    3832758272         1         0         0 payment_type_3         4
11    3579523072         2         1         3 small_no_files         4
12    3652222976         2         0         1 small_no_files         4
13    3683479552         1         0         1 small_no_files         4
14    3683479552         2         0         1 small_no_files         4
15    3683479552         0         1         1 small_no_files         4
16    3414007808         0         0         0     count_rows         4
17    3414007808         0         0         0     count_rows         4
18    3414007808         0         0         0     count_rows         4
19    3414007808         0         0         0     count_rows         4
20    3414007808         0         0         0     count_rows         4
```
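The following is a minimal sketch in R that quantifies "rather similar" by comparing the median `real` time per query across the two runs. It matches the language of the arrowbench call. It assumes `real` is elapsed (wall-clock) seconds. The values are transcribed by hand from the output above, and `summary_df` is an illustrative name, not part of arrowbench. The median is a choice made here, not something arrowbench reports, so that the slower first iteration of each query does not skew the comparison.

```r
# Per-query `real` seconds (5 iterations, cpu_count = 4), transcribed by hand
# from the arrowbench output above.
before <- list(
  vignette       = c(0.785497665, 0.766516447, 0.776514053, 0.766922712, 0.770490646),
  payment_type_3 = c(4.632888794, 4.614435434, 4.556154490, 4.648452759, 4.563611269),
  small_no_files = c(0.582021236, 0.414866209, 0.410928965, 0.439264059, 0.436603546),
  count_rows     = c(0.068626642, 0.001448631, 0.001470804, 0.001428366, 0.001464367)
)
after <- list(
  vignette       = c(0.810223818, 0.778924227, 0.776125669, 0.777535439, 0.775563955),
  payment_type_3 = c(4.552006245, 4.544502735, 4.486674547, 4.545516968, 4.487994194),
  small_no_files = c(0.602855206, 0.419062376, 0.414812803, 0.428945065, 0.443217278),
  count_rows     = c(0.069972277, 0.001481056, 0.001481295, 0.001475811, 0.001464128)
)

# Median per query, so a single slow first iteration does not dominate.
summary_df <- data.frame(
  query  = names(before),
  before = unname(vapply(before, median, numeric(1))),
  after  = unname(vapply(after, median, numeric(1))),
  stringsAsFactors = FALSE
)
# A ratio below 1 means the "after" run was faster for that query.
summary_df$ratio <- summary_df$after / summary_df$before
print(summary_df)
```

With these numbers, every per-query ratio falls within about 2% of 1. `payment_type_3` and `small_no_files` come out slightly faster, and `vignette` and `count_rows` slightly slower.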