[GitHub] [arrow-datafusion] Dandandan edited a comment on pull request #68: Experimenting with arrow2

GitBox Sat, 18 Sep 2021 07:40:34 -0700


Dandandan edited a comment on pull request #68:
URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-922307204



   My current intuition is that the parquet files generated by the TPC-H 
convert + `parquet` are somehow less optimal for `parquet2` and `arrow2` than 
they are for `parquet`.
   
   I tried to check this by also generating parquet files via this branch 
(which uses `parquet2`), but I am getting some errors when reading statistics 
(even when enabling statistics in the writer). The generated files are somewhat 
smaller (~10%), so there is at least some difference.
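
   A rough way to quantify that size difference is sketched below. It only 
compares on-disk bytes and uses the Python standard library; the function names 
and the directory layout are assumptions for illustration, not something from 
this branch.

```python
import os


def total_size(directory, suffix=".parquet"):
    """Sum the sizes in bytes of all files under `directory` ending in `suffix`."""
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(suffix):
                total += os.path.getsize(os.path.join(root, name))
    return total


def relative_difference(baseline_bytes, candidate_bytes):
    """Fraction by which the candidate is smaller than the baseline.

    Positive means the candidate is smaller, negative means it is larger.
    """
    if baseline_bytes == 0:
        raise ValueError("baseline contains no bytes to compare against")
    return (baseline_bytes - candidate_bytes) / baseline_bytes
```

   Calling `total_size` on the directory written with `parquet` and on the one 
written with `parquet2` (both paths hypothetical), then passing the two totals 
to `relative_difference`, would give the fraction by which the second set is 
smaller.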
   
   I added the parquet (lineitem) files over here:
   
   
https://drive.google.com/drive/folders/19kUhg_6o8PuEOwCCLGQq0zh1bAq2TzG0?usp=sharing


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
