Dandandan edited a comment on pull request #68:
URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-922307204


   My current intuition is that the parquet files generated by the TPC-H 
convert + `parquet` are somehow less optimal for `parquet2` and `arrow2` than 
they are for `parquet`.
   
   I tried to check this by also generating parquet files via this branch 
(which uses `parquet2`), but I am getting some errors when reading statistics 
(even when enabling statistics in the writer). The generated files are 
somewhat smaller (~10%), so there is at least some difference.
   
   I uploaded the parquet (lineitem) files here:
   
   
https://drive.google.com/drive/folders/19kUhg_6o8PuEOwCCLGQq0zh1bAq2TzG0?usp=sharing



