Dandandan edited a comment on pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-922307204
My current intuition is that the parquet files generated by the TPC-H convert + `parquet` are somehow less optimal for `parquet2` and `arrow2` than they are for `parquet`. I tried to check this by also generating parquet files via this branch (which uses `parquet2`), but I am getting some errors when reading statistics (even when enabling statistics in the writer). The generated files are somewhat smaller (~10%), so there is at least some difference. I added the parquet (lineitem) files over here: https://drive.google.com/drive/folders/19kUhg_6o8PuEOwCCLGQq0zh1bAq2TzG0?usp=sharing
