[GitHub] [arrow-datafusion] Dandandan commented on pull request #68: Experimenting with arrow2

GitBox Mon, 13 Sep 2021 13:32:59 -0700


Dandandan commented on pull request #68:
URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-918553501



   FYI I was trying to run the TPC-H benchmarks in datafusion against a current.
   
   I had to add support for parquet (de)compression via datafusion:
   ```
   -arrow = { package = "arrow2", git = "https://github.com/jorgecarleitao/arrow2", rev = "43d8cf5c54805aa437a1c7ee48f80e90f07bc553", features = ["io_csv", "io_json", "io_parquet", "io_ipc", "io_print", "ahash", "merge_sort", "compute", "regex"] }
   +arrow = { package = "arrow2", git = "https://github.com/jorgecarleitao/arrow2", rev = "43d8cf5c54805aa437a1c7ee48f80e90f07bc553", features = ["io_csv", "io_json", "io_parquet", "io_ipc", "io_print", "ahash", "merge_sort", "compute", "regex", "io_parquet_compression"] }
   ```
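   
   For context, a minimal sketch of how a Cargo feature like this usually gates codec support at compile time. The `Codec` enum and `check_codec` function are hypothetical names for illustration only and are not arrow2's actual API; only the feature name comes from the diff above.
   
   ```rust
   /// Hypothetical codec enum for illustration; not arrow2's real type.
   #[derive(Debug, Clone, Copy, PartialEq)]
   pub enum Codec {
       Uncompressed,
       Snappy,
   }
   
   /// Reports whether this build can decode pages written with `codec`.
   /// Compressed codecs are accepted only when the crate was compiled
   /// with the `io_parquet_compression` feature enabled.
   pub fn check_codec(codec: Codec) -> Result<(), String> {
       match codec {
           // Uncompressed pages never need an optional dependency.
           Codec::Uncompressed => Ok(()),
           Codec::Snappy => {
               // `cfg!` evaluates to a compile-time boolean.
               if cfg!(feature = "io_parquet_compression") {
                   Ok(())
               } else {
                   Err(format!(
                       "{:?} requires the `io_parquet_compression` feature",
                       codec
                   ))
               }
           }
       }
   }
   ```
   
   Without the feature, `check_codec(Codec::Snappy)` returns an error in this sketch, which mirrors why reading compressed Parquet files needs the extra feature flag.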
   
   And some small changes in the TPCH tool to make it compile.
   
   The Parquet files generated by the benchmark were not readable yet though - this should be implemented here:
   https://github.com/jorgecarleitao/arrow2/pull/402
   
   It would be fun to have a full running benchmark against the same files.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

