alamb commented on a change in pull request #8705:
URL: https://github.com/apache/arrow/pull/8705#discussion_r526441242
##########
File path: rust/benchmarks/README.md
##########
@@ -49,45 +49,16 @@ data. This value can be increased to generate larger data
sets.
The benchmark can then be run (assuming the data created from `dbgen` is in
`/mnt/tpch-dbgen`) with a command such as:
```bash
-cargo run --release --bin tpch -- --iterations 3 --path /mnt/tpch-dbgen --format tbl --query 1 --batch-size 4096
+cargo run --release --bin tpch -- benchmark --iterations 3 --path /mnt/tpch-dbgen --format tbl --query 1 --batch-size 4096
```
-The benchmark program also supports CSV and Parquet input file formats.
-
-This crate does not currently provide a method for converting the generated tbl format to CSV or Parquet so it is necessary to use other tools to perform this conversion.
-
-One option is to use the following Docker image to perform the conversion from `tbl` files to CSV or Parquet.
-
-```bash
-docker run -it ballistacompute/spark-benchmarks:0.4.0-SNAPSHOT
- -h, --help Show help message
-
-Subcommand: convert-tpch
- -i, --input <arg>
- --input-format <arg>
- -o, --output <arg>
- --output-format <arg>
- -p, --partitions <arg>
Review comment:
FWIW the Rust version doesn't seem to have any option to create
partitions, which is fine for a first version. However, it might be
worthwhile to leave these instructions in until we have added the `-p`
option to the Rust tooling.
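For reference, the removed help text suggests an invocation along these lines. This is a sketch, not a tested command: the input/output paths, the volume mount, and the partition count are placeholders, and the exact subcommand syntax should be verified against the image's own `--help` output.

```bash
# Hypothetical invocation assembled from the convert-tpch options shown
# in the removed help text; verify flags against the image's --help
# before relying on this. The bind mount and paths are placeholders.
docker run -it -v /mnt/tpch-dbgen:/data \
  ballistacompute/spark-benchmarks:0.4.0-SNAPSHOT \
  convert-tpch \
  --input /data --input-format tbl \
  --output /data/parquet --output-format parquet \
  --partitions 64
```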
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]