[ 
https://issues.apache.org/jira/browse/ARROW-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411555#comment-17411555
 ] 

Wes McKinney edited comment on ARROW-3998 at 9/7/21, 9:52 PM:
--------------------------------------------------------------

DuckDB provides TPC-H dataset generation as an extension and can generate the 
datasets at different scale factors (see 
https://github.com/duckdb/duckdb/tree/6c7c9805fdf1604039ebed47d233ea55cabb4b2c/extension/tpch).
Given that DuckDB can return result sets in Arrow format in Python and R, we 
could use it as a utility to generate test files.
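
A minimal sketch of that idea in Python, assuming the `duckdb` and `pyarrow` 
packages are installed and that the `tpch` extension can be installed and 
loaded in the local DuckDB build. The helper names (`parquet_path`, 
`generate_tpch`) and the output layout are hypothetical, chosen only for 
illustration; this is not an existing Arrow utility.

```python
from pathlib import Path

# The eight tables defined by the TPC-H schema.
TPCH_TABLES = (
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
)


def parquet_path(out_dir, table, scale_factor):
    """Hypothetical layout: <out_dir>/sf<scale_factor>/<table>.parquet."""
    return Path(out_dir) / f"sf{scale_factor}" / f"{table}.parquet"


def generate_tpch(out_dir, scale_factor=0.01):
    """Generate TPC-H tables with DuckDB and write each one as Parquet."""
    # Imported lazily so the helpers above work without these packages.
    import duckdb
    import pyarrow.parquet as pq

    con = duckdb.connect()  # in-memory database
    # Assumption: the tpch extension is installable in this environment.
    con.execute("INSTALL tpch")
    con.execute("LOAD tpch")
    con.execute(f"CALL dbgen(sf={scale_factor})")

    written = []
    for table in TPCH_TABLES:
        # DuckDB hands the result set back as a pyarrow.Table.
        arrow_table = con.execute(f"SELECT * FROM {table}").fetch_arrow_table()
        path = parquet_path(out_dir, table, scale_factor)
        path.parent.mkdir(parents=True, exist_ok=True)
        pq.write_table(arrow_table, str(path))
        written.append(path)
    con.close()
    return written


if __name__ == "__main__":
    for p in generate_tpch("tpch_data", scale_factor=0.01):
        print(p)
```

Writing Parquet is only one option here; the same Arrow tables could be 
written in any format that integration tests or benchmarks need to read.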


was (Author: wesmckinn):
DuckDB provides TPC-H dataset generation as an extension and can generate the 
datasets at different scale factors. Given that DuckDB can return result sets 
as Arrow format in Python and R, we could use it as a utility to generate 
testing files

> Support TPC-H dbgen in Arrow
> ----------------------------
>
>                 Key: ARROW-3998
>                 URL: https://issues.apache.org/jira/browse/ARROW-3998
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Benchmarking, Integration
>            Reporter: Francois Saint-Jacques
>            Priority: Minor
>
> Integration tests and benchmarks should read TPC-H data. This is going to be 
> useful for future query execution engine benchmarking.
> It could also attract researchers.


