[
https://issues.apache.org/jira/browse/ARROW-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411555#comment-17411555
]
Wes McKinney edited comment on ARROW-3998 at 9/7/21, 9:52 PM:
--------------------------------------------------------------
DuckDB provides TPC-H dataset generation as an extension and can generate the
datasets at different scale factors (see
https://github.com/duckdb/duckdb/tree/6c7c9805fdf1604039ebed47d233ea55cabb4b2c/extension/tpch).
Given that DuckDB can return result sets in Arrow format in Python and R, we
could use it as a utility to generate test files.
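
A minimal sketch of that idea in Python, not a tested implementation. It assumes
the `duckdb` and `pyarrow` packages are installed and that the `tpch` extension
can be installed or is bundled with the DuckDB build (`INSTALL` may need network
access). The helper name `generate_tpch_parquet`, the default scale factor, and
the choice of Parquet as the output format are illustrative assumptions, not part
of any Arrow or DuckDB API.

```python
from pathlib import Path

# The eight tables defined by the TPC-H schema.
TPCH_TABLES = (
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
)


def generate_tpch_parquet(out_dir, scale_factor=0.01):
    """Hypothetical helper: generate TPC-H tables with DuckDB and write
    each one to a Parquet file via Arrow. Returns {table_name: path}."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")

    # Imported lazily so the module loads without the optional dependencies.
    import duckdb
    import pyarrow.parquet as pq

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    con = duckdb.connect()  # in-memory database
    try:
        # Assumption: the tpch extension is available to this DuckDB build.
        con.execute("INSTALL tpch")
        con.execute("LOAD tpch")
        # dbgen populates the TPC-H tables at the requested scale factor.
        con.execute(f"CALL dbgen(sf={float(scale_factor)})")

        paths = {}
        for name in TPCH_TABLES:
            # Fetch the full table as a pyarrow.Table.
            table = con.execute(f"SELECT * FROM {name}").fetch_arrow_table()
            path = out_dir / f"{name}.parquet"
            pq.write_table(table, path)
            paths[name] = path
        return paths
    finally:
        con.close()
```

Calling `generate_tpch_parquet("tpch_sf001", scale_factor=0.01)` would write one
file per table. The exact method for Arrow export can differ between DuckDB
versions, so check it against the installed release.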
was (Author: wesmckinn):
DuckDB provides TPC-H dataset generation as an extension and can generate the
datasets at different scale factors. Given that DuckDB can return result sets
in Arrow format in Python and R, we could use it as a utility to generate
test files.
> Support TPC-H dbgen in Arrow
> ----------------------------
>
> Key: ARROW-3998
> URL: https://issues.apache.org/jira/browse/ARROW-3998
> Project: Apache Arrow
> Issue Type: Wish
> Components: Benchmarking, Integration
> Reporter: Francois Saint-Jacques
> Priority: Minor
>
> Integration tests and benchmarks should read TPC-H data. This is going to be
> useful for future query execution engine benchmarking.
> It could also attract researchers.