[
https://issues.apache.org/jira/browse/ARROW-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411555#comment-17411555
]
Wes McKinney commented on ARROW-3998:
-------------------------------------
DuckDB provides TPC-H dataset generation as an extension and can generate the
datasets at different scale factors. Given that DuckDB can return result sets
in Arrow format in Python and R, we could use it as a utility to generate
testing files.
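
A minimal sketch of that idea in Python, assuming the `duckdb` and `pyarrow`
packages are installed and that the `tpch` extension can be installed and
loaded in the local environment. The function names, the default scale factor,
and the one-Parquet-file-per-table layout are illustrative choices, not
anything decided in this issue.

```python
from pathlib import Path

# The eight tables defined by the TPC-H schema.
TPCH_TABLES = (
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
)


def output_path(out_dir, table):
    """Hypothetical naming scheme: one Parquet file per TPC-H table."""
    return Path(out_dir) / f"{table}.parquet"


def generate_tpch(out_dir, scale_factor=0.01):
    """Generate TPC-H data with DuckDB and write each table via Arrow."""
    # Imported lazily so the helpers above stay usable without these packages.
    import duckdb
    import pyarrow.parquet as pq

    Path(out_dir).mkdir(parents=True, exist_ok=True)

    con = duckdb.connect()  # in-memory database
    # Assumption: the extension is bundled or can be downloaded here.
    con.execute("INSTALL tpch")
    con.execute("LOAD tpch")
    con.execute(f"CALL dbgen(sf={float(scale_factor)})")

    written = []
    for table in TPCH_TABLES:
        # DuckDB hands the result set back as a pyarrow.Table.
        arrow_table = con.execute(f"SELECT * FROM {table}").fetch_arrow_table()
        path = output_path(out_dir, table)
        pq.write_table(arrow_table, str(path))
        written.append(path)

    con.close()
    return written
```

Calling `generate_tpch("tpch_sf001", scale_factor=0.01)` would write a small
dataset suitable for quick tests; larger scale factors take correspondingly
more time and memory.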
> Support TPC-H dbgen in Arrow
> ----------------------------
>
> Key: ARROW-3998
> URL: https://issues.apache.org/jira/browse/ARROW-3998
> Project: Apache Arrow
> Issue Type: Wish
> Components: Benchmarking, Integration
> Reporter: Francois Saint-Jacques
> Priority: Minor
>
> Integration tests and benchmarks should read TPC-H data. This is going to be
> useful for future query execution engine benchmarking.
> It could also attract researchers.