[
https://issues.apache.org/jira/browse/PARQUET-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382390#comment-17382390
]
Weston Pace commented on PARQUET-2068:
--------------------------------------
I'm not sure whether this would be a better fit for the Arrow project. I filed
it under Parquet simply because the current implementation lives in
parquet/... and not in arrow/... or parquet/arrow/...
> [C++] [Parquet] Use arrow compute to determine min/max of dictionaries
> (possibly other arrays?)
> -----------------------------------------------------------------------------------------------
>
> Key: PARQUET-2068
> URL: https://issues.apache.org/jira/browse/PARQUET-2068
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp
> Reporter: Weston Pace
> Priority: Major
>
> parquet::Comparator is currently used to calculate the min and max values of
> an array. This should be benchmarked against arrow::compute's MinMax kernel
> (once it supports all necessary data types). The latter should use SIMD more
> aggressively, resulting in better performance.
> Even if there is no performance difference, the MinMax kernel should be used
> when computing dictionary statistics, because the current implementation
> requires making a copy of the dictionary values array (see ARROW-12513).
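
To make the comparison in the description concrete, here is a rough,
standard-library-only sketch of the two styles of scan. It does not use the
real parquet::Comparator or arrow::compute APIs. MinMaxWithComparator and
MinMaxDirect are hypothetical names invented for illustration, and int32_t
values are an assumption. The first function goes through an indirect "less"
call per element. The second reads a contiguous buffer in place, so it needs
no copy of the values and gives the compiler a loop it may be able to
vectorize.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a comparator-driven scan: every element goes
// through an indirect call, which tends to get in the way of vectorization.
// Precondition: `values` is non-empty.
std::pair<int32_t, int32_t> MinMaxWithComparator(
    const std::vector<int32_t>& values,
    const std::function<bool(int32_t, int32_t)>& less) {
  int32_t min = values.front();
  int32_t max = values.front();
  for (int32_t v : values) {
    if (less(v, min)) min = v;
    if (less(max, v)) max = v;
  }
  return {min, max};
}

// Hypothetical stand-in for a kernel-style scan: it works directly on a
// borrowed buffer (no copy) with a simple loop body.
// For length == 0 it returns {INT32_MAX, INT32_MIN}.
std::pair<int32_t, int32_t> MinMaxDirect(const int32_t* data,
                                         std::size_t length) {
  int32_t min = std::numeric_limits<int32_t>::max();
  int32_t max = std::numeric_limits<int32_t>::lowest();
  for (std::size_t i = 0; i < length; ++i) {
    min = std::min(min, data[i]);
    max = std::max(max, data[i]);
  }
  return {min, max};
}
```

Both functions should return the same pair for any non-empty input, so a
benchmark could time them on the same buffer. Whether the direct loop is
actually faster depends on the compiler and flags, which is the reason the
description asks for a benchmark first.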