[ https://issues.apache.org/jira/browse/PARQUET-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382390#comment-17382390 ]

Weston Pace commented on PARQUET-2068:
--------------------------------------

I'm not sure whether this would be a better fit for the Arrow project.  I filed 
it under Parquet simply because the current implementation lives in 
parquet/... and not arrow/... or parquet/arrow/...

> [C++] [Parquet] Use arrow compute to determine min/max of dictionaries 
> (possibly other arrays?)
> -----------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-2068
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2068
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Weston Pace
>            Priority: Major
>
> parquet::Comparator is currently used to calculate the min and max values of 
> an array.  This should be benchmarked against arrow::compute's MinMax kernel 
> (once it supports all necessary data types).  The latter should be more 
> aggressive with SIMD, which should result in better performance.
> Even if there is no performance difference, the MinMax kernel should be used 
> when computing dictionary statistics, as the current implementation requires 
> making a copy of the dictionary values array (see ARROW-12513).
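
To make the dictionary case in the description concrete, here is a minimal sketch in plain C++ of computing min/max over only the dictionary entries that are actually referenced, without first copying those values into a new array. It does not use parquet::Comparator or arrow::compute, and it is not how either library is implemented. The names MinMaxResult and DictionaryMinMax, the int64 value type, and the use of a negative index to stand for a null slot are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch (not an Arrow or Parquet type).
struct MinMaxResult {
  bool has_value;  // false when no dictionary entry is referenced
  int64_t min;
  int64_t max;
};

// Hypothetical helper: min/max over the dictionary entries referenced by
// `indices`. The dictionary is read in place; the only extra allocation is
// one flag per dictionary entry, not a copy of the values.
// Assumption of this sketch: a negative index represents a null slot.
MinMaxResult DictionaryMinMax(const std::vector<int64_t>& dictionary,
                              const std::vector<int32_t>& indices) {
  // Pass 1: mark which dictionary entries are referenced at least once.
  std::vector<bool> used(dictionary.size(), false);
  for (int32_t idx : indices) {
    if (idx < 0) continue;  // skip null slots
    assert(static_cast<std::size_t>(idx) < dictionary.size());
    used[static_cast<std::size_t>(idx)] = true;
  }

  // Pass 2: fold min/max over the referenced entries only.
  MinMaxResult result = {false, 0, 0};
  for (std::size_t i = 0; i < dictionary.size(); ++i) {
    if (!used[i]) continue;
    const int64_t v = dictionary[i];
    if (!result.has_value) {
      result.has_value = true;
      result.min = v;
      result.max = v;
    } else {
      result.min = std::min(result.min, v);
      result.max = std::max(result.max, v);
    }
  }
  return result;
}
```

Unreferenced entries are skipped because a dictionary can hold values that never appear in the column, and counting them would widen the reported range. Whether the MinMax kernel can be applied this way for every required data type is exactly what the issue proposes to check.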



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
