[ https://issues.apache.org/jira/browse/ARROW-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eduardo Ponce updated ARROW-7394:
---------------------------------
    Fix Version/s: 8.0.0
                       (was: 7.0.0)

> [C++][DataFrame] Implement zero-copy optimizations when performing Filter
> -------------------------------------------------------------------------
>
>                 Key: ARROW-7394
>                 URL: https://issues.apache.org/jira/browse/ARROW-7394
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Eduardo Ponce
>            Priority: Major
>              Labels: dataframe
>             Fix For: 8.0.0
>
>
> For high-selectivity filters (where most elements are selected), copying
> large contiguous ranges of array chunks into the resulting ChunkedArray may
> be wasteful and slow. Instead, we can scan the filter boolean array and emit
> zero-copy slices of the source chunks rather than copying the data.
> We will need to determine empirically how long a contiguous run must be to
> justify the slice-based approach over simple (native) materialization. For
> example, in a filter array like
> 1 0 1 0 1 0 1 0 1
> it would not make sense to slice 5 times, because each slice carries some
> overhead. But if we had
> 1 ... 1 [100 1's] 0 1 ... 1 [100 1's] 0 1 ... 1 [100 1's] 0 1 ... 1 [100 1's]
> then performing 4 slices may be faster than materializing a copy.
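A minimal sketch of the run-based strategy described above, written in plain C++ rather than against the Arrow API. `std::vector<std::int64_t>` stands in for a source array chunk and the returned vector of chunks stands in for the resulting ChunkedArray. `FilterWithSlices`, `OutputChunk` and the `min_run_length` cutoff are hypothetical names. The cutoff is a placeholder for the empirically tuned threshold the issue calls for, not a measured value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for one chunk of the filtered ChunkedArray.
// A zero-copy chunk only references a range of the source values (the
// source must outlive it); a materialized chunk owns copied values.
struct OutputChunk {
  bool zero_copy = false;
  const std::int64_t* view_data = nullptr;  // start of the referenced source range
  std::size_t view_length = 0;              // length of the referenced range
  std::vector<std::int64_t> owned;          // copied values when !zero_copy

  const std::int64_t* data() const { return zero_copy ? view_data : owned.data(); }
  std::size_t size() const { return zero_copy ? view_length : owned.size(); }
};

// Filters `values` by `filter`. Runs of selected elements at least
// `min_run_length` long become zero-copy views; shorter runs are copied
// and coalesced into a single owned chunk between consecutive views.
std::vector<OutputChunk> FilterWithSlices(const std::vector<std::int64_t>& values,
                                          const std::vector<bool>& filter,
                                          std::size_t min_run_length) {
  assert(values.size() == filter.size());
  std::vector<OutputChunk> out;
  std::vector<std::int64_t> pending;  // short-run values awaiting materialization

  // Emit the buffered short-run values as one materialized chunk.
  auto flush_pending = [&out, &pending]() {
    if (pending.empty()) return;
    OutputChunk chunk;
    chunk.owned = std::move(pending);
    out.push_back(std::move(chunk));
    pending.clear();  // moved-from vector: reset to a known empty state
  };

  const std::size_t n = values.size();
  std::size_t i = 0;
  while (i < n) {
    if (!filter[i]) {  // skip deselected elements
      ++i;
      continue;
    }
    // Find the end of the contiguous run of selected elements starting at i.
    std::size_t run_end = i;
    while (run_end < n && filter[run_end]) ++run_end;
    const std::size_t run_length = run_end - i;

    if (run_length >= min_run_length) {
      // Long run: reference the source directly instead of copying.
      flush_pending();
      OutputChunk chunk;
      chunk.zero_copy = true;
      chunk.view_data = values.data() + i;
      chunk.view_length = run_length;
      out.push_back(std::move(chunk));
    } else {
      // Short run: per-slice overhead is not worth it, so copy the values.
      pending.insert(pending.end(), values.data() + i, values.data() + run_end);
    }
    i = run_end;
  }
  flush_pending();
  return out;
}
```

With `min_run_length = 4`, the alternating filter `1 0 1 0 1 0 1 0 1` produces a single copied chunk of five values, while four runs of 100 selected elements separated by single zeros produce four zero-copy views. Coalescing consecutive short runs into one owned buffer is a design choice of this sketch: it keeps the number of output chunks low when long and short runs are interleaved.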



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
