[ 
https://issues.apache.org/jira/browse/ARROW-13681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403949#comment-17403949
 ] 

Antoine Pitrou commented on ARROW-13681:
----------------------------------------

[~jorisvandenbossche] The results were wrong for the second chunk because they 
were indexed from the start of that chunk, rather than from the start of the 
entire chunked array (think about what happens if you call take() with the 
resulting indices).
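
To make the indexing problem concrete, here is a minimal pure-Python sketch. 
It does not call pyarrow: chunks are modelled as plain lists of lists, and 
the helper names parent_indices_per_chunk and parent_indices_chunked are 
hypothetical, chosen only for illustration. It assumes the expected result 
is parent indices counted from the start of the whole chunked array, which 
amounts to adding the number of rows in all preceding chunks to each 
chunk-local index.

```python
def parent_indices_per_chunk(chunk):
    """Parent index of each flattened value, counted from the start of this chunk."""
    return [i for i, sublist in enumerate(chunk) for _ in sublist]


def parent_indices_chunked(chunks):
    """Parent indices counted from the start of the whole chunked array."""
    result = []
    offset = 0  # number of rows in all preceding chunks
    for chunk in chunks:
        result.extend(offset + i for i in parent_indices_per_chunk(chunk))
        offset += len(chunk)
    return result


# Two chunks of two list rows each (illustrative data, not from the report).
chunks = [[[1, 2], [3]], [[4], [5, 6]]]
rows = ["a", "b", "c", "d"]  # one label per list row, across both chunks

# Chunk-local indices, simply concatenated: the second chunk restarts at 0.
chunk_local = [i for c in chunks for i in parent_indices_per_chunk(c)]
# Indices relative to the whole chunked array.
global_indices = parent_indices_chunked(chunks)

# A take()-like lookup shows why chunk-local indices are wrong.
taken_local = [rows[i] for i in chunk_local]
taken_global = [rows[i] for i in global_indices]
```

With the chunk-local indices, the lookup for the second chunk points back 
at rows "a" and "b" instead of "c" and "d", which is the kind of result an 
"explode" built on these indices would silently produce.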

> [C++] list_parent_indices only computes for first chunk
> -------------------------------------------------------
>
>                 Key: ARROW-13681
>                 URL: https://issues.apache.org/jira/browse/ARROW-13681
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Tor Eivind McKenzie-Syvertsen
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 6.0.0, 5.0.1
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Pyarrow version: 5.0.0. 
> Python version: 3.7.9
> I came across this issue because of very unexpected behaviour from the 
> "explode" function obtained here:
> https://issues.apache.org/jira/browse/ARROW-12099
>  indices = pc.list_parent_indices(table[col_name])
> If table[col_name] in this example contains several chunks, the indices look 
> perfectly fine for the first chunk, but are erratic and unexpected for the 
> second chunk.
>  No warning or message is given either.
> A workaround that solved the problem for me is:
> {code:python}
>   indices = pc.list_parent_indices(table.combine_chunks()[col_name])
> {code}
> The behaviour then changes dramatically.
> I'm assuming this isn't expected and should be fixed?



