[ 
https://issues.apache.org/jira/browse/ARROW-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Molina updated ARROW-16518:
--------------------------------------
    Description: 
At the moment execplan doesn't guarantee any ordered output, the batches are 
consumed in a random order. This can lead to unordered rows in outputs when 
{{use_threads=True}}

For example providing a column with {{b=[a, a, a, a, b, b, b, b]}} will 
sometimes give back {{b=[a, b]}} and sometimes {{b=[b, a]}}

See
{code:java}
In [18]: table1 = pa.table({'a': [1, 2, 3, 4], 'b': ['a'] * 4})

In [19]: table2 = pa.table({'a': [1, 2, 3, 4], 'b': ['b'] * 4})

In [20]: table = pa.concat_tables([table1, table2])

In [21]: ep._filter_table(table, pc.field('a') == 1)
Out[21]: 
pyarrow.Table
a: int64
b: string
----
a: [[1],[1]]
b: [["b"],["a"]]

In [22]: ep._filter_table(table, pc.field('a') == 1)
Out[22]: 
pyarrow.Table
a: int64
b: string
----
a: [[1],[1]]
b: [["a"],["b"]] {code}

  was:
At the moment execplan doesn't guarantee any ordered output, the batches are 
consumed in a random order. This can lead to unordered outputs when 
{{use_threads=True}}

See
{code:java}
In [18]: table1 = pa.table({'a': [1, 2, 3, 4], 'b': ['a'] * 4})

In [19]: table2 = pa.table({'a': [1, 2, 3, 4], 'b': ['b'] * 4})

In [20]: table = pa.concat_tables([table1, table2])

In [21]: ep._filter_table(table, pc.field('a') == 1)
Out[21]: 
pyarrow.Table
a: int64
b: string
----
a: [[1],[1]]
b: [["b"],["a"]]

In [22]: ep._filter_table(table, pc.field('a') == 1)
Out[22]: 
pyarrow.Table
a: int64
b: string
----
a: [[1],[1]]
b: [["a"],["b"]] {code}


> [Python] Ensure _exec_plan.execplan preserves order of inputs
> -------------------------------------------------------------
>
>                 Key: ARROW-16518
>                 URL: https://issues.apache.org/jira/browse/ARROW-16518
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: Python
>            Reporter: Alessandro Molina
>            Priority: Major
>             Fix For: 9.0.0
>
>
> At the moment execplan doesn't guarantee any ordered output, the batches are 
> consumed in a random order. This can lead to unordered rows in outputs when 
> {{use_threads=True}}
> For example providing a column with {{b=[a, a, a, a, b, b, b, b]}} will 
> sometimes give back {{b=[a, b]}} and sometimes {{b=[b, a]}}
> See
> {code:java}
> In [18]: table1 = pa.table({'a': [1, 2, 3, 4], 'b': ['a'] * 4})
> In [19]: table2 = pa.table({'a': [1, 2, 3, 4], 'b': ['b'] * 4})
> In [20]: table = pa.concat_tables([table1, table2])
> In [21]: ep._filter_table(table, pc.field('a') == 1)
> Out[21]: 
> pyarrow.Table
> a: int64
> b: string
> ----
> a: [[1],[1]]
> b: [["b"],["a"]]
> In [22]: ep._filter_table(table, pc.field('a') == 1)
> Out[22]: 
> pyarrow.Table
> a: int64
> b: string
> ----
> a: [[1],[1]]
> b: [["a"],["b"]] {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to