[
https://issues.apache.org/jira/browse/SPARK-41945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
jiaan.geng updated SPARK-41945:
-------------------------------
Description:
The Python Connect client should not use pyarrow.Table.to_pylist to transform
fetched data, because it drops columns that share a name.
For example, the data in a pyarrow.Table is shown below.
{code:java}
pyarrow.Table
key: string
order: int64
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
----
key: [["a","a","a","a","a","b","b"]]
order: [[0,1,2,3,4,1,2]]
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,"x","x","x","x",null,null]]
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,"x","x","x","x",null,null]]
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):
[[null,null,"y","y","y",null,null]]
{code}
The table has five columns, as shown above.
But after calling pyarrow.Table.to_pylist(), the data is as shown below.
{code:java}
[{
'key': 'a',
'order': 0,
'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS
FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None,
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
}, {
'key': 'a',
'order': 1,
'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS
FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
}, {
'key': 'a',
'order': 2,
'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS
FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'
}, {
'key': 'a',
'order': 3,
'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS
FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'
}, {
'key': 'a',
'order': 4,
'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS
FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'
}, {
'key': 'b',
'order': 1,
'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS
FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None,
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
}, {
'key': 'b',
'order': 2,
'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS
FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None,
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
}]
{code}
Only four columns are left: the two columns with identical names collapse
into a single dict key.
was:
The Python Connect client should not use pyarrow.Table.to_pylist to transform
fetched data, because it drops columns that share a name.
For example, the data in a pyarrow.Table is shown below.
{code:java}
pyarrow.Table
key: string
order: int64
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
----
key: [["a","a","a","a","a","b","b"]]
order: [[0,1,2,3,4,1,2]]
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,"x","x","x","x",null,null]]
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,"x","x","x","x",null,null]]
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):
[[null,null,"y","y","y",null,null]]
{code}
The table has five columns, as shown above.
But after calling pyarrow.Table.to_pylist(), the data is as shown below.
{code:java}
[{'key': 'a', 'order': 0, 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY
order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)':
None, 'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None},
{'key': 'a', 'order': 1, 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY
order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None}, {'key':
'a', 'order': 2, 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'}, {'key':
'a', 'order': 3, 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'}, {'key':
'a', 'order': 4, 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'}, {'key':
'b', 'order': 1, 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None,
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None}, {'key':
'b', 'order': 2, 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None,
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None}]
{code}
Only four columns are left: the two columns with identical names collapse
into a single dict key.
> Python: connect client lost column data with pyarrow.Table.to_pylist
> --------------------------------------------------------------------
>
> Key: SPARK-41945
> URL: https://issues.apache.org/jira/browse/SPARK-41945
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: jiaan.geng
> Priority: Major
>
> The Python Connect client should not use pyarrow.Table.to_pylist to transform
> fetched data, because it drops columns that share a name.
> For example, the data in a pyarrow.Table is shown below.
> {code:java}
> pyarrow.Table
> key: string
> order: int64
> nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
> nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
> nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
> NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
> ----
> key: [["a","a","a","a","a","b","b"]]
> order: [[0,1,2,3,4,1,2]]
> nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):
> [[null,"x","x","x","x",null,null]]
> nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):
> [[null,"x","x","x","x",null,null]]
> nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
> NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):
> [[null,null,"y","y","y",null,null]]
> {code}
> The table has five columns, as shown above.
> But after calling pyarrow.Table.to_pylist(), the data is as shown below.
> {code:java}
> [{
> 'key': 'a',
> 'order': 0,
> 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS
> FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None,
> 'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
> ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
> }, {
> 'key': 'a',
> 'order': 1,
> 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS
> FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
> 'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
> ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
> }, {
> 'key': 'a',
> 'order': 2,
> 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS
> FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
> 'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
> ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'
> }, {
> 'key': 'a',
> 'order': 3,
> 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS
> FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
> 'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
> ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'
> }, {
> 'key': 'a',
> 'order': 4,
> 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS
> FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
> 'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
> ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'
> }, {
> 'key': 'b',
> 'order': 1,
> 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS
> FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None,
> 'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
> ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
> }, {
> 'key': 'b',
> 'order': 2,
> 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS
> FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None,
> 'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order
> ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
> }]
> {code}
> Only four columns are left: the two columns with identical names collapse
> into a single dict key.
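The column loss described above can be reproduced without a Spark Connect
session. The sketch below is a stdlib-only model, not the client code and not
a call into pyarrow: it assumes the table can be represented as a list of
column names plus parallel lists of values, and it uses shortened,
hypothetical column names (nth_value, nth_value_ignore_nulls) in place of the
long window-function names. It compares a row-as-dict conversion, which is the
shape to_pylist returns, with a positional row-as-tuple conversion.

```python
# Stdlib-only model of the problem; it does not call pyarrow.
# The column names are shortened, hypothetical stand-ins for the long
# window-function names in the report. Two of them are identical on purpose.
names = ["key", "order", "nth_value", "nth_value", "nth_value_ignore_nulls"]
columns = [
    ["a", "a", "a", "a", "a", "b", "b"],
    [0, 1, 2, 3, 4, 1, 2],
    [None, "x", "x", "x", "x", None, None],
    [None, "x", "x", "x", "x", None, None],
    [None, None, "y", "y", "y", None, None],
]

# Row-as-dict conversion: a dict holds one entry per key, so the two
# columns named "nth_value" collide and one of them disappears.
rows_as_dicts = [dict(zip(names, values)) for values in zip(*columns)]

# Row-as-tuple conversion: columns are identified by position, not by
# name, so duplicate names cannot collide.
rows_as_tuples = list(zip(*columns))

print(len(rows_as_dicts[0]))   # 4
print(len(rows_as_tuples[0]))  # 5
```

Any conversion that keys a row by column name has this limitation, so a
positional representation (or de-duplicated names) seems necessary whenever a
query can return columns with the same name.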