[
https://issues.apache.org/jira/browse/SPARK-41945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
jiaan.geng updated SPARK-41945:
-------------------------------
Description:
Python: the Connect client should not use pyarrow.Table.to_pylist to transform
fetched data.
For example:
The data in the pyarrow.Table is shown below.
{code:java}
key: string
order: int64
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
----
key: [["a","a","a","a","a","b","b"]]
order: [[0,1,2,3,4,1,2]]
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,"x","x","x","x",null,null]]
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,"x","x","x","x",null,null]]
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):
[[null,null,"y","y","y",null,null]]
{code}
The table has five columns, as shown above.
But the data after calling pyarrow.Table.to_pylist() is shown below.
{code:java}
Row(key='a', order=0, nth_value(value, 2) OVER (PARTITION BY key ORDER BY order
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)=None,
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)=None),
{code}
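Only four columns are left. to_pylist() returns each row as a dict keyed by
column name, so the two columns that share the same nth_value(value, 2) name
collapse into a single entry.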
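The column loss can be reproduced without a Spark session. The following is a minimal pure-Python sketch, not the Connect client's code: it models each row as a dict keyed by column name (the shape that to_pylist() returns) and compares it with rows built by position. The values are copied from the table above; the shortened labels `nth_value` and `nth_value_ignore_nulls` are hypothetical stand-ins for the long window-function column names.

```python
# Column names and values taken from the table in this issue. The long
# window-function names are shortened to hypothetical labels; the two
# identical "nth_value" entries mirror the duplicated column name.
names = ["key", "order", "nth_value", "nth_value", "nth_value_ignore_nulls"]
columns = [
    ["a", "a", "a", "a", "a", "b", "b"],
    [0, 1, 2, 3, 4, 1, 2],
    [None, "x", "x", "x", "x", None, None],
    [None, "x", "x", "x", "x", None, None],
    [None, None, "y", "y", "y", None, None],
]

# Name-keyed rows: one dict per row. A dict cannot hold the same key
# twice, so the second "nth_value" overwrites the first.
rows_by_name = [dict(zip(names, values)) for values in zip(*columns)]

# Position-keyed rows: one tuple per row, read alongside the full
# `names` list, so duplicate names do not matter.
rows_by_position = [tuple(values) for values in zip(*columns)]

print(len(rows_by_name[0]))      # 4
print(len(rows_by_position[0]))  # 5
```

Building rows by column position and carrying the schema's name list separately is one possible way to keep all five columns; whether the client should do exactly this is an assumption, not something this issue states.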
was:
Python: connect client should not use pyarrow.Table.to_pylist to transform
fetched data.
For example:
the data in pyarrow.Table show below.
But the data after call pyarrow.Table.to_pylist() show below.
{code:java}
Row(key='a', order=0, nth_value(value, 2) OVER (PARTITION BY key ORDER BY order
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)=None,
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)=None),
{code}
> Python: connect client lost column data with pyarrow.Table.to_pylist
> --------------------------------------------------------------------
>
> Key: SPARK-41945
> URL: https://issues.apache.org/jira/browse/SPARK-41945
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: jiaan.geng
> Priority: Major