[ 
https://issues.apache.org/jira/browse/SPARK-41945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-41945:
-------------------------------
    Description: 
Python: the Connect client should not use pyarrow.Table.to_pylist to transform 
fetched data.
For example, the data in the pyarrow.Table is shown below.

{code:java}
[{'key': 'a', 'order': 0, 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY 
order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 
None, 'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order 
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None}, 
{'key': 'a', 'order': 1, 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY 
order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x', 
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None}, {'key': 
'a', 'order': 2, 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x', 
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'}, {'key': 
'a', 'order': 3, 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x', 
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'}, {'key': 
'a', 'order': 4, 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x', 
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'}, {'key': 
'b', 'order': 1, 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None, 
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None}, {'key': 
'b', 'order': 2, 'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None, 
'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None}]
{code}

The table has five columns, as shown above.
But the data after calling pyarrow.Table.to_pylist() is shown below.

{code:java}
Row(key='a', order=0, nth_value(value, 2) OVER (PARTITION BY key ORDER BY order 
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)=None, 
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)=None),
{code}

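Only four columns are left: to_pylist returns each row as a dict keyed by 
column name, so the two columns with identical names collapse into one.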
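
The column loss can be reproduced without pyarrow. The following is a minimal sketch under stated assumptions, not the Connect client's code: it models a table as a plain list of column names plus a list of column values, and the shortened names ("nth_value", "nth_value_ignore_nulls") and both helper functions are hypothetical stand-ins for the long window-expression names above. It compares a row conversion keyed by column name with one keyed by position.

```python
# Pure-Python model of a table whose schema contains a duplicate column name.
# The names are shortened, hypothetical stand-ins for the window expressions.
names = ["key", "order", "nth_value", "nth_value", "nth_value_ignore_nulls"]
columns = [
    ["a", "a", "a"],
    [0, 1, 2],
    [None, "x", "x"],
    [None, "x", "x"],
    [None, None, "y"],
]


def rows_as_dicts(names, columns):
    """Build one dict per row, keyed by column name.

    A dict cannot hold two equal keys, so a duplicate name overwrites
    the earlier entry and the row ends up with fewer fields.
    """
    return [dict(zip(names, values)) for values in zip(*columns)]


def rows_as_tuples(columns):
    """Build one tuple per row, keyed by position, so no column is lost."""
    return list(zip(*columns))


dict_rows = rows_as_dicts(names, columns)
tuple_rows = rows_as_tuples(columns)

print(len(dict_rows[0]))   # 4
print(len(tuple_rows[0]))  # 5
```

A positional conversion such as rows_as_tuples keeps all five values, and the column names can then be attached from the schema separately. This is only one possible direction for a fix.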


  was:
Python: the Connect client should not use pyarrow.Table.to_pylist to transform 
fetched data.
For example, the data in the pyarrow.Table is shown below.

{code:java}
key: string
order: int64
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE 
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE 
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
----
key: [["a","a","a","a","a","b","b"]]
order: [[0,1,2,3,4,1,2]]
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE 
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,"x","x","x","x",null,null]]
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE 
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,"x","x","x","x",null,null]]
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): 
[[null,null,"y","y","y",null,null]]
{code}

The table has five columns, as shown above.
But the data after calling pyarrow.Table.to_pylist() is shown below.

{code:java}
Row(key='a', order=0, nth_value(value, 2) OVER (PARTITION BY key ORDER BY order 
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)=None, 
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)=None),
{code}

There are only four columns left.



> Python: connect client lost column data with pyarrow.Table.to_pylist
> --------------------------------------------------------------------
>
>                 Key: SPARK-41945
>                 URL: https://issues.apache.org/jira/browse/SPARK-41945
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: jiaan.geng
>            Priority: Major


