[jira] [Updated] (SPARK-41945) Python: connect client lost column data with pyarrow.Table.to_pylist

jiaan.geng (Jira) Sun, 08 Jan 2023 21:34:05 -0800


     [ 
https://issues.apache.org/jira/browse/SPARK-41945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


jiaan.geng updated SPARK-41945:
-------------------------------
    Description: 
Python: the Connect client should not use pyarrow.Table.to_pylist to convert 
fetched data.
For example, suppose the pyarrow.Table holds the data shown below.

{code:java}
key: string
order: int64
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE 
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE 
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
----
key: [["a","a","a","a","a","b","b"]]
order: [[0,1,2,3,4,1,2]]
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE 
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,"x","x","x","x",null,null]]
nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE 
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,"x","x","x","x",null,null]]
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): 
[[null,null,"y","y","y",null,null]]
{code}

The table above has five columns.
But after calling pyarrow.Table.to_pylist(), the data looks like this:

{code:java}
Row(key='a', order=0, nth_value(value, 2) OVER (PARTITION BY key ORDER BY order 
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)=None, 
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)=None),
{code}

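Only four columns are left. to_pylist() returns each row as a dict keyed by 
column name, so the two columns with identical names collapse into one entry.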
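
The effect can be reproduced without pyarrow. The following is a minimal plain-Python sketch, not the Connect client's code and not necessarily how the issue was fixed: the column names are shortened stand-ins for the window expressions above, and `rows_as_dicts` and `rows_as_tuples` are hypothetical helpers. It contrasts a name-keyed conversion, which loses a duplicate-named column, with a positional one, which keeps all five.

```python
# Shortened stand-ins for the five column names; the 3rd and 4th are identical.
names = [
    "key",
    "order",
    "nth_value(value, 2)",
    "nth_value(value, 2)",
    "nth_value(value, 2) ignore nulls",
]

# Column-oriented data, mirroring the table in the report.
columns = [
    ["a", "a", "a", "a", "a", "b", "b"],
    [0, 1, 2, 3, 4, 1, 2],
    [None, "x", "x", "x", "x", None, None],
    [None, "x", "x", "x", "x", None, None],
    [None, None, "y", "y", "y", None, None],
]


def rows_as_dicts(names, columns):
    """Name-keyed rows: a later column with the same name overwrites the earlier one."""
    return [dict(zip(names, values)) for values in zip(*columns)]


def rows_as_tuples(columns):
    """Positional rows: every column survives, whatever its name."""
    return [tuple(values) for values in zip(*columns)]


dict_rows = rows_as_dicts(names, columns)
tuple_rows = rows_as_tuples(columns)

print(len(dict_rows[0]))   # 4: the duplicate name collapsed into one key
print(len(tuple_rows[0]))  # 5: all columns preserved
```

A positional conversion keeps the names in a separate list, so duplicate names cannot overwrite each other's values.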


  was:
Python: the Connect client should not use pyarrow.Table.to_pylist to convert 
fetched data.
For example:
the data in pyarrow.Table is shown below.

But the data after calling pyarrow.Table.to_pylist() is shown below.

{code:java}
Row(key='a', order=0, nth_value(value, 2) OVER (PARTITION BY key ORDER BY order 
ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)=None, 
nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)=None),
{code}



> Python: connect client lost column data with pyarrow.Table.to_pylist
> --------------------------------------------------------------------
>
>                 Key: SPARK-41945
>                 URL: https://issues.apache.org/jira/browse/SPARK-41945
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: jiaan.geng
>            Priority: Major
>
> Python: the Connect client should not use pyarrow.Table.to_pylist to convert 
> fetched data.
> For example, suppose the pyarrow.Table holds the data shown below.
> {code:java}
> key: string
> order: int64
> nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST 
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
> nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST 
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
> nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
> NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
> ----
> key: [["a","a","a","a","a","b","b"]]
> order: [[0,1,2,3,4,1,2]]
> nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST 
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): 
> [[null,"x","x","x","x",null,null]]
> nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST 
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): 
> [[null,"x","x","x","x",null,null]]
> nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC 
> NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): 
> [[null,null,"y","y","y",null,null]]
> {code}
> The table above has five columns.
> But after calling pyarrow.Table.to_pylist(), the data looks like this:
> {code:java}
> Row(key='a', order=0, nth_value(value, 2) OVER (PARTITION BY key ORDER BY 
> order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT 
> ROW)=None, nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY 
> order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT 
> ROW)=None),
> {code}
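> Only four columns are left. to_pylist() returns each row as a dict keyed by 
> column name, so the two columns with identical names collapse into one entry.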



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

