[jira] [Resolved] (SPARK-33415) Column.__repr__ shouldn't encode JVM response

Hyukjin Kwon (Jira) Wed, 11 Nov 2020 07:15:58 -0800


     [ https://issues.apache.org/jira/browse/SPARK-33415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Hyukjin Kwon resolved SPARK-33415.
----------------------------------
    Target Version/s: 3.1.0
            Assignee: Maciej Szymkiewicz
          Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/30322

> Column.__repr__ shouldn't encode JVM response
> ---------------------------------------------
>
>                 Key: SPARK-33415
>                 URL: https://issues.apache.org/jira/browse/SPARK-33415
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.1.0
>            Reporter: Maciej Szymkiewicz
>            Assignee: Maciej Szymkiewicz
>            Priority: Minor
>
> At the moment, PySpark {{Column}} encodes the JVM response in its {{__repr__}}
> method.
> As a result, column names consisting only of ASCII characters get a {{b}} prefix:
> {code:python}
> >>> from pyspark.sql.functions import col
> >>> col("abc")
> Column<b'abc'>
> {code}
> while names containing non-ASCII characters are shown as escaped byte strings:
> {code:python}
> >>> col("wąż")
> Column<b'w\xc4\x85\xc5\xbc'>
> {code}
> This behaviour is inconsistent with other parts of the API, for example:
> {code:python}
> >>> spark.createDataFrame([], "`wąż` long")
> DataFrame[wąż: bigint]
> {code}
> and with Scala:
> {code:scala}
> scala> col("wąż")
> res0: org.apache.spark.sql.Column = wąż
> {code}
> and with R:
> {code:r}
> > column("wąż")
> Column wąż 
> {code}
> Encoding was originally introduced in SPARK-5859, but it does not appear to
> be required.
> Desired behaviour:
> {code:python}
> >>> col("wąż")
> Column<'wąż'>
> {code}
> or
> {code:python}
> >>> col("wąż")
> Column<wąż>
> {code}
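The {{b}} prefix comes from how Python 3 formats a {{bytes}} object with {{%s}}, so the difference can be reproduced in plain Python without Spark or a JVM. This is a minimal sketch, assuming the old {{__repr__}} effectively formatted {{'Column<%s>'}} with the UTF-8 encoded JVM string; the function names are hypothetical and this is not the code from the linked pull request.

```python
# Hypothetical stand-ins: jvm_name plays the role of the string the JVM-side
# Column.toString() would return. No Spark or JVM is involved here.

def repr_with_encode(jvm_name: str) -> str:
    # Assumed pre-fix pattern: the name is encoded to bytes first, so %s
    # formats the bytes object and its b'...' representation leaks into the
    # output, with non-ASCII characters shown as \x.. escapes.
    return "Column<%s>" % jvm_name.encode("utf8")


def repr_without_encode(jvm_name: str) -> str:
    # Format the str directly, matching the first desired output in the issue.
    return "Column<'%s'>" % jvm_name


# repr_with_encode("abc")    returns "Column<b'abc'>"
# repr_without_encode("abc") returns "Column<'abc'>"
```

Skipping the {{encode}} call is enough to get readable output in Python 3, because {{str}} already holds Unicode text.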


