[
https://issues.apache.org/jira/browse/SPARK-33415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-33415.
----------------------------------
Target Version/s: 3.1.0
Assignee: Maciej Szymkiewicz
Resolution: Fixed
Fixed in https://github.com/apache/spark/pull/30322
> Column.__repr__ shouldn't encode JVM response
> ---------------------------------------------
>
> Key: SPARK-33415
> URL: https://issues.apache.org/jira/browse/SPARK-33415
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, SQL
> Affects Versions: 3.1.0
> Reporter: Maciej Szymkiewicz
> Assignee: Maciej Szymkiewicz
> Priority: Minor
>
> At the moment the PySpark {{Column}} class encodes the JVM response in its
> {{__repr__}} method.
> As a result, column names using only ASCII characters get a {{b}} prefix:
> {code:python}
> >>> from pyspark.sql.functions import col
> >>> col("abc")
> Column<b'abc'>
> {code}
> and names containing non-ASCII characters are shown as an ugly escaped byte string:
> {code:python}
> >>> col("wąż")
> Column<b'w\xc4\x85\xc5\xbc'>
> {code}
> This behaviour is inconsistent with other parts of the API, for example:
> {code:python}
> >>> spark.createDataFrame([], "`wąż` long")
> DataFrame[wąż: bigint]
> {code}
> and Scala
> {code:scala}
> scala> col("wąż")
> res0: org.apache.spark.sql.Column = wąż
> {code}
> and R
> {code:r}
> > column("wąż")
> Column wąż
> {code}
> The encoding was originally introduced with SPARK-5859, but it does not seem
> to be really required.
> Desired behaviour:
> {code:python}
> >>> col("wąż")
> Column<'wąż'>
> {code}
> or
> {code:python}
> >>> col("wąż")
> Column<wąż>
> {code}
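The {{b}} prefix described above can be reproduced without Spark. The sketch below is only an illustration: {{FakeColumn}}, {{repr_encoded}} and {{repr_plain}} are hypothetical names, not part of PySpark, and it assumes that the old {{__repr__}} formatted the UTF-8 encoded bytes directly into the result string.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column; no JVM is involved."""

    def __init__(self, name):
        # In PySpark this text would come from the JVM; here it is a plain str.
        self._jvm_text = name

    def repr_encoded(self):
        # Assumed old behaviour: encode() returns bytes, and formatting bytes
        # with %s uses their repr, which adds the b prefix and escapes
        # every non-ASCII byte.
        return "Column<%s>" % self._jvm_text.encode("utf8")

    def repr_plain(self):
        # First desired form from the issue: keep the text as str.
        return "Column<'%s'>" % self._jvm_text


print(FakeColumn("abc").repr_encoded())  # Column<b'abc'>
print(FakeColumn("wąż").repr_encoded())  # Column<b'w\xc4\x85\xc5\xbc'>
print(FakeColumn("wąż").repr_plain())    # Column<'wąż'>
```

In Python 3 the text is already a Unicode {{str}}, so skipping the encode step is enough to get a readable representation.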