[
https://issues.apache.org/jira/browse/SPARK-39568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739439#comment-17739439
]
Kumar Nagaraj commented on SPARK-39568:
---------------------------------------
Thank you for reporting this, [~gokulyc]. Here is an example that reproduces the issue
without needing to load the file.
{code:python}
import pyspark.pandas as ps

# A string column with one missing value.
d = {'col': ["1", "2", None]}
df = ps.DataFrame(data=d, columns=['col'])

int_df = df.astype("int")  # the missing value stays null
str_df = df.astype("str")  # the missing value becomes the string "None"

print(int_df.isnull().sum())
# output:
# col    1
# dtype: int64

print(str_df.isnull().sum())
# output:
# col    0
# dtype: int64
{code}
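For context, here is a minimal sketch in plain pandas, assuming pandas >= 1.0 (needed for the nullable "string" dtype). pandas-on-Spark aims to mirror pandas, and pandas before 3.0 also turns None into the text "None" under astype(str). Casting to the "string" extension dtype instead keeps the missing value. Whether astype("string") preserves nulls the same way in pandas-on-Spark 3.3.0 is an assumption that would need checking against the repro above.

{code:python}
import pandas as pd

# Same data as the repro, but in plain pandas (no Spark needed).
s = pd.Series(["1", "2", None])

# astype(str) converts every element to text; on pandas < 3.0 the missing
# value becomes the literal string "None", so isnull() no longer counts it.
as_str = s.astype(str)

# The nullable "string" extension dtype keeps the missing value as <NA>.
as_string = s.astype("string")

print(as_str.tolist())
print(int(as_string.isnull().sum()))
{code}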
> when using df.astype("str") on pyspark dataframe. None are converted "None"
> ---------------------------------------------------------------------------
>
> Key: SPARK-39568
> URL: https://issues.apache.org/jira/browse/SPARK-39568
> Project: Spark
> Issue Type: Improvement
> Components: Pandas API on Spark
> Affects Versions: 3.3.0
> Environment: Tried on azure databricks.
> Reporter: Gokul Yalavarti
> Priority: Major
> Attachments: chrome_M3JLsVzzb2.png,
> image-2022-06-23-18-33-42-324.png, loan200 - Copy.csv
>
>
> When using df.astype("str") on a pandas-on-Spark DataFrame, None values are converted to the string "None".
>
> - Null values cannot be kept as None; instead, they are converted to the string
> "None" throughout the column.
>
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/9431837223032/3368288770184753/3794760114602748/latest.html
--
This message was sent by Atlassian Jira
(v8.20.10#820010)