[ 
https://issues.apache.org/jira/browse/SPARK-39568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739439#comment-17739439
 ] 

Kumar Nagaraj commented on SPARK-39568:
---------------------------------------

Thank you for reporting this, [~gokulyc]. Here is an example that reproduces 
the issue without needing to load the attached file.

 
{code:python}
import pyspark.pandas as ps

# One string column containing a single missing value
d = {'col': ["1", "2", None]}
df = ps.DataFrame(data=d, columns=['col'])

int_df = df.astype("int")
str_df = df.astype("str")

# Casting to int keeps the missing value as a null
int_df.isnull().sum()
# output is
# col    1
# dtype: int64

# Casting to str turns None into the string "None", so no nulls are counted
str_df.isnull().sum()
# output is
# col    0
# dtype: int64
{code}
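
The difference between the two null counts can be illustrated without Spark. The sketch below is plain Python and is not how pandas API on Spark implements astype; the helper names (naive_str_cast, null_preserving_str_cast, count_nulls) are hypothetical and exist only for this illustration. It relies on one fact: str(None) returns the string "None", which is no longer a null.

```python
def naive_str_cast(values):
    """Hypothetical helper: pass every element through str(), including None."""
    return [str(v) for v in values]


def null_preserving_str_cast(values):
    """Hypothetical helper: stringify non-null elements and leave None untouched."""
    return [None if v is None else str(v) for v in values]


def count_nulls(values):
    """Count the elements that are still None, similar in spirit to isnull().sum()."""
    return sum(v is None for v in values)


col = ["1", "2", None]

naive = naive_str_cast(col)
preserved = null_preserving_str_cast(col)

# The naive cast loses the null, matching the behaviour reported in this issue.
print(naive, count_nulls(naive))
# The null-preserving cast keeps it, which is what the reporter expected.
print(preserved, count_nulls(preserved))
```

The first variant corresponds to the reported result (zero nulls after the cast), the second to the result the reporter expected (one null).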

> when using df.astype("str") on pyspark dataframe. None are converted "None"
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-39568
>                 URL: https://issues.apache.org/jira/browse/SPARK-39568
>             Project: Spark
>          Issue Type: Improvement
>          Components: Pandas API on Spark
>    Affects Versions: 3.3.0
>         Environment: Tried on azure databricks.
>            Reporter: Gokul Yalavarti
>            Priority: Major
>         Attachments: chrome_M3JLsVzzb2.png, 
> image-2022-06-23-18-33-42-324.png, loan200 - Copy.csv
>
>
> When df.astype("str") is used on a pyspark dataframe, None values are 
> converted to the string "None".
>  
>  - Null values are not kept as None; instead they are converted to the 
> string "None" for the whole column.
>  
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/9431837223032/3368288770184753/3794760114602748/latest.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
