[ https://issues.apache.org/jira/browse/SPARK-40934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629542#comment-17629542 ]
Dustin Smith commented on SPARK-40934:
--------------------------------------

What should the desired behavior of this date column be when it is not parsed? Should it be converted to a string type?

> pyspark.pandas.read_csv parses dates, but docs state otherwise
> --------------------------------------------------------------
>
>                 Key: SPARK-40934
>                 URL: https://issues.apache.org/jira/browse/SPARK-40934
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 3.3.1
>            Reporter: Stefaan Lippens
>            Priority: Major
>
> From
> [https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.read_csv.html]:
> {quote}parse_dates:
> boolean or list of ints or names or list of lists or dict, default False.
> Currently only False is allowed.
> {quote}
> This documentation suggests that dates are never parsed, but apparently they
> are always parsed (and this cannot be disabled):
> {code:python}
> import pyspark.pandas
> df = pyspark.pandas.read_csv("data.csv", parse_dates=False)
> print(df)
> print(df.dtypes)
> {code}
> with this data
> {code:java}
> date,feature_index,band_0,band_1,band_2
> 2021-01-05T01:00:00.000+01:00,2,5.0,4.5,3.75
> 2021-01-05T01:00:00.000+01:00,0,5.0,1.0,2.25
> 2021-01-05T01:00:00.000+01:00,1,5.0,3.5,4.0
> 2021-01-15T01:00:00.000+01:00,2,15.0,4.5,3.75
> 2021-01-15T01:00:00.000+01:00,0,15.0,1.0,2.25
> {code}
> gives
> {code:java}
>                  date  feature_index  band_0  band_1  band_2
> 0 2021-01-05 01:00:00              2     5.0     4.5    3.75
> 1 2021-01-05 01:00:00              0     5.0     1.0    2.25
> 2 2021-01-05 01:00:00              1     5.0     3.5    4.00
> 3 2021-01-15 01:00:00              2    15.0     4.5    3.75
> 4 2021-01-15 01:00:00              0    15.0     1.0    2.25
> date             datetime64[ns]
> feature_index             int32
> band_0                  float64
> band_1                  float64
> band_2                  float64
> dtype: object
> {code}
> Notice how the dates are parsed (e.g. dtype {{datetime64[ns]}} for {{date}}).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
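The question above asks what the date column should look like when it is not parsed. For comparison, here is a minimal sketch of what plain pandas (not the pandas API on Spark) does with the same rows.

The sketch makes these assumptions:
* A local pandas install is available.
* The sample data from the report is inlined with `io.StringIO`.
* The names `CSV_TEXT`, `unparsed` and `parsed` are illustrative only.

With `parse_dates=False`, pandas leaves `date` as plain strings (object dtype, or the string dtype in newer pandas releases). With `parse_dates=["date"]`, it becomes a timezone-aware datetime column. This only shows upstream pandas behavior. It says nothing about how `pyspark.pandas.read_csv` works internally.

```python
import io

import pandas as pd

# Sample rows copied from the report (assumption: inlined instead of data.csv).
CSV_TEXT = """date,feature_index,band_0,band_1,band_2
2021-01-05T01:00:00.000+01:00,2,5.0,4.5,3.75
2021-01-05T01:00:00.000+01:00,0,5.0,1.0,2.25
2021-01-05T01:00:00.000+01:00,1,5.0,3.5,4.0
2021-01-15T01:00:00.000+01:00,2,15.0,4.5,3.75
2021-01-15T01:00:00.000+01:00,0,15.0,1.0,2.25
"""

# parse_dates=False (the pandas default): "date" keeps the raw text values.
unparsed = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=False)

# parse_dates=["date"]: pandas converts "date" to a timezone-aware datetime dtype.
parsed = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])

print(unparsed.dtypes)
print(parsed.dtypes)
```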