[ https://issues.apache.org/jira/browse/SPARK-23612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Young updated SPARK-23612:
----------------------------------
    Description: 
[https://github.com/apache/spark/blob/407f67249639709c40c46917700ed6dd736daa7d/python/pyspark/sql/types.py#L162-L200]

It would be very helpful to be able to specify the format of individual date columns in a schema when reading CSV files, rather than a single format shared by all of them:
{code:python|title=Bar.python|borderStyle=solid}
# Currently can only do something like this, which applies one format to every DateType column:
spark.read.option("dateFormat", "yyyyMMdd").csv(...)

# Would like to be able to do something like this (proposed API; DateType() does not currently accept a format argument):
schema = StructType([
    StructField("date1", DateType(format="MM/dd/yyyy"), True),
    StructField("date2", DateType(format="yyyyMMdd"), True)
])
spark.read.schema(schema).csv(...)
{code}

Thanks for any help or input!

  was:
[https://github.com/apache/spark/blob/407f67249639709c40c46917700ed6dd736daa7d/python/pyspark/sql/types.py#L162-L200]

It would be very helpful if it were possible to specify the format for individual columns in a schema when reading csv files, rather than one format:
{code:title=Bar.python|borderStyle=solid}
# Currently can only do something like:
spark.read.option("**dateFormat", "yyyyMMdd").csv(...)

# Would like to be able to do something like:
schema = StructType([
StructField("date1", DateType(format="MM/dd/yyyy"), True),
StructField("date2", DateType(format="yyyyMMdd"), True)
]
read.schema(schema).csv(...)
{{{code}}}


> Specify formats for individual DateType and TimestampType columns in schemas
> ----------------------------------------------------------------------------
>
> Key: SPARK-23612
> URL: https://issues.apache.org/jira/browse/SPARK-23612
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, SQL
> Affects Versions: 2.3.0
> Reporter: Patrick Young
> Priority: Minor
> Labels: DataType, date, sql
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
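In Spark 2.3.0 the CSV reader takes a single `dateFormat` option that applies to every date column. As a rough illustration of the semantics this issue requests (not Spark code, and not a proposed implementation), here is a plain-Python sketch using the standard `csv` and `datetime` modules: each date column carries its own format, and empty cells stay null. `DATE_FORMATS`, `parse_date`, and `read_csv_with_column_formats` are hypothetical names made up for this sketch, and the formats use Python `strptime` codes rather than Spark's `MM/dd/yyyy`-style patterns.

```python
import csv
import io
from datetime import date, datetime
from typing import Dict, List, Optional

# Hypothetical per-column format map. The requested feature would attach a
# format to each DateType field; here it is just a dict keyed by column name.
# Formats are Python strptime codes, not Spark patterns:
#   Spark "MM/dd/yyyy" corresponds to "%m/%d/%Y"
#   Spark "yyyyMMdd"   corresponds to "%Y%m%d"
DATE_FORMATS: Dict[str, str] = {
    "date1": "%m/%d/%Y",
    "date2": "%Y%m%d",
}


def parse_date(value: str, fmt: str) -> Optional[date]:
    """Parse one cell with its column's own format; empty cells become None."""
    if value == "":
        return None
    return datetime.strptime(value, fmt).date()


def read_csv_with_column_formats(text: str, formats: Dict[str, str]) -> List[dict]:
    """Read CSV text that has a header row, parsing each listed column with its own format.

    Columns that are not listed in `formats` are left as strings.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            name: parse_date(value, formats[name]) if name in formats else value
            for name, value in record.items()
        })
    return rows


if __name__ == "__main__":
    sample = "id,date1,date2\n1,03/07/2018,20180307\n2,12/31/2017,\n"
    for row in read_csv_with_column_formats(sample, DATE_FORMATS):
        print(row)
```

Within PySpark itself, a common workaround is to declare such columns as `StringType` in the schema and convert each one afterwards with `pyspark.sql.functions.to_date(col, format)` (or `to_timestamp(col, format)` for timestamps), both of which accept a per-call format since Spark 2.2.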