[ https://issues.apache.org/jira/browse/SPARK-29316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-29316.
----------------------------------
    Resolution: Won't Fix

> CLONE - schemaInference option not to convert strings with leading zeros to int/long
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-29316
>                 URL: https://issues.apache.org/jira/browse/SPARK-29316
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.1.1, 2.2.0, 2.3.0
>            Reporter: Ambar Raghuvanshi
>            Priority: Critical
>              Labels: csv, csvparser, easy-fix, inference, ramp-up, schema
>
> It would be great to have an option in Spark's schema inference to *not*
> convert a column that has leading zeros to the int/long data type. Think zip
> codes, for example.
> {code:java}
> df = (sqlc.read.format('csv')
>       .option('inferSchema', True)
>       .option('header', True)
>       .option('delimiter', '|')
>       .option('leadingZeros', 'KEEP')  # this is the new proposed option
>       .option('mode', 'FAILFAST')
>       .load('csvfile_withzipcodes_to_ingest.csv')
> )
> {code}
> Data with leading zeros is generally used for identifiers. Converting such
> values to int/long defeats the purpose of inferSchema. The conversion should
> be controlled by a flag that specifies whether the data should be converted
> to int/long.
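
The behavior requested above can be illustrated with a minimal, Spark-independent sketch. This is not Spark's inference code, and since the issue was resolved as Won't Fix, the `leadingZeros` option is only a proposal. The helper names `infer_column_type` and `infer_schema` and the `keep_leading_zeros` flag are hypothetical, chosen only to show how a leading zero could force a column to stay a string.

```python
import csv
import io


def infer_column_type(values, keep_leading_zeros=True):
    """Return 'int' if every non-empty value is an integer literal, else 'string'.

    With keep_leading_zeros=True (hypothetical flag), a value such as '02134'
    forces the whole column to 'string' so the zeros are not lost.
    """
    seen_value = False
    for raw in values:
        value = raw.strip()
        if not value:
            continue  # empty cells are treated as nulls and ignored
        seen_value = True
        digits = value[1:] if value[0] in "+-" else value
        if not (digits.isascii() and digits.isdigit()):
            return "string"
        if keep_leading_zeros and len(digits) > 1 and digits[0] == "0":
            return "string"
    # A column with no values at all carries no evidence of being numeric.
    return "int" if seen_value else "string"


def infer_schema(csv_text, delimiter="|", keep_leading_zeros=True):
    """Infer a {column name: type} mapping from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = zip(*data)  # transpose rows into columns
    return {
        name: infer_column_type(column, keep_leading_zeros)
        for name, column in zip(header, columns)
    }


sample = "zip|count\n02134|3\n10001|12\n"
print(infer_schema(sample))
print(infer_schema(sample, keep_leading_zeros=False))
```

With the flag enabled, the `zip` column is reported as a string because of `02134`; with it disabled, both columns are reported as integers, which mirrors the conversion the reporter wants to avoid. In Spark itself, supplying an explicit schema instead of relying on `inferSchema` sidesteps the conversion.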