[ https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745146#comment-16745146 ]
Marco Gaido commented on SPARK-26645:
-------------------------------------

The error is on the Python side; I will submit a PR shortly. Thanks for reporting this.

> CSV infer schema bug infers decimal(9,-1)
> -----------------------------------------
>
>                 Key: SPARK-26645
>                 URL: https://issues.apache.org/jira/browse/SPARK-26645
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Ohad Raviv
>            Priority: Minor
>
> We have a file /tmp/t1/file.txt that contains only one line: "1.18927098E9".
> Running:
> {code:python}
> df = spark.read.csv('/tmp/t1', header=False, inferSchema=True, sep='\t')
> print df.dtypes
> {code}
> causes:
> {noformat}
> ValueError: Could not parse datatype: decimal(9,-1)
> {noformat}
> I'm not sure where the bug is - inferSchema or dtypes?
> I saw that it is legal to have a decimal with negative scale in the code (CSVInferSchema.scala):
> {code:scala}
> if (bigDecimal.scale <= 0) {
>   // `DecimalType` conversion can fail when
>   // 1. The precision is bigger than 38.
>   // 2. scale is bigger than precision.
>   DecimalType(bigDecimal.precision, bigDecimal.scale)
> }
> {code}
> but what does it mean?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
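As a side note, the inferred type decimal(9,-1) can be reproduced outside Spark. The snippet below is just an illustration using Python's decimal module as an analogue of Java's BigDecimal (which is what CSVInferSchema uses): the input "1.18927098E9" has 9 significant digits and, once the trailing zero is folded into the exponent, a BigDecimal-style scale of -1; that is exactly the decimal(9,-1) the Python dtype parser then rejects.

{code:python}
from decimal import Decimal

# The single value from /tmp/t1/file.txt
d = Decimal("1.18927098E9")  # == 1189270980

sign, digits, exponent = d.as_tuple()

# BigDecimal semantics: precision = number of significant digits,
# scale = -exponent (negative when the number ends in trailing zeros
# represented via a positive exponent).
precision = len(digits)  # 9
scale = -exponent        # -1

print("decimal(%d,%d)" % (precision, scale))  # decimal(9,-1)
{code}

So the inference itself is consistent with BigDecimal's model, where a negative scale is legal; the failure is only in the Python-side parsing of the resulting type string.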