[ https://issues.apache.org/jira/browse/SPARK-23576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491267#comment-16491267 ]
Hafthor Stefansson edited comment on SPARK-23576 at 5/25/18 9:27 PM:
---------------------------------------------------------------------

Here's an equivalent problem:

spark.sql("select cast(1 as decimal(38,18)) as x").write.format("parquet").save("decimal.parq")
spark.read.schema(spark.sql("select cast(1 as decimal) as x").schema).parquet("decimal.parq").show

returns 1000000000000000000! It should throw, like it would if I specified a schema with x as float, or some other type. Or maybe do what double casting would:

spark.sql("select cast(cast(1 as decimal(38,10)) as decimal(38,18)) as x").show

returns 1.000000000000000000, except I'd be worried about getting nulls when exceeding the range:

spark.sql("select cast(cast(10 as decimal(2,0)) as decimal(2,1)) as x").show

returns null!

[https://gist.github.com/Hafthor/7f12bdfc41dc96676df03f366ef76f1c]

> SparkSQL - Decimal data missing decimal point
> ---------------------------------------------
>
>                 Key: SPARK-23576
>                 URL: https://issues.apache.org/jira/browse/SPARK-23576
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>        Environment: spark 2.3.0
>                     linux
>            Reporter: R
>          Priority: Major
>
> Integers like 3 stored as a decimal display in sparksql as 30000000000 with
> no decimal point. But hive displays fine as 3.
> Repro steps:
> # Create a .csv with the value 3
> # Use spark to read the csv, cast it as decimal(31,8) and output to an ORC file
> # Use spark to read the ORC, infer the schema (it will infer 38,18 precision) and output to a Parquet file
> # Create external hive table to read the parquet (define the hive type as decimal(31,8))
> # Use spark-sql to select from the external hive table.
> # Notice how sparksql shows 30000000000 !!!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
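Both the comment's 1000000000000000000 and the 30000000000 in the repro steps are consistent with a decimal scale mismatch: Parquet (and ORC) store a decimal column as an unscaled integer, and the reader applies whatever scale its schema declares. A minimal sketch of that arithmetic using java.math.BigDecimal (the type Spark SQL uses on the JVM); the framing of which component applies which scale is an assumption for illustration, not a trace of Spark's actual read path:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalScaleMismatch {
    public static void main(String[] args) {
        // Writing 1 as decimal(38,18) stores the unscaled integer 10^18.
        BigInteger one = BigInteger.TEN.pow(18);

        // Reading with the declared scale 18 recovers the original value...
        System.out.println(new BigDecimal(one, 18)); // 1.000000000000000000

        // ...but a user-supplied decimal(38,0) schema applies scale 0 to the
        // same unscaled integer, silently yielding 10^18 (the comment's case).
        System.out.println(new BigDecimal(one, 0));  // 1000000000000000000

        // The repro: 3 written with the inferred schema decimal(38,18)
        // stores the unscaled integer 3 * 10^18...
        BigInteger three = BigInteger.valueOf(3).multiply(BigInteger.TEN.pow(18));

        // ...and a hive table declared as decimal(31,8) reinterprets those
        // bytes at scale 8: 3 * 10^18 / 10^8 = 3 * 10^10, matching the
        // reported 30000000000.
        System.out.println(new BigDecimal(three, 8)); // 30000000000.00000000
    }
}
```

A correct reader would either rescale the stored value to the requested scale (as the double-cast example does) or reject the mismatched schema, which is what the comment asks for.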