[ https://issues.apache.org/jira/browse/SPARK-23576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491267#comment-16491267 ]

Hafthor Stefansson edited comment on SPARK-23576 at 5/25/18 9:27 PM:
---------------------------------------------------------------------

Here's an equivalent problem:

spark.sql("select cast(1 as decimal(38,18)) as 
x").write.format("parquet").save("decimal.parq")

spark.read.schema(spark.sql("select cast(1 as decimal) as 
x").schema).parquet("decimal.parq").show

returns 1000000000000000000!

It should throw, as it would if I specified a schema with x as float or some other incompatible type.
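
For comparison, here's a sketch of forcing a float schema onto the same file (the StructType spelled out explicitly; per the observation above, this is expected to fail with a schema mismatch rather than silently misread):

import org.apache.spark.sql.types._
// read the decimal(38,18) parquet file with an incompatible float schema
spark.read.schema(StructType(Seq(StructField("x", FloatType)))).parquet("decimal.parq").show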

Or maybe it should do what an explicit double cast would do:

spark.sql("select cast(cast(1 as decimal(38,10)) as decimal(38,18)) as x").show

returns 1.000000000000000000

except I'd be worried about getting silent nulls when the value exceeds the target range:

spark.sql("select cast(cast(10 as decimal(2,0)) as decimal(2,1)) as x").show

returns null!
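
One hypothetical way to surface those silent nulls (a sketch, not a fix) is to compare the column before and after the cast:

// rows where the cast to decimal(2,1) overflowed and produced a null
spark.sql("select cast(10 as decimal(2,0)) as x")
  .selectExpr("x", "cast(x as decimal(2,1)) as y")
  .where("x is not null and y is null")
  .show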

[https://gist.github.com/Hafthor/7f12bdfc41dc96676df03f366ef76f1c]
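
As a workaround sketch for the misread at the top (an assumption-laden sketch, not the fix): drop the user-supplied schema so Parquet reports the stored decimal(38,18), then cast afterward; the cast rescales the value instead of reinterpreting its unscaled digits (bare decimal defaults to decimal(10,0) in Spark SQL):

// should show 1 rather than 1000000000000000000
spark.read.parquet("decimal.parq").selectExpr("cast(x as decimal(10,0)) as x").show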


> SparkSQL - Decimal data missing decimal point
> ---------------------------------------------
>
>                 Key: SPARK-23576
>                 URL: https://issues.apache.org/jira/browse/SPARK-23576
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>         Environment: spark 2.3.0
> linux
>            Reporter: R
>            Priority: Major
>
> Integers like 3 stored as a decimal display in Spark SQL as 30000000000 with no decimal point, but Hive displays them fine as 3.
> Repro steps (steps 1-3 are condensed in the sketch after this quote):
>  # Create a .csv with the value 3
>  # Use spark to read the csv, cast it as decimal(31,8), and output to an ORC file
>  # Use spark to read the ORC, infer the schema (it will infer 38,18 precision), and output to a Parquet file
>  # Create an external Hive table to read the parquet (define the Hive type as decimal(31,8))
>  # Use spark-sql to select from the external Hive table
>  # Notice how Spark SQL shows 30000000000 !!!
>  
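
The first three repro steps above condense to a spark-shell sketch (file paths here are assumptions; the Hive table steps are omitted):

// step 2: read the csv, cast to decimal(31,8), write ORC
spark.read.csv("three.csv").selectExpr("cast(_c0 as decimal(31,8)) as x").write.format("orc").save("three.orc")
// step 3: read the ORC back and rewrite as Parquet
spark.read.orc("three.orc").write.format("parquet").save("three.parq")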


