[ 
https://issues.apache.org/jira/browse/SPARK-16883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414416#comment-15414416
 ] 

Miao Wang commented on SPARK-16883:
-----------------------------------

I added some debug logging to SerDe.scala. The output is shown below:
16/08/09 16:05:18 INFO SerDe: writeObject start
16/08/09 16:05:18 INFO SerDe: [x: double, y: decimal(10,0)]
16/08/09 16:05:18 INFO SerDe: writeObject end
16/08/09 16:05:18 INFO SerDe: writeType start
16/08/09 16:05:18 INFO SerDe: jobj
16/08/09 16:05:18 INFO SerDe: writeType end
16/08/09 16:05:18 INFO SerDe: writeObject start
16/08/09 16:05:18 INFO SerDe: StructType(StructField(x,DoubleType,true), StructField(y,DecimalType(10,0),true))
16/08/09 16:05:18 INFO SerDe: writeObject end
16/08/09 16:05:18 INFO SerDe: writeType start
16/08/09 16:05:18 INFO SerDe: jobj
16/08/09 16:05:18 INFO SerDe: writeType end

The backend serializes the value as a Java object, which is correct, so the 
problem is in the frontend (SparkR).

To verify my guess, I simply added "decimal(10,0)" = "numeric" to the 
PRIMITIVE_TYPES mapping table. With that change, the result is as expected:
'data.frame':   5 obs. of  2 variables:
 $ x: num  1 1 1 1 1
 $ y: num  2 2 2 2 2

I think the right fix belongs in the frontend: add a handling function that 
uses a regex to match special cases like decimal(10,0). In general, we should 
map both decimal and decimal(x,y) to numeric.
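
A minimal sketch of such a handling function, to make the idea concrete. It is 
not the actual SparkR code: the table primitive_types is an abridged stand-in 
for PRIMITIVE_TYPES, and the helper name sql_type_to_r_type is hypothetical.

```r
# Abridged stand-in for SparkR's PRIMITIVE_TYPES mapping table (assumption:
# these entries are illustrative and not a copy of the real table).
primitive_types <- c(
  "int"     = "integer",
  "double"  = "numeric",
  "decimal" = "numeric",
  "string"  = "character",
  "boolean" = "logical"
)

# Hypothetical helper: map a SQL type name to an R type name.
# Returns NULL when the type is not a known primitive.
sql_type_to_r_type <- function(sql_type) {
  # Treat decimal(precision,scale), e.g. "decimal(10,0)", as plain "decimal"
  # so that it resolves through the same table entry.
  if (grepl("^decimal\\( *[0-9]+ *, *[0-9]+ *\\)$", sql_type)) {
    sql_type <- "decimal"
  }
  if (sql_type %in% names(primitive_types)) {
    primitive_types[[sql_type]]
  } else {
    NULL
  }
}
```

With this sketch, sql_type_to_r_type("decimal(10,0)") and 
sql_type_to_r_type("decimal") both resolve to "numeric", so no per-precision 
entries are needed in the table.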

> SQL decimal type is not properly cast to number when collecting SparkDataFrame
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-16883
>                 URL: https://issues.apache.org/jira/browse/SPARK-16883
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Hossein Falaki
>
> To reproduce, run the following code. As you can see, "y" is a list of values.
> {code}
> registerTempTable(createDataFrame(iris), "iris")
> str(collect(sql("select cast('1' as double) as x, cast('2' as decimal) as y  
> from iris limit 5")))
> 'data.frame': 5 obs. of  2 variables:
>  $ x: num  1 1 1 1 1
>  $ y:List of 5
>   ..$ : num 2
>   ..$ : num 2
>   ..$ : num 2
>   ..$ : num 2
>   ..$ : num 2
> {code}



