Felix Cheung commented on SPARK-17608:

This is indeed problematic: base R supports only 32-bit integers, so there 
isn't really a good way to represent a 64-bit bigint fully in R without 
bringing in external packages.
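
For a rough sense of the gap, the sketch below (Python, used only for illustration) compares the value range of a base R integer with that of a bigint. It assumes R's usual conventions: `.Machine$integer.max` is 2147483647 and the most negative 32-bit value is reserved for `NA_integer_`. The helper names are hypothetical.

```python
# Illustrative sketch, not SparkR code: compare base R's 32-bit integer range
# with the 64-bit range of a Spark bigint (LongType).

R_INT_MAX = 2**31 - 1         # .Machine$integer.max in R
R_INT_MIN = -(2**31 - 1)      # -2**31 is reserved for NA_integer_ in R
BIGINT_MIN = -(2**63)         # Spark LongType lower bound
BIGINT_MAX = 2**63 - 1        # Spark LongType upper bound


def fits_in_r_integer(value: int) -> bool:
    """True if a value can be stored as a base R integer without loss."""
    return R_INT_MIN <= value <= R_INT_MAX


def fits_in_bigint(value: int) -> bool:
    """True if a value is within the 64-bit signed range of a bigint."""
    return BIGINT_MIN <= value <= BIGINT_MAX


if __name__ == "__main__":
    for v in (42, R_INT_MAX, R_INT_MAX + 1, BIGINT_MAX):
        print(v, fits_in_r_integer(v), fits_in_bigint(v))
```

Any bigint outside roughly +/-2.1 billion therefore has no base R integer representation at all.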

I think converting it to numeric in R is the best we can do, but it is true 
that this breaks roundtripping (JVM<->R), and there is also a loss of 
precision, since an R numeric is a double and cannot hold every bigint exactly.
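
The precision loss follows from R's numeric type being an IEEE 754 double with a 53-bit significand. Below is a minimal Python sketch (Python's float is also an IEEE 754 double) of a bigint passing through a double and back; the function names are hypothetical.

```python
# Illustrative sketch: simulate a bigint being stored as an R numeric
# (an IEEE 754 double) and converted back to an integer.


def roundtrip_through_double(value: int) -> int:
    """Store a 64-bit integer in a double, then convert it back."""
    return int(float(value))


def survives_roundtrip(value: int) -> bool:
    """True if the value is unchanged after the double roundtrip."""
    return roundtrip_through_double(value) == value


if __name__ == "__main__":
    # 2**53 is exact; 2**53 + 1 rounds to 2**53; 2**63 - 1 rounds up to 2**63
    for v in (2**53, 2**53 + 1, 2**63 - 1):
        print(v, roundtrip_through_double(v), survives_roundtrip(v))
```

Note that the last case lands on 2**63, which is outside the bigint range altogether.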

We discussed this earlier (in 
https://issues.apache.org/jira/browse/SPARK-12360) and generally felt that a 
string might be a better approach. However, converting bigint to string 
(character) in R would not solve the roundtripping issue either. Also, an 
integer value in string form might be unexpected and harder to work with in R.
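
To illustrate why a string representation would still not roundtrip: the value survives exactly as text, but the R character type maps back to a Spark string rather than a bigint. The type tables below are simplified, hypothetical stand-ins for SparkR's actual mappings, written in Python for illustration.

```python
# Hypothetical, simplified type tables modelling the string-based alternative.
SPARK_TO_R = {"bigint": "character", "string": "character"}
R_TO_SPARK = {"character": "string"}


def bigint_to_r_character(value: int) -> str:
    """Encode a bigint as text; no digits are lost, even beyond 2**53."""
    return str(value)


def roundtrip_type(spark_type: str) -> str:
    """Spark type that comes back after a Spark -> R -> Spark conversion."""
    return R_TO_SPARK[SPARK_TO_R[spark_type]]


if __name__ == "__main__":
    text = bigint_to_r_character(2**63 - 1)
    print(repr(text), int(text) == 2**63 - 1)  # value preserved as text
    print(roundtrip_type("bigint"))             # type comes back as "string"
```

Because both "bigint" and "string" collapse to the same R type, the reverse mapping has no way to tell them apart.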

> Long type has incorrect serialization/deserialization
> -----------------------------------------------------
>                 Key: SPARK-17608
>                 URL: https://issues.apache.org/jira/browse/SPARK-17608
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Thomas Powell
> I am hitting issues when using {{dapply}} on a data frame that contains a 
> {{bigint}} in its schema. When this is converted to a SparkR data frame, a 
> "bigint" gets converted to an R {{numeric}} type: 
> https://github.com/apache/spark/blob/master/R/pkg/R/types.R#L25.
> However, the R {{numeric}} type gets converted to 
> {{org.apache.spark.sql.types.DoubleType}}: 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala#L97.
> The two directions therefore aren't compatible. If I use the same schema when 
> calling dapply (with just an identity function), I get type collisions 
> because the output type is a double but the schema expects a bigint. 
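
The asymmetry described in the report can be modelled with two lookup tables: Spark-to-R following the linked types.R (bigint becomes numeric) and R-to-Spark following the linked SQLUtils.scala (numeric becomes DoubleType). This is a simplified Python sketch, not SparkR code; the table entries beyond those two, the example schema, and the function names are illustrative assumptions.

```python
# Simplified model of the two conversion directions; only bigint -> numeric
# and numeric -> double come from the report, the rest is illustrative.
SPARK_TO_R = {"integer": "integer", "bigint": "numeric", "double": "numeric"}
R_TO_SPARK = {"integer": "integer", "numeric": "double"}


def identity_output_types(schema: dict) -> dict:
    """Spark types produced when an identity function returns each column."""
    return {col: R_TO_SPARK[SPARK_TO_R[t]] for col, t in schema.items()}


def type_collisions(declared: dict, produced: dict) -> dict:
    """Columns whose produced type differs from the declared schema type."""
    return {
        col: (declared[col], produced[col])
        for col in declared
        if declared[col] != produced[col]
    }


if __name__ == "__main__":
    schema = {"id": "bigint", "count": "integer"}
    produced = identity_output_types(schema)
    print(produced)                           # {'id': 'double', 'count': 'integer'}
    print(type_collisions(schema, produced))  # {'id': ('bigint', 'double')}
```

Reusing the input schema as the dapply output schema therefore flags every bigint column, even though the function changed nothing.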

This message was sent by Atlassian JIRA
