[ 
https://issues.apache.org/jira/browse/SPARK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060381#comment-15060381
 ] 

Shivaram Venkataraman commented on SPARK-12360:
-----------------------------------------------

The lack of 64-bit integers is a limitation in R, but I'd like to understand the 
use cases where this comes up before trying a complex fix. My understanding is 
that long values from JSON / HDFS / Parquet etc. will be read correctly because 
they go through the Scala layers, and that the problem only comes up when 
somebody does a collect or uses a UDF. Is that right? If so, I think the problem 
may not be that important, as R users probably wouldn't expect long types to 
work in the R shell. 

It might also lead to another solution: rather than adding a hard dependency on 
bit64, we check whether bit64 is available and, if so, avoid the truncation to 
double.

> Support using 64-bit long type in SparkR
> ----------------------------------------
>
>                 Key: SPARK-12360
>                 URL: https://issues.apache.org/jira/browse/SPARK-12360
>             Project: Spark
>          Issue Type: New Feature
>          Components: SparkR
>    Affects Versions: 1.5.2
>            Reporter: Sun Rui
>
> R has no support for 64-bit integers, while in the Scala/Java API some methods 
> have one or more arguments of long type. Currently we only support passing an 
> integer cast from a numeric to the Scala/Java side for the long-type parameters 
> of such methods. This may cause problems with large data sets.
> Storing a 64-bit integer in a double does not work, because some 64-bit 
> integers cannot be exactly represented in double format, so x and x+1 can't 
> be distinguished.
> There is a bit64 package 
> (https://cran.r-project.org/web/packages/bit64/index.html) in CRAN which 
> supports vectors of 64-bit integers. We can investigate if it can be used for 
> this purpose.
> Two questions are:
> 1. Is the license acceptable?
> 2. This will make SparkR depend on a non-base third-party package, which 
> may complicate deployment.
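
To make the "x and x+1 can't be distinguished" point concrete, here is a small 
sketch of the precision loss. It is written in Python only because Python's 
float is an IEEE 754 double, the same representation as an R numeric; it is not 
SparkR code, and the names LIMIT and survives_double are made up for this 
illustration.

```python
# An IEEE 754 double has a 53-bit significand, so every integer is
# represented exactly only up to 2**53 in magnitude.
LIMIT = 2 ** 53


def survives_double(x: int) -> bool:
    """Return True if the integer x round-trips through a double unchanged."""
    return int(float(x)) == x


# Just past the limit, x and x + 1 map to the same double.
print(float(LIMIT) == float(LIMIT + 1))

# Values below the limit survive; values just above it, and the largest
# signed 64-bit integer, do not.
print(survives_double(LIMIT - 1))
print(survives_double(LIMIT + 1))
print(survives_double(2 ** 63 - 1))
```

In other words, a long only passes through a double intact when it is known to 
stay within 2**53 in magnitude; larger values can be silently rounded to a 
neighbouring value.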



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
