Re: Do existing R packages work with SparkR data frames

2015-12-23 Thread Felix Cheung
Hi,
SparkR has some support for machine learning algorithms such as glm.
For existing R packages, you currently need to call collect to convert the 
SparkR DataFrame into an R data.frame. That assumes the data fits in the memory 
of the driver node, though that would be required to work with an R package in 
any case.
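
As a minimal sketch of both routes (not tested against any particular cluster): 
the code below fits a glm on a SparkR DataFrame, then collects the same data and 
hands it to a local R function. It assumes the Spark 2.0+ session API 
(sparkR.session); the 1.x releases current at the time of this thread used 
sparkR.init() and sparkRSQL.init() instead. The faithful dataset and the local 
lm call are stand-ins chosen for illustration; a package such as bnlearn would 
receive the collected data.frame in the same way.

```r
library(SparkR)

# Assumption: Spark 2.0+ session API and a local master, for illustration only.
sparkR.session(master = "local[1]")

# faithful is a built-in R dataset, used here as stand-in data.
df <- createDataFrame(faithful)

# Route 1: glm() on a SparkR DataFrame is fitted by Spark itself.
model <- glm(waiting ~ eruptions, data = df, family = "gaussian")
summary(model)

# Route 2: collect() copies every row to the driver as a plain data.frame,
# which any existing R package can then consume.
local_df <- collect(df)
local_fit <- lm(waiting ~ eruptions, data = local_df)

sparkR.session.stop()
```

Route 2 only works while the whole data set fits in the driver's memory.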



From: Lan 
Sent: Tuesday, December 22, 2015 4:50 PM
Subject: Do existing R packages work with SparkR data frames
To:  


Hello,

Is it possible for existing R Machine Learning packages (which work with R
data frames) such as bnlearn, to work with SparkR data frames? Or do I need
to convert SparkR data frames to R data frames? Is "collect" the function to
do the conversion, or how else to do that?

Many Thanks,
Lan




   


  

RE: Do existing R packages work with SparkR data frames

2015-12-22 Thread Sun, Rui
Hi, Lan,

Generally, it is hard to make existing R packages that work with R data frames 
work transparently with SparkR data frames. Typically, the algorithms have to 
be rewritten to use the SparkR DataFrame API.

collect gathers the data from a SparkR DataFrame into a local data.frame. 
Since a SparkR DataFrame is a distributed data set, you typically call methods 
of the SparkR DataFrame API to manipulate its data in a distributed fashion 
and, once the result is small enough to fit in the memory of the local machine, 
collect it for local processing.
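
A rough sketch of that pattern, assuming the Spark 2.0+ session API 
(sparkR.session) rather than the sparkR.init() calls of the 1.x releases: the 
filter and aggregation run in Spark, and only the small aggregated result is 
collected. The mtcars dataset, the weight threshold and the column names 
avg_mpg and n_cars are made up for illustration.

```r
library(SparkR)

# Assumption: Spark 2.0+ session API and a local master, for illustration only.
sparkR.session(master = "local[1]")

# mtcars is a built-in R dataset standing in for a large distributed table.
df <- createDataFrame(mtcars)

# These steps are executed by Spark; no rows are pulled to the driver yet.
heavy <- filter(df, df$wt > 3)
by_cyl <- agg(groupBy(heavy, "cyl"),
              avg_mpg = avg(heavy$mpg),
              n_cars = n(heavy$cyl))

# The aggregate has one row per cylinder count, so collecting it is cheap.
result <- collect(by_cyl)
print(result)

sparkR.session.stop()
```

After collect, result is an ordinary data.frame and can be passed to any local 
R package.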

From: Duy Lan Nguyen [mailto:ndla...@gmail.com]
Sent: Wednesday, December 23, 2015 5:50 AM
To: user@spark.apache.org
Subject: Do existing R packages work with SparkR data frames

Hello,

Is it possible for existing R Machine Learning packages (which work with R data 
frames) such as bnlearn, to work with SparkR data frames? Or do I need to 
convert SparkR data frames to R data frames? Is "collect" the function to do 
the conversion, or how else to do that?

Many Thanks,
Lan