Hi Shivaram,
Thanks for the details; they are greatly appreciated.

Thanks

On Wed, May 27, 2015 at 7:25 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:

> Sorry for the delay in getting back on this. So the RDD interface is
> private in the 1.4 release, but as Alek mentioned you can still use it by
> prefixing `SparkR:::`.
>
> Regarding design direction -- there are two JIRAs which cover major
> features we plan to work on for 1.5. SPARK-6805 tracks porting high-level
> machine learning operations like `glm` and `kmeans` to SparkR using the ML
> Pipeline implementation in Scala as the backend.
>
> We are also planning to develop a parallel API where users can run native
> R functions in a distributed setting, and SPARK-7264 tracks this effort. If
> you have specific use cases, feel free to chime in on the JIRA or on the dev
> mailing list.
>
> Thanks
> Shivaram
>
> On Tue, May 26, 2015 at 11:40 AM, Reynold Xin <r...@databricks.com> wrote:
>
>> You definitely don't want to implement kmeans in R, since it would be
>> very slow. Just providing R wrappers for the MLlib implementation is the
>> way to go. I believe one of the major items in SparkR next is the MLlib
>> wrappers.
>>
>> On Tue, May 26, 2015 at 7:46 AM, Andrew Psaltis <psaltis.and...@gmail.com> wrote:
>>
>>> Hi Alek,
>>> Thanks for the info. You are correct that using the three colons does
>>> work. Admittedly I am an R novice, but since the three colons are used to
>>> access hidden methods, it seems pretty dirty.
>>>
>>> Can someone shed light on the design direction being taken with SparkR?
>>> Should I really be accessing hidden methods, or will a better approach
>>> prevail? For instance, it feels like the k-means sample should really use
>>> MLlib and not just be a port of the k-means sample using hidden methods.
>>> Am I looking at this incorrectly?
>>>
>>> Thanks,
>>> Andrew
>>>
>>> On Tue, May 26, 2015 at 6:56 AM, Eskilson,Aleksander <alek.eskil...@cerner.com> wrote:
>>>
>>>> From the changes to the namespace file, that appears to be correct:
>>>> all methods of the RDD API have been made private, which in R means that
>>>> you may still access them by using the namespace prefix SparkR with three
>>>> colons, e.g. SparkR:::func(foo, bar).
>>>>
>>>> So a starting place for porting old SparkR scripts from before the
>>>> merge could be to identify those methods in the script belonging to the RDD
>>>> class and be sure they have the namespace identifier tacked on the front. I
>>>> hope that helps.
>>>>
>>>> Regards,
>>>> Alek Eskilson
>>>>
>>>> From: Andrew Psaltis <psaltis.and...@gmail.com>
>>>> Date: Monday, May 25, 2015 at 6:25 PM
>>>> To: "dev@spark.apache.org" <dev@spark.apache.org>
>>>> Subject: SparkR and RDDs
>>>>
>>>> Hi,
>>>> I understand from SPARK-6799 [1] and the respective merge commit [2]
>>>> that the RDD class is private in Spark 1.4. If I wanted to modify the old
>>>> Kmeans and/or LR examples so that the computation happened in Spark, what is
>>>> the best direction to go? Sorry if I am missing something obvious, but
>>>> based on the NAMESPACE file [3] in the SparkR codebase I am having trouble
>>>> seeing the obvious direction to go.
>>>>
>>>> Thanks in advance,
>>>> Andrew
>>>>
>>>> [1] https://issues.apache.org/jira/browse/SPARK-6799
>>>> [2] https://github.com/apache/spark/commit/4b91e18d9b7803dbfe1e1cf20b46163d8cb8716c
>>>> [3] https://github.com/apache/spark/blob/branch-1.4/R/pkg/NAMESPACE
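
For anyone porting a pre-merge script, the `SparkR:::` workaround discussed in this thread might look roughly like the sketch below. It is only an illustration: it assumes SparkR 1.4 with a local Spark installation, and the RDD function names used (`parallelize`, `map`, `reduce`) are assumptions based on the pre-merge RDD API, so they should be checked against the NAMESPACE file [3]. Because these functions are private, they may change or disappear in later releases.

```r
# Sketch only: assumes SparkR 1.4 is installed and Spark can run locally.
library(SparkR)

# sparkR.init() is part of the exported 1.4 API, so it needs no prefix.
sc <- sparkR.init(master = "local")

# The RDD functions below are assumed to be unexported in 1.4, hence the
# `SparkR:::` prefix on every call that touches the RDD API.
rdd <- SparkR:::parallelize(sc, 1:10, 2L)            # distribute 1..10 over 2 slices
squares <- SparkR:::map(rdd, function(x) x * x)      # square each element on the workers
total <- SparkR:::reduce(squares, function(a, b) a + b)  # sum the squares

print(total)

sparkR.stop()
```

As noted earlier in the thread, this is a stopgap for existing scripts; for algorithms such as k-means the suggested direction is R wrappers over MLlib (SPARK-6805) rather than reimplementing them on private RDD methods.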