The simple answer is that SparkR does support map/reduce operations over RDDs
through the RDD API, but as of Spark 1.4.0 those functions were made private
in SparkR. They can still be accessed by prefixing the function with the
package namespace, e.g. SparkR:::lapply(rdd, func). The reasoning was that many
of the functions in the RDD API were too low-level to expose, with much more of
the focus going into the DataFrame API. The original rationale for this
decision can be found in the corresponding JIRA ticket [1]. The devs are still
deciding which functions of the RDD API, if any, should be made public in
future releases. If you feel some use cases are most easily handled in SparkR
through RDD functions, go ahead and let the dev mailing list know.

Alek
[1] -- https://issues.apache.org/jira/browse/SPARK-7230

From: Wei Zhou <zhweisop...@gmail.com>
Date: Wednesday, June 24, 2015 at 4:59 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: How to Map and Reduce in sparkR

Does anyone know whether SparkR supports map and reduce operations as RDD
transformations? Thanks in advance.

Best,
Wei

