Re: How to create empty RDD

2015-07-06 Thread Wei Zhou
I used val output: RDD[(DetailInputRecord, VISummary)] = sc.emptyRDD[(DetailInputRecord, VISummary)] to create an empty RDD before. Give it a try; it might work for you too. 2015-07-06 14:11 GMT-07:00 ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com: I need to return an empty RDD of type val output:
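
For reference, a minimal runnable sketch of this approach, with hypothetical stand-in case classes for the poster's DetailInputRecord and VISummary types (Spark 1.x Scala API):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Stand-ins for the poster's DetailInputRecord and VISummary types.
case class DetailInputRecord(id: String)
case class VISummary(viewCount: Long)

object EmptyRddExample extends App {
  val sc = new SparkContext(
    new SparkConf().setAppName("empty-rdd-example").setMaster("local[*]"))

  // emptyRDD returns an RDD with no partitions and no elements,
  // typed however the caller declares it.
  val output: RDD[(DetailInputRecord, VISummary)] =
    sc.emptyRDD[(DetailInputRecord, VISummary)]

  println(output.count()) // prints 0
  sc.stop()
}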

Re: sparkR could not find function textFile

2015-06-26 Thread Wei Zhou
. To see how this schema is declared, check out Hossein Falaki’s response in this thread [1]. — Alek [1] -- http://apache-spark-developers-list.1001551.n3.nabble.com/SparkR-DataFrame-Column-Casts-esp-from-CSV-Files-td12589.html From: Wei Zhou zhweisop...@gmail.com Date: Thursday, June 25

sparkR could not find function textFile

2015-06-25 Thread Wei Zhou
Hi all, I am exploring sparkR by activating the shell and following the tutorial here: https://amplab-extras.github.io/SparkR-pkg/ When I tried to read in a local file with textFile(sc, file_location), it gives the error: could not find function textFile. By reading through the sparkR doc for 1.4,

Re: sparkR could not find function textFile

2015-06-25 Thread Wei Zhou
, Alek [1] -- https://issues.apache.org/jira/browse/SPARK-7230 From: Wei Zhou zhweisop...@gmail.com Date: Thursday, June 25, 2015 at 3:33 PM To: user@spark.apache.org user@spark.apache.org Subject: sparkR could not find function textFile Hi all, I am exploring sparkR by activating

Re: sparkR could not find function textFile

2015-06-25 Thread Wei Zhou
GMT-07:00 Wei Zhou zhweisop...@gmail.com: Hi Alek, Thanks for the explanation, it is very helpful. Cheers, Wei 2015-06-25 13:40 GMT-07:00 Eskilson,Aleksander alek.eskil...@cerner.com: Hi there, The tutorial you’re reading there was written before the merge of SparkR for Spark 1.4.0

Re: sparkR could not find function textFile

2015-06-25 Thread Wei Zhou
not support it in the upcoming releases. So if you can use the DataFrame API for your application, you should try that out. Thanks Shivaram On Thu, Jun 25, 2015 at 1:49 PM, Wei Zhou zhweisop...@gmail.com wrote: Hi Alek, Just a follow-up question. This is what I did in sparkR shell

Re: sparkR could not find function textFile

2015-06-25 Thread Wei Zhou
From: Wei Zhou zhweisop...@gmail.com Date: Thursday, June 25, 2015 at 4:15 PM To: shiva...@eecs.berkeley.edu shiva...@eecs.berkeley.edu Cc: Aleksander Eskilson alek.eskil...@cerner.com, user@spark.apache.org user@spark.apache.org Subject: Re: sparkR could not find function textFile Thanks

Re: sparkR could not find function textFile

2015-06-25 Thread Wei Zhou
Shivaram On Thu, Jun 25, 2015 at 2:15 PM, Wei Zhou zhweisop...@gmail.com wrote: Thanks to both Shivaram and Alek. Then if I want to create a DataFrame from comma-separated flat files, what would you recommend I do? One way I can think of is first reading the data as you would in R, using
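
A sketch of the DataFrame route in Scala, against the Spark 1.4-era reader with the external spark-csv package (the same source SparkR's read.df can point at from R); the column names and file path here are hypothetical:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

object CsvToDataFrame extends App {
  val sc = new SparkContext(
    new SparkConf().setAppName("csv-to-dataframe").setMaster("local[*]"))
  val sqlContext = new SQLContext(sc)

  // Declaring the schema up front skips a second pass over the file
  // for type inference. Column names here are hypothetical.
  val schema = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("score", DoubleType, nullable = true)))

  // Needs the external spark-csv package on the classpath, e.g.
  // spark-submit --packages com.databricks:spark-csv_2.10:1.0.3
  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .schema(schema)
    .load("/path/to/data.csv") // hypothetical path

  df.show()
  sc.stop()
}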

Re: How to Map and Reduce in sparkR

2015-06-25 Thread Wei Zhou
, go ahead and let the dev email list know. Alek [1] -- https://issues.apache.org/jira/browse/SPARK-7230 From: Wei Zhou zhweisop...@gmail.com Date: Wednesday, June 24, 2015 at 4:59 PM To: user@spark.apache.org user@spark.apache.org Subject: How to Map and Reduce in sparkR Anyone knows

Re: How to Map and Reduce in sparkR

2015-06-25 Thread Wei Zhou
the Spark Summit talk slides from last week for a bigger-picture view: http://www.slideshare.net/SparkSummit/07-venkataraman-sun Thanks Shivaram On Thu, Jun 25, 2015 at 3:08 PM, Wei Zhou zhweisop...@gmail.com wrote: Hi Shivaram/Alek, I understand that a better way to import data is to use DataFrame

Re: Understanding accumulator during transformations

2015-06-24 Thread Wei Zhou
, Jun 24, 2015 at 1:08 PM, Wei Zhou zhweisop...@gmail.com wrote: Quoting from the Spark Programming Guide: For accumulator updates performed inside *actions only*, Spark guarantees that each task’s update to the accumulator will only be applied once, i.e. restarted tasks will not update the value

How to Map and Reduce in sparkR

2015-06-24 Thread Wei Zhou
Does anyone know whether sparkR supports map and reduce operations as RDD transformations? Thanks in advance. Best, Wei

Re: Understanding accumulator during transformations

2015-06-24 Thread Wei Zhou
. If you have an accumulator in any of these transformations, then you won't get exactly-once semantics, because the transformation may be restarted elsewhere. Best, Burak On Wed, Jun 24, 2015 at 2:25 PM, Wei Zhou zhweisop...@gmail.com wrote: Hi Burak, Thanks for your quick reply. I guess

Understanding accumulator during transformations

2015-06-24 Thread Wei Zhou
Quoting from the Spark Programming Guide: For accumulator updates performed inside *actions only*, Spark guarantees that each task’s update to the accumulator will only be applied once, i.e. restarted tasks will not update the value. In transformations, users should be aware that each task’s update
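
As a concrete illustration of the two cases, a small sketch (Spark 1.x accumulator API; the counts in the comments assume a local run with no task failures and no caching):

import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorSemantics extends App {
  val sc = new SparkContext(
    new SparkConf().setAppName("accumulator-semantics").setMaster("local[*]"))
  val data = sc.parallelize(1 to 100)

  // Update inside an action: applied exactly once per task,
  // even if a task is retried.
  val sumAcc = sc.accumulator(0)
  data.foreach(x => sumAcc += x)
  println(sumAcc.value) // 5050

  // Update inside a transformation: every re-evaluation repeats the
  // update, so the value can drift past the true element count.
  val seenAcc = sc.accumulator(0)
  val doubled = data.map { x => seenAcc += 1; x * 2 }
  doubled.count() // runs the map once
  doubled.count() // RDD is not cached, so the map runs again
  println(seenAcc.value) // 200, not 100

  sc.stop()
}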

Re: Difference between Lasso regression in MLlib package and ML package

2015-06-23 Thread Wei Zhou
-- Blog: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Fri, Jun 19, 2015 at 11:38 AM, Wei Zhou zhweisop...@gmail.com wrote: Hi Spark experts, I see lasso regression / elastic net implementations under both MLlib and ML; does anyone know what the difference is between

Re: Difference between Lasso regression in MLlib package and ML package

2015-06-23 Thread Wei Zhou
-- Blog: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Tue, Jun 23, 2015 at 3:14 PM, Wei Zhou zhweisop...@gmail.com wrote: Hi DB Tsai, Thanks for your reply. I went through the source code of LinearRegression.scala. The algorithm

Difference between Lasso regression in MLlib package and ML package

2015-06-19 Thread Wei Zhou
Hi Spark experts, I see lasso regression / elastic net implementations under both MLlib and ML; does anyone know what the difference is between the two implementations? At Spark Summit, one of the keynote speakers mentioned that ML is meant for single-node computation; could anyone elaborate on this?
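
A side-by-side sketch of the two, as of the Spark 1.4 APIs (toy inline data; parameter values are arbitrary): spark.mllib's LassoWithSGD is RDD-based and trained with SGD, while spark.ml's LinearRegression is DataFrame-based and behaves as lasso when elasticNetParam is set to 1.0.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LassoWithSGD}
import org.apache.spark.sql.SQLContext

object LassoTwoWays extends App {
  val sc = new SparkContext(
    new SparkConf().setAppName("lasso-two-ways").setMaster("local[*]"))
  val sqlContext = new SQLContext(sc)

  val points = sc.parallelize(Seq(
    LabeledPoint(1.0, Vectors.dense(0.0, 1.1)),
    LabeledPoint(0.0, Vectors.dense(2.0, 1.0)),
    LabeledPoint(1.0, Vectors.dense(0.5, 0.9))))

  // spark.mllib: RDD-based lasso trained with stochastic gradient descent.
  val mllibModel = LassoWithSGD.train(points, 100, 1.0, 0.1)
  println(mllibModel.weights)

  // spark.ml: DataFrame-based elastic net; elasticNetParam = 1.0
  // makes the penalty pure L1, i.e. lasso.
  val mlModel = new LinearRegression()
    .setElasticNetParam(1.0)
    .setRegParam(0.1)
    .fit(sqlContext.createDataFrame(points))
  println(mlModel.weights)

  sc.stop()
}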