Re: using R with Spark

2017-09-24 Thread Felix Cheung
There are other approaches, like this: find Livy on the page https://blog.rstudio.com/2017/01/24/sparklyr-0-5/. It will probably be best to follow up with sparklyr for any support questions.
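
For reference, a minimal sketch of a Livy-based sparklyr connection (the Livy server URL below is a placeholder, not from the thread):

library(sparklyr)
library(dplyr)

# Connect through a Livy server rather than a local Spark install
# (supported since sparklyr 0.5); the URL is hypothetical.
sc <- spark_connect(master = "http://livy-server:8998", method = "livy")

# Use the connection as usual, e.g. copy a small data frame up and query it
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)
count(mtcars_tbl, cyl)

spark_disconnect(sc)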

RE: using R with Spark

2017-09-24 Thread Adaryl Wakefield
>It is free for use; might need RStudio Server depending on which Spark master you choose. Yeah, I think that's where my confusion is coming from. I'm looking at a cheat sheet. For connecting to a YARN cluster, the first step is: 1. Install RStudio Server or RStudio Pro on one of the existing
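
For context, a rough sketch of what that connection ends up looking like from RStudio Server (or a plain R session) on a cluster node; the SPARK_HOME path and the "yarn-client" master are assumptions, not from the cheat sheet:

library(sparklyr)

# Assumes the node already has the Hadoop/YARN client configuration
# and a Spark install; the path is hypothetical.
Sys.setenv(SPARK_HOME = "/usr/lib/spark")
sc <- spark_connect(master = "yarn-client")

spark_disconnect(sc)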

Re: using R with Spark

2017-09-24 Thread Jules Damji
You can also use sparklyr on Databricks. https://databricks.com/blog/2017/05/25/using-sparklyr-databricks.html Cheers, Jules
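
A minimal sketch of that approach, assuming the code runs inside a Databricks R notebook as the blog post describes (the table name is a placeholder):

library(sparklyr)
library(dplyr)

# Inside a Databricks R notebook, attach to the cluster's existing Spark;
# no master URL or local Spark install is needed.
sc <- spark_connect(method = "databricks")

# Hypothetical example: point dplyr at a table already registered on the cluster
flights_tbl <- tbl(sc, "flights")
head(flights_tbl)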

Re: using R with Spark

2017-09-24 Thread Felix Cheung
If you Google it, you will find posts or info on how to connect it to different cloud and Hadoop/Spark vendors.

Re: using R with Spark

2017-09-24 Thread Georg Heiler
No. It is free for use; you might need RStudio Server depending on which Spark master you choose.

Re: using R with Spark

2017-09-24 Thread Felix Cheung
Both are free to use; you can use sparklyr from the R shell without RStudio (but you probably want an IDE).
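
A minimal sketch of getting started with sparklyr from a plain R shell, no RStudio required; the Spark version here is just an example:

# install.packages("sparklyr")   # the package itself is free on CRAN
library(sparklyr)
library(dplyr)

# Download a local Spark distribution for experimentation and connect to it
spark_install(version = "2.1.0")
sc <- spark_connect(master = "local")

iris_tbl <- copy_to(sc, iris)
iris_tbl %>% count(Species)

spark_disconnect(sc)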

using R with Spark

2017-09-24 Thread Adaryl Wakefield
There are two packages, SparkR and sparklyr; sparklyr seems to be the more useful. However, do you have to pay to use it? Unless I'm not reading this right, it seems you have to have the paid version of RStudio to use it. Adaryl "Bob" Wakefield, MBA, Principal, Mass Street Analytics, LLC
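
For completeness, SparkR ships with the Spark distribution itself, so it does not require any paid product either; a minimal sketch, assuming SPARK_HOME points at a local Spark install (the path is hypothetical):

# SparkR is bundled under $SPARK_HOME/R/lib
Sys.setenv(SPARK_HOME = "/opt/spark")
library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))

sparkR.session(master = "local[*]", appName = "sparkr-example")

df <- as.DataFrame(faithful)   # turn a base R data.frame into a Spark DataFrame
head(df)

sparkR.session.stop()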

pyspark dataframe partitionBy write to parquet files

2017-09-24 Thread wings
Hello all, I'm trying to write Parquet files using df.write.partitionBy('type').mode('overwrite').parquet(path). When df is partitioned by column `type` it may be skewed: some dirs may take 100GB of HDFS space or more, but some of them take only 1GB. But all sub dirs have the same number of files, let's say
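
One common way to control how many files land in each `type` directory is to repartition the DataFrame by the partition column before writing; a sketch, assuming the same `df` and `path` as in the question:

# Repartition by the partition column first, so all rows for a given `type`
# land in the same task and each output directory gets roughly one file
# (the skew then shows up as file size rather than file count).
df.repartition("type") \
  .write \
  .partitionBy("type") \
  .mode("overwrite") \
  .parquet(path)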