Re: Behaviour of RDD sampling

2016-05-31 Thread firemonk9
Yes, Spark needs to create the RDD first (which reads all the data) before it can create the sample. You can split the files into two sets outside of Spark in order to load only the sample set. Thank you, Dhiraj
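A minimal sketch of splitting the input files outside of Spark, in plain Python. The function name `split_files`, the default 10% fraction, and the fixed seed are illustrative assumptions, not something given in the thread. Only the first returned list would then be handed to Spark.

```python
import random


def split_files(paths, sample_fraction=0.1, seed=42):
    """Split a list of input file paths into (sample_set, remainder).

    Hypothetical helper: it selects whole files, not individual records.
    """
    if not 0.0 <= sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be between 0 and 1")
    shuffled = sorted(paths)               # sort first so the split is reproducible
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, independent of global state
    cut = round(len(shuffled) * sample_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Because this picks whole files, it approximates a record-level sample only when records are spread fairly evenly across the files.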

Re: How big the spark stream window could be ?

2016-05-09 Thread firemonk9
I have not come across official docs on this. However, if you use a 24-hour window, you will need enough memory to hold 24 hours of stream data. Memory is usually the limiting factor for the window size. Dhiraj Peechara
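A rough back-of-the-envelope estimate of that memory requirement can be sketched as follows. The ingest rate and record size are made-up numbers, and the function name `window_memory_bytes` is hypothetical; the estimate covers raw payload only and ignores object overhead and any replication of the stored data.

```python
def window_memory_bytes(records_per_sec, avg_record_bytes, window_seconds):
    """Raw payload held in memory for one full window (no overhead included)."""
    return records_per_sec * avg_record_bytes * window_seconds


# Assumed workload: 1,000 records/s at 500 bytes each, over a 24-hour window.
SECONDS_PER_DAY = 24 * 60 * 60
estimate = window_memory_bytes(1_000, 500, SECONDS_PER_DAY)
print(estimate / 1e9, "GB")  # prints "43.2 GB"
```

Even this modest assumed rate needs tens of gigabytes across the cluster, which is why memory tends to bound the window size.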

Re: What is difference btw reduce & fold?

2015-11-13 Thread firemonk9
This is very well explained. Thank you -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/What-is-difference-btw-reduce-fold-tp22653p25376.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: how to get Application ID from Submission ID or Driver ID programmatically

2015-10-02 Thread firemonk9
Have you found out how to get the applicationId from the submissionId? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-get-Application-ID-from-Submission-ID-or-Driver-ID-programmatically-tp24341p24912.html

Re: Connection closed/reset by peers error

2015-07-30 Thread firemonk9
I am having the same issue. Have you found any resolution? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Connection-closed-reset-by-peers-error-tp21459p24081.html

Re: Lost task - connection closed

2015-07-30 Thread firemonk9
I am getting the same error. Is there any resolution for this issue? Thank you -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Lost-task-connection-closed-tp21361p24082.html

Re: Question regarding spark data partition and coalesce. Need info on my use case.

2015-05-30 Thread firemonk9
I had a similar requirement and came up with a small algorithm to determine the number of partitions based on cluster size and input data.
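The post does not show the algorithm itself. As a hedged illustration only, one common style of heuristic takes the larger of a size-based and a core-based partition count. Everything below (the function name `suggest_num_partitions`, the 128 MiB target partition size, and the factor of 2 tasks per core) is an assumption and not the author's method.

```python
import math


def suggest_num_partitions(input_bytes, total_cores,
                           target_partition_bytes=128 * 1024 * 1024,
                           tasks_per_core=2):
    """Hypothetical heuristic: enough partitions to keep each one near the
    target size, but never fewer than a small multiple of the core count."""
    by_size = math.ceil(input_bytes / target_partition_bytes)
    by_cores = total_cores * tasks_per_core
    return max(1, by_size, by_cores)


GIB = 1024 ** 3
print(suggest_num_partitions(10 * GIB, total_cores=96))   # prints 192 (core-bound)
print(suggest_num_partitions(100 * GIB, total_cores=96))  # prints 800 (size-bound)
```

The result could be passed to `repartition` or `coalesce`, depending on whether the partition count needs to go up or down.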

Re: Instantiating/starting Spark jobs programmatically

2015-04-20 Thread firemonk9
I have built a data analytics SaaS platform by creating REST endpoints. Based on the type of job request, I invoke the necessary Spark job or jobs and return the results as JSON (asynchronously). I used yarn-client mode to submit the jobs to the YARN cluster. Hope this helps.

Re: Spark Streaming with Kafka

2015-01-20 Thread firemonk9
Hi, I am having similar issues. Have you found any resolution? Thank you -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-Kafka-tp21222p21276.html

Re: java.io.IOException: Mkdirs failed to create file:/some/path/myapp.csv while using rdd.saveAsTextFile(fileAddress) Spark

2015-01-09 Thread firemonk9
I am facing the same exception in saveAsObjectFile. Have you found any solution? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-Mkdirs-failed-to-create-file-some-path-myapp-csv-while-using-rdd-saveAsTextFile-k-tp20994p21066.html

Re: Failed to save RDD as text file to local file system

2015-01-09 Thread firemonk9
Have you found any resolution for this issue? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Failed-to-save-RDD-as-text-file-to-local-file-system-tp21050p21067.html

Re: Elastic allocation(spark.dynamicAllocation.enabled) results in task never being executed.

2015-01-03 Thread firemonk9
I am running into a similar problem. Have you found any resolution to this issue? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Elastic-allocation-spark-dynamicAllocation-enabled-results-in-task-never-being-executed-tp18969p20957.html

Re: trying to understand yarn-client mode

2014-12-29 Thread firemonk9
I was able to fix it by adding the jars (from the Spark distribution) to the classpath. In my sbt file I changed the scope to provided. Let me know if you need more details.

Re: Using YARN on a cluster created with spark-ec2

2014-12-27 Thread firemonk9
Currently, only standalone clusters are supported by the spark-ec2 script. You can use Cloudera/Ambari/SequenceIQ to create a YARN cluster. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-YARN-on-a-cluster-created-with-spark-ec2-tp20816p20870.html

RDD saveAsObjectFile write to local file and HDFS

2014-11-26 Thread firemonk9
When I am running Spark locally, RDD saveAsObjectFile writes the file to the local file system (e.g. path /data/temp.txt), and when I am running Spark on a YARN cluster, RDD saveAsObjectFile writes the file to HDFS (e.g. path /data/temp.txt). Is there a way to explicitly specify the local file system

Spark yarn cluster Application Master not running yarn container

2014-11-25 Thread firemonk9
I am running a 3-node (32 core, 60 GB) YARN cluster for Spark jobs.
1) Below are my YARN memory settings:
yarn.nodemanager.resource.memory-mb = 52224
yarn.scheduler.minimum-allocation-mb = 40960
yarn.scheduler.maximum-allocation-mb = 52224
Apache Spark Memory Settings export
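The arithmetic implied by these settings can be sketched as below. This assumes the scheduler rounds every container request up to a multiple of yarn.scheduler.minimum-allocation-mb and caps it at the maximum, which is the usual behaviour but should be checked against the Hadoop version and scheduler in use (the Fair Scheduler, for example, has its own increment setting). The function names and the 1024 MB request are illustrative assumptions.

```python
import math


def normalized_request_mb(requested_mb, min_alloc_mb, max_alloc_mb):
    """Round a request up to a multiple of the minimum allocation, capped at max."""
    rounded = math.ceil(requested_mb / min_alloc_mb) * min_alloc_mb
    return min(max(rounded, min_alloc_mb), max_alloc_mb)


def containers_per_node(node_mb, container_mb):
    """How many containers of the given size fit in one NodeManager."""
    return node_mb // container_mb


# Values taken from the post.
NODE_MB = 52224        # yarn.nodemanager.resource.memory-mb
MIN_ALLOC_MB = 40960   # yarn.scheduler.minimum-allocation-mb
MAX_ALLOC_MB = 52224   # yarn.scheduler.maximum-allocation-mb
NODES = 3

# Assumed request: even a 1 GB container is rounded up to the 40 GB minimum.
container_mb = normalized_request_mb(1024, MIN_ALLOC_MB, MAX_ALLOC_MB)
per_node = containers_per_node(NODE_MB, container_mb)
print(container_mb, per_node, per_node * NODES)  # prints "40960 1 3"
```

Under that assumption each node can host only one container, so the cluster has room for three containers in total, including the Application Master.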

Re: Spark 1.0.0 on yarn cluster problem

2014-10-23 Thread firemonk9
Hi, I am facing the same problem. My spark-env.sh has the entries below, yet I see the YARN container with only 1G, and YARN only spawns two workers.
SPARK_EXECUTOR_CORES=1
SPARK_EXECUTOR_MEMORY=3G
SPARK_EXECUTOR_INSTANCES=5
Please let me know if you are able to resolve this issue. Thank you