Re: Weird worker usage

2015-09-28 Thread Bryan Jeffrey
Nikunj, No, I'm not calling setMaster at all. This ended up being a foolish configuration problem with my slaves file. Regards, Bryan Jeffrey On Fri, Sep 25, 2015 at 11:20 PM, N B wrote: > Bryan, > > By any chance, are you calling SparkConf.setMaster("local[*]")

Re: Weird worker usage

2015-09-26 Thread N B
Hello, Does anyone have an insight into what could be the issue here? Thanks Nikunj On Fri, Sep 25, 2015 at 10:44 AM, N B wrote: > Hi Akhil, > > I do have 25 partitions being created. I have set > the spark.default.parallelism property to 25. Batch size is 30 seconds and

Re: Weird worker usage

2015-09-26 Thread Akhil Das
That means only a single receiver is doing all the work, so the data is local to your N1 machine and all tasks are executed there. Now, to get the data to N2, you need to either do a .repartition or set the StorageLevel to MEMORY_*_2, where the _2 suffix enables data replication, and I guess that
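For concreteness, a minimal sketch of both suggestions, assuming a Flume receiver like the one in the original post (host name, port, and partition count are placeholders, not the actual application code):

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.flume.FlumeUtils

    // Option 1: receive with a replicated storage level so each block also
    // lives on a second node (the _2 suffix), letting tasks run on N2 too.
    val replicated = FlumeUtils.createStream(
      ssc, "n1-host", 4141, StorageLevel.MEMORY_AND_DISK_SER_2)

    // Option 2: repartition after receiving so the batch's blocks are
    // shuffled across all executors regardless of where they were received.
    val spread = replicated.repartition(25)

(`ssc` here is the application's StreamingContext.)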

Re: Weird worker usage

2015-09-25 Thread Bryan Jeffrey
I am seeing a similar issue when reading from Kafka. I have a single Kafka broker with 1 topic and 10 partitions on a separate machine. I have a three-node Spark cluster and have verified that all workers are registered with the master. I'm initializing Kafka using a method similar to the one in this article:
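Since the article link did not survive the archive, here is a hypothetical sketch of the receiver-based pattern commonly described for this setup: one receiver per Kafka partition, unioned into a single DStream (ZooKeeper host, group id, and topic name are placeholders):

    import org.apache.spark.streaming.kafka.KafkaUtils

    // One receiver per Kafka partition; each receiver can land on a
    // different worker, spreading ingest across the cluster.
    val numReceivers = 10
    val kafkaStreams = (1 to numReceivers).map { _ =>
      KafkaUtils.createStream(ssc, "zk-host:2181", "consumer-group", Map("topic" -> 1))
    }
    val unified = ssc.union(kafkaStreams)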

Re: Weird worker usage

2015-09-25 Thread N B
Hi Akhil, I do have 25 partitions being created. I have set the spark.default.parallelism property to 25. Batch size is 30 seconds and block interval is 1200 ms, which also gives us roughly 25 partitions from the input stream (30,000 ms / 1,200 ms = 25). I can see 25 partitions being created and used in the Spark UI also.
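As a sketch, the configuration described here would look roughly like this (the app name is a placeholder):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("flume-analytics") // illustrative name
      .set("spark.default.parallelism", "25")
      .set("spark.streaming.blockInterval", "1200ms")
    // 30 s batch / 1.2 s block interval = 25 blocks, i.e. 25 input partitions per batch
    val ssc = new StreamingContext(conf, Seconds(30))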

Re: Weird worker usage

2015-09-25 Thread Bryan Jeffrey
Looking at this further, it appears that my SparkContext is not correctly setting the master name. I see the following in the logs: 15/09/25 16:45:42 INFO DriverRunner: Launch Command: "/usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java" "-cp"
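For reference, a minimal sketch of where the master URL is normally set; the host name is a placeholder. In a cluster deployment it is usually supplied via spark-submit --master rather than hard-coded:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("kafka-consumer")          // illustrative name
      .setMaster("spark://master-host:7077") // placeholder master URL; often
                                             // left unset and passed by spark-submit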

Weird worker usage

2015-09-25 Thread N B
Hello all, I have a Spark streaming application that reads from a Flume stream, does quite a few maps/filters in addition to a few reduceByKeyAndWindow and join operations before writing the analyzed output to ElasticSearch inside a foreachRDD()... I recently started to run this on a 2-node
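A rough skeleton of the pipeline described above, assuming the Flume receiver API; host, port, window durations, and the ElasticSearch write are placeholders, not the actual application code:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.flume.FlumeUtils

    val conf = new SparkConf().setAppName("flume-es-pipeline") // illustrative
    val ssc = new StreamingContext(conf, Seconds(30))

    // Receive Flume events and turn each event body into a (key, 1) pair.
    val events = FlumeUtils.createStream(ssc, "flume-host", 4141)
    val pairs = events
      .map(e => (new String(e.event.getBody.array()), 1))
      .filter { case (key, _) => key.nonEmpty }

    // Windowed aggregation: 5-minute window sliding every 30-second batch.
    val counts = pairs.reduceByKeyAndWindow(_ + _, Seconds(300), Seconds(30))

    // Write each batch out (the ElasticSearch call itself is elided).
    counts.foreachRDD { rdd =>
      rdd.foreachPartition { part =>
        part.foreach { record => /* index into ElasticSearch here */ }
      }
    }

    ssc.start()
    ssc.awaitTermination()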

Re: Weird worker usage

2015-09-25 Thread Akhil Das
The number of parallel tasks depends entirely on the number of partitions you have. If you are not receiving enough partitions (you want partitions > total number of cores), then try doing a .repartition. Thanks Best Regards On Fri, Sep 25, 2015 at 1:44 PM, N B wrote: > Hello all, > > I have a
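A quick way to check this, as a sketch (the helper name is mine): log each batch's partition count and compare it with the cluster's total core count.

    import org.apache.spark.streaming.dstream.DStream

    // Print how many partitions each batch RDD has; if this is below the
    // total number of cores, some cores will sit idle every batch.
    def logNumPartitions[T](stream: DStream[T]): Unit =
      stream.foreachRDD(rdd => println(s"partitions per batch = ${rdd.partitions.length}"))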