Re: Spark job uses only one Worker

2016-01-07 Thread Annabel Melongo
Michael,
I don't know what your environment is, but if it's Cloudera, you should be able 
to see the link to your master in Hue.
Thanks 

On Thursday, January 7, 2016 5:03 PM, Michael Pisula 
 wrote:
 

  I had tried several parameters, including --total-executor-cores, no effect.
 As for the port, I tried 7077, but if I remember correctly I got some kind of 
error that suggested to try 6066, with which it worked just fine (apart from 
this issue here).
 
 Each worker has two cores. I also tried increasing cores, again no effect. I 
was able to increase the number of cores the job was using on one worker, but 
it would not use any other worker (and it would not start if the number of 
cores the job wanted was higher than the number available on one worker).
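 (For reference, a hedged sketch of the submission shape under discussion, combining the 7077 master port with an explicit core cap; the host name and core count are placeholders, not values from this thread.)

    spark/bin/spark-submit \
      --class demo.spark.StaticDataAnalysis \
      --master spark://<master-host>:7077 \
      --deploy-mode cluster \
      --total-executor-cores 6 \
      demo/Demo-1.0-SNAPSHOT-all.jar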
 
 On 07.01.2016 22:51, Igor Berman wrote:
  
 Read about --total-executor-cores. Not sure why you specify port 6066 for the 
master... usually it's 7077.
 Verify in the master UI (usually port 8080) how many cores are there (depends on 
other configs, but usually workers connect to the master with all their cores).
 On 7 January 2016 at 23:46, Michael Pisula  wrote:
 
  Hi,
 
 I start the cluster using the spark-ec2 scripts, so the cluster is in 
stand-alone mode.
 Here is how I submit my job:
 spark/bin/spark-submit --class demo.spark.StaticDataAnalysis --master 
spark://:6066 --deploy-mode cluster demo/Demo-1.0-SNAPSHOT-all.jar
 
 Cheers,
 Michael  
 
 On 07.01.2016 22:41, Igor Berman wrote:
  
 Share how you submit your job and what cluster you use (YARN, standalone).
 On 7 January 2016 at 23:24, Michael Pisula  wrote:
 
Hi there,
 
 I ran a simple Batch Application on a Spark Cluster on EC2. Despite having 3
 Worker Nodes, I could not get the application processed on more than one
 node, regardless of whether I submitted the Application in Cluster or Client mode.
 I also tried manually increasing the number of partitions in the code, no
 effect. I also pass the master into the application.
 I verified on the nodes themselves that only one node was active while the
 job was running.
 I pass enough data to make the job take 6 minutes to process.
 The job is simple enough, reading data from two S3 files, joining records on
 a shared field, filtering out some records and writing the result back to
 S3.
 
 Tried all kinds of stuff, but could not make it work. I did find similar
 questions, but had already tried the solutions that worked in those cases.
 Would be really happy about any pointers.
 
 Cheers,
 Michael
 
 
 
 --
 View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-uses-only-one-Worker-tp25909.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.
 
-
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
 
 
  
  
 
-- 
Michael Pisula * michael.pis...@tngtech.com * +49-174-3180084
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082
  
  
  
 
 -- 
Michael Pisula * michael.pis...@tngtech.com * +49-174-3180084
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082
 

  

Re: Date Time Regression as Feature

2016-01-07 Thread Annabel Melongo
Or he can also transform the whole date into a string 

On Thursday, January 7, 2016 2:25 PM, Sujit Pal  
wrote:
 

 Hi Jorge,
Maybe extract things like dd, mm, day of week, time of day from the datetime 
string and use them as features?
-sujit
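
A hedged sketch of that extraction in Scala, assuming the "2015-12-10-10:00" format from the sample data; the feature choice is illustrative:

    import java.text.SimpleDateFormat
    import java.util.Calendar

    // parse "yyyy-MM-dd-HH:mm" and pull out simple numeric features
    def dateFeatures(s: String): Array[Double] = {
      val fmt = new SimpleDateFormat("yyyy-MM-dd-HH:mm")
      val cal = Calendar.getInstance()
      cal.setTime(fmt.parse(s))
      Array(
        cal.get(Calendar.MONTH) + 1,     // month, 1-12
        cal.get(Calendar.DAY_OF_MONTH),  // day of month
        cal.get(Calendar.DAY_OF_WEEK),   // day of week, 1-7
        cal.get(Calendar.HOUR_OF_DAY)    // hour of day, 0-23
      ).map(_.toDouble)
    }

    // dateFeatures("2015-12-10-10:00") gives Array(12.0, 10.0, 5.0, 10.0)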

On Thu, Jan 7, 2016 at 11:09 AM, Jorge Machado  
wrote:

Hello all,

I'm new to machine learning. I'm trying to predict some electric usage with a 
decision tree.
The data is :
2015-12-10-10:00, 1200
2015-12-11-10:00, 1150

My question is : What is the best way to turn date and time into feature on my 
Vector ?

Something like this :  Vector (1200, [2015,12,10,10,10] )?
I could not find any example with value prediction where features had dates in 
it.

Thanks

Jorge Machado

Jorge Machado
jo...@jmachado.me


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org





  

Re: [Spark 1.6] Spark Streaming - java.lang.AbstractMethodError

2016-01-07 Thread Dibyendu Bhattacharya
Right... if you are using the GitHub version, just modify the ReceiverLauncher
and add that. I will fix it for Spark 1.6 and release a new version on
spark-packages for Spark 1.6.

Dibyendu
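
For reference, a hedged sketch of the two empty callbacks in question (event class names per SPARK-10900); the rest of the anonymous listener in ReceiverLauncher.java is assumed and elided:

    jsc.addStreamingListener(new org.apache.spark.streaming.scheduler.StreamingListener() {
        // ... the callbacks the class already implements stay as they are ...

        @Override
        public void onOutputOperationStarted(
                org.apache.spark.streaming.scheduler.StreamingListenerOutputOperationStarted started) {
            // no-op: present only so the Spark 1.6.0 listener bus can dispatch the new event
        }

        @Override
        public void onOutputOperationCompleted(
                org.apache.spark.streaming.scheduler.StreamingListenerOutputOperationCompleted completed) {
            // no-op
        }
    });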

On Thu, Jan 7, 2016 at 4:14 PM, Ted Yu  wrote:

> I cloned g...@github.com:dibbhatt/kafka-spark-consumer.git a moment ago.
>
> In ./src/main/java/consumer/kafka/ReceiverLauncher.java , I see:
>jsc.addStreamingListener(new StreamingListener() {
>
> There is no onOutputOperationStarted method implementation.
>
> Looks like it should be added for Spark 1.6.0
>
> Cheers
>
> On Thu, Jan 7, 2016 at 2:39 AM, Dibyendu Bhattacharya <
> dibyendu.bhattach...@gmail.com> wrote:
>
>> You are using low level spark kafka consumer . I am the author of the
>> same.
>>
>> Are you using the spark-packages version ? if yes which one ?
>>
>> Regards,
>> Dibyendu
>>
>> On Thu, Jan 7, 2016 at 4:07 PM, Jacek Laskowski  wrote:
>>
>>> Hi,
>>>
>>> Do you perhaps use custom StreamingListener?
>>> `StreamingListenerBus.scala:47` calls
>>> `StreamingListener.onOutputOperationStarted` that was added in
>>> [SPARK-10900] [STREAMING] Add output operation events to
>>> StreamingListener [1]
>>>
>>> The other guess could be that at runtime you still use Spark < 1.6.
>>>
>>> [1] https://issues.apache.org/jira/browse/SPARK-10900
>>>
>>> Pozdrawiam,
>>> Jacek
>>>
>>> Jacek Laskowski | https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark
>>> ==> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>>
>>>
>>> On Thu, Jan 7, 2016 at 10:59 AM, Walid LEZZAR 
>>> wrote:
>>> > Hi,
>>> >
>>> > We have been using spark streaming for a little while now.
>>> >
>>> > Until now, we were running our spark streaming jobs in spark 1.5.1 and
>>> it
>>> > was working well. Yesterday, we upgraded to spark 1.6.0 without any
>>> changes
>>> > in the code. But our streaming jobs are not working any more. We are
>>> getting
>>> > an "AbstractMethodError". Please, find the stack trace at the end of
>>> the
>>> > mail. Can we have some hints on what this error means ? (we are using
>>> spark
>>> > to connect to kafka)
>>> >
>>> > The stack trace :
>>> > 16/01/07 10:44:39 INFO ZkState: Starting curator service
>>> > 16/01/07 10:44:39 INFO CuratorFrameworkImpl: Starting
>>> > 16/01/07 10:44:39 INFO ZooKeeper: Initiating client connection,
>>> > connectString=localhost:2181 sessionTimeout=12
>>> > watcher=org.apache.curator.ConnectionState@2e9fa23a
>>> > 16/01/07 10:44:39 INFO ClientCnxn: Opening socket connection to server
>>> > localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
>>> > (unknown error)
>>> > 16/01/07 10:44:39 INFO ClientCnxn: Socket connection established to
>>> > localhost/127.0.0.1:2181, initiating session
>>> > 16/01/07 10:44:39 INFO ClientCnxn: Session establishment complete on
>>> server
>>> > localhost/127.0.0.1:2181, sessionid = 0x1521b6d262e0035, negotiated
>>> timeout
>>> > = 6
>>> > 16/01/07 10:44:39 INFO ConnectionStateManager: State change: CONNECTED
>>> > 16/01/07 10:44:40 INFO PartitionManager: Read partition information
>>> from:
>>> >
>>> /spark-kafka-consumer/StreamingArchiver/lbc.job.multiposting.input/partition_0
>>> > --> null
>>> > 16/01/07 10:44:40 INFO JobScheduler: Added jobs for time 145215988
>>> ms
>>> > 16/01/07 10:44:40 INFO JobScheduler: Starting job streaming job
>>> > 145215988 ms.0 from job set of time 145215988 ms
>>> > 16/01/07 10:44:40 ERROR Utils: uncaught error in thread
>>> > StreamingListenerBus, stopping SparkContext
>>> >
>>> > ERROR Utils: uncaught error in thread StreamingListenerBus, stopping
>>> > SparkContext
>>> > java.lang.AbstractMethodError
>>> > at
>>> >
>>> org.apache.spark.streaming.scheduler.StreamingListenerBus.onPostEvent(StreamingListenerBus.scala:47)
>>> > at
>>> >
>>> org.apache.spark.streaming.scheduler.StreamingListenerBus.onPostEvent(StreamingListenerBus.scala:26)
>>> > at
>>> > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
>>> > at
>>> >
>>> org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
>>> > at
>>> >
>>> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
>>> > at
>>> >
>>> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
>>> > at
>>> >
>>> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
>>> > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>>> > at
>>> >
>>> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
>>> > at
>>> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1180)
>>> >

Window Functions importing issue in Spark 1.4.0

2016-01-07 Thread satish chandra j
HI All,
Currently using Spark 1.4.0 version, I have a requirement to add a column
having Sequential Numbering to an existing DataFrame
I understand Window Function "rowNumber" serves my purpose
hence I have below import statements to include the same

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rowNumber

But I am getting an error at the import statement itself such as
"object expressions is not a member of package org.apache.spark.sql"

"value rowNumber is not a member of object org.apache.spark.sql.functions"

Could anybody throw some light on how to fix the issue?

Regards,
Satish Chandra


Re: Window Functions importing issue in Spark 1.4.0

2016-01-07 Thread Jacek Laskowski
Ok, enuf! :) Leaving the room for now as I'm like a copycat :)

https://en.wiktionary.org/wiki/enuf

Pozdrawiam,
Jacek

Jacek Laskowski | https://medium.com/@jaceklaskowski/
Mastering Apache Spark
==> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Follow me at https://twitter.com/jaceklaskowski


On Thu, Jan 7, 2016 at 12:11 PM, Ted Yu  wrote:
> Please take a look at the following for sample on how rowNumber is used:
> https://github.com/apache/spark/pull/9050
>
> BTW 1.4.0 was an old release.
>
> Please consider upgrading.
>
> On Thu, Jan 7, 2016 at 3:04 AM, satish chandra j 
> wrote:
>>
>> HI All,
>> Currently using Spark 1.4.0 version, I have a requirement to add a column
>> having Sequential Numbering to an existing DataFrame
>> I understand Window Function "rowNumber" serves my purpose
>> hence I have below import statements to include the same
>>
>> import org.apache.spark.sql.expressions.Window
>> import org.apache.spark.sql.functions.rowNumber
>>
>> But I am getting an error at the import statement itself such as
>> "object expressions is not a member of package org.apache.spark.sql"
>>
>> "value rowNumber is not a member of object org.apache.spark.sql.functions"
>>
>> Could anybody throw some light if any to fix the issue
>>
>> Regards,
>> Satish Chandra
>
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: [Spark 1.6] Spark Streaming - java.lang.AbstractMethodError

2016-01-07 Thread Dibyendu Bhattacharya
Some discussion is there in https://github.com/dibbhatt/kafka-spark-consumer
and some is mentioned in https://issues.apache.org/jira/browse/SPARK-11045

Let me know if those answer your question .

In short, the Direct Stream is a good choice if you need exactly-once semantics and
message ordering, but many use cases do not need exactly-once semantics or
message ordering. If you use the Direct Stream, the RDD
processing parallelism is limited to the number of Kafka partitions, and you need to store
offset details in an external store, as the checkpoint location is not reliable if
you modify the driver code.

Whereas in receiver-based mode, you need to enable the WAL for no data loss.
But the Spark receiver-based consumer from KafkaUtils, which uses the Kafka high-level
API, has serious issues, and thus if at all you need to switch to
receiver-based mode, this low-level consumer is a better choice.

Performance-wise I have not published any numbers yet, but from internal
testing and benchmarking I did (and validated by folks who use this
consumer), it performs much better than any existing consumer in Spark.

Regards,
Dibyendu

On Thu, Jan 7, 2016 at 4:28 PM, Jacek Laskowski  wrote:

> On Thu, Jan 7, 2016 at 11:39 AM, Dibyendu Bhattacharya
>  wrote:
> > You are using low level spark kafka consumer . I am the author of the
> same.
>
> If I may ask, what are the differences between this and the direct
> version shipped with spark? I've just started toying with it, and
> would appreciate some guidance. Thanks.
>
> Jacek
>


Re: Window Functions importing issue in Spark 1.4.0

2016-01-07 Thread Ted Yu
Please take a look at the following for sample on how rowNumber is used:
https://github.com/apache/spark/pull/9050

BTW 1.4.0 was an old release.

Please consider upgrading.
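
For reference, a hedged sketch of how rowNumber is typically used once the imports resolve; the partition/order column names are illustrative, and in the 1.4/1.5 line window functions generally require a HiveContext rather than a plain SQLContext:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.rowNumber

    val w = Window.partitionBy("groupCol").orderBy("orderCol")
    val numbered = df.withColumn("seq_no", rowNumber().over(w))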

On Thu, Jan 7, 2016 at 3:04 AM, satish chandra j 
wrote:

> HI All,
> Currently using Spark 1.4.0 version, I have a requirement to add a column
> having Sequential Numbering to an existing DataFrame
> I understand Window Function "rowNumber" serves my purpose
> hence I have below import statements to include the same
>
> import org.apache.spark.sql.expressions.Window
> import org.apache.spark.sql.functions.rowNumber
>
> But I am getting an error at the import statement itself such as
> "object expressions is not a member of package org.apache.spark.sql"
>
> "value rowNumber is not a member of object org.apache.spark.sql.functions"
>
> Could anybody throw some light if any to fix the issue
>
> Regards,
> Satish Chandra
>


How HiveContext can read subdirectories

2016-01-07 Thread Arkadiusz Bicz
Hi,

Can Spark using HiveContext External Tables read sub-directories?

Example:

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql._

import sqlContext.implicits._

//prepare data and create subdirectories with parquet
val df = Seq("id1" -> 1, "id2" -> 4, "id3"-> 5).toDF("id", "value")
df.write.parquet("/tmp/df/1")
val df2 = Seq("id6"-> 6, "id7"-> 7, "id8"-> 8).toDF("id", "value")
df2.write.parquet("/tmp/df/2")
val dfall = sqlContext.read.load("/tmp/df/*/")
assert(dfall.count == 6)

//convert to HiveContext
val hc = new HiveContext(sqlContext.sparkContext)

hc.sql("SET hive.mapred.supports.subdirectories=true")
hc.sql("SET mapreduce.input.fileinputformat.input.dir.recursive=true")

hc.sql("create external table testsubdirectories (id string, value
string) STORED AS PARQUET location '/tmp/df'")

val hcall = hc.sql("select * from testsubdirectories")

assert(hcall.count() == 6)  // should return 6, but it is 0 because it does not
// read from the subdirectories

Thanks,

Arkadiusz Bicz

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



[Spark 1.6] Spark Streaming - java.lang.AbstractMethodError

2016-01-07 Thread Walid LEZZAR
Hi,

We have been using spark streaming for a little while now.

Until now, we were running our spark streaming jobs in spark 1.5.1 and it
was working well. Yesterday, we upgraded to spark 1.6.0 without any changes
in the code. But our streaming jobs are not working any more. We are
getting an "AbstractMethodError". Please, find the stack trace at the end
of the mail. Can we have some hints on what this error means ? (we are
using spark to connect to kafka)

The stack trace :
16/01/07 10:44:39 INFO ZkState: Starting curator service
16/01/07 10:44:39 INFO CuratorFrameworkImpl: Starting
16/01/07 10:44:39 INFO ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=12
watcher=org.apache.curator.ConnectionState@2e9fa23a
16/01/07 10:44:39 INFO ClientCnxn: Opening socket connection to server
localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
(unknown error)
16/01/07 10:44:39 INFO ClientCnxn: Socket connection established to
localhost/127.0.0.1:2181, initiating session
16/01/07 10:44:39 INFO ClientCnxn: Session establishment complete on server
localhost/127.0.0.1:2181, sessionid = 0x1521b6d262e0035, negotiated timeout
= 6
16/01/07 10:44:39 INFO ConnectionStateManager: State change: CONNECTED
16/01/07 10:44:40 INFO PartitionManager: Read partition information from:
/spark-kafka-consumer/StreamingArchiver/lbc.job.multiposting.input/partition_0
--> null
16/01/07 10:44:40 INFO JobScheduler: Added jobs for time 145215988 ms
16/01/07 10:44:40 INFO JobScheduler: Starting job streaming job
145215988 ms.0 from job set of time 145215988 ms
16/01/07 10:44:40 ERROR Utils: uncaught error in thread
StreamingListenerBus, stopping SparkContext

ERROR Utils: uncaught error in thread StreamingListenerBus, stopping
SparkContext
java.lang.AbstractMethodError
at
org.apache.spark.streaming.scheduler.StreamingListenerBus.onPostEvent(StreamingListenerBus.scala:47)
at
org.apache.spark.streaming.scheduler.StreamingListenerBus.onPostEvent(StreamingListenerBus.scala:26)
at
org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
at
org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
at
org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
at
org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
at
org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at
org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1180)
at
org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
16/01/07 10:44:40 INFO JobScheduler: Finished job streaming job
145215988 ms.0 from job set of time 145215988 ms
16/01/07 10:44:40 INFO JobScheduler: Total delay: 0.074 s for time
145215988 ms (execution: 0.032 s)
16/01/07 10:44:40 ERROR JobScheduler: Error running job streaming job
145215988 ms.0
java.lang.IllegalStateException: SparkContext has been shutdown
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:920)
at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:918)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:918)
at
org.apache.spark.api.java.JavaRDDLike$class.foreachPartition(JavaRDDLike.scala:225)
at
org.apache.spark.api.java.AbstractJavaRDDLike.foreachPartition(JavaRDDLike.scala:46)
at
fr.leboncoin.morpheus.jobs.streaming.StreamingArchiver.lambda$run$ade930b4$1(StreamingArchiver.java:103)
at
org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$3.apply(JavaDStreamLike.scala:335)
at
org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$3.apply(JavaDStreamLike.scala:335)
at
org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at
org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at

Spark shell throws java.lang.RuntimeException

2016-01-07 Thread will
Hi, I wanted to try the 1.6.0 version of Spark, but when I run it into my
local machine, it throws me this exception :

java.lang.RuntimeException: java.lang.RuntimeException: The root scratch
dir: /tmp/hive on HDFS should be writable.

Thing is, this problem happened to me in the 1.5.1 version, and some people
even had it in the  1.5.0 version

  

Thanks for any help.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-shell-throws-java-lang-RuntimeException-tp25903.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Why is this job running since one hour?

2016-01-07 Thread Umesh Kacha
Hi, thanks for the response. Each job is processing around 5 GB of skewed
data, does a group by on multiple fields, does aggregation, and then does
coalesce(1) and saves a CSV file in gzip format. I think coalesce is causing the
problem, but the data is not that huge; I don't understand why it keeps on
running for an hour and prevents other jobs from running. Please guide.
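(For reference, a hedged sketch of the job shape described above; column and path names are made up for illustration, not taken from the actual job.)

    import org.apache.spark.sql.functions.sum

    val aggregated = df.groupBy("field1", "field2", "field3")
      .agg(sum("amount").as("total"))

    // coalesce(1) collapses the result into a single partition, so the final stage
    // runs as one task on one executor; with skewed input that lone task can run
    // for a long time while the rest of the cluster sits idle
    aggregated.coalesce(1).write.save("/path/to/output")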
On Jan 7, 2016 3:58 AM, "Jakob Odersky"  wrote:

> What is the job doing? How much data are you processing?
>
> On 6 January 2016 at 10:33, unk1102  wrote:
>
>> Hi I have one main Spark job which spawns multiple child spark jobs. One
>> of
>> the child spark job is running for an hour and it keeps on hanging there I
>> have taken snap shot please see
>> <
>> http://apache-spark-user-list.1001560.n3.nabble.com/file/n25899/Screen_Shot_2016-01-06_at_11.jpg
>> >
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Why-is-this-job-running-since-one-hour-tp25899.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


Re: 101 question on external metastore

2016-01-07 Thread Deenar Toraskar
I sorted this out. There were 2 different versions of Derby, and ensuring the
metastore and Spark used the same version of Derby made the problem go away.

Deenar
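
(A hedged, illustrative way to spot such a conflict; the paths below are placeholders.)

    # list the Derby jars visible to Spark and to the standalone Derby install
    find $SPARK_HOME/lib /path/to/db-derby-*-bin/lib -name 'derby*.jar' 2>/dev/null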

On 6 January 2016 at 02:55, Yana Kadiyska  wrote:

> Deenar, I have not resolved this issue. Why do you think it's from
> different versions of Derby? I was playing with this as a fun experiment
> and my setup was on a clean machine -- no other versions of
> hive/hadoop/etc...
>
> On Sun, Dec 20, 2015 at 12:17 AM, Deenar Toraskar <
> deenar.toras...@gmail.com> wrote:
>
>> apparently it is down to different versions of derby in the classpath,
>> but i am unsure where the other version is coming from. The setup worked
>> perfectly with spark 1.3.1.
>>
>> Deenar
>>
>> On 20 December 2015 at 04:41, Deenar Toraskar 
>> wrote:
>>
>>> Hi Yana/All
>>>
>>> I am getting the same exception. Did you make any progress?
>>>
>>> Deenar
>>>
>>> On 5 November 2015 at 17:32, Yana Kadiyska 
>>> wrote:
>>>
 Hi folks, trying experiment with a minimal external metastore.

 I am following the instructions here:
 https://cwiki.apache.org/confluence/display/Hive/HiveDerbyServerMode

 I grabbed Derby 10.12.1.1 and started an instance, verified I can
 connect via ij tool and that process is listening on 1527

 put the following hive-site.xml under conf
 ```
 
 
 
 
   javax.jdo.option.ConnectionURL
   jdbc:derby://localhost:1527/metastore_db;create=true
   JDBC connect string for a JDBC metastore
 
 
   javax.jdo.option.ConnectionDriverName
   org.apache.derby.jdbc.ClientDriver
   Driver class name for a JDBC metastore
 
 
 ```

 I then try to run spark-shell thusly:
 bin/spark-shell --driver-class-path
 /home/yana/db-derby-10.12.1.1-bin/lib/derbyclient.jar

 and I get an ugly stack trace like so...

 Caused by: java.lang.NoClassDefFoundError: Could not initialize class
 org.apache.derby.jdbc.EmbeddedDriver
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
 at java.lang.Class.newInstance(Class.java:379)
 at
 org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
 at
 org.datanucleus.store.rdbms.connectionpool.DBCPConnectionPoolFactory.createConnectionPool(DBCPConnectionPoolFactory.java:50)
 at
 org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
 at
 org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
 at
 org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
 ... 114 more

 :10: error: not found: value sqlContext
import sqlContext.implicits._


 What am I doing wrong -- not sure why it's looking for Embedded
 anything, I'm specifically trying to not use the embedded server...but I
 know my hive-site is being read as starting witout --driver-class-path does
 say it can't load org.apache.derby.jdbc.ClientDriver

>>>
>>>
>>
>


Re: spark ui security

2016-01-07 Thread Ted Yu
According to https://spark.apache.org/docs/latest/security.html#web-ui ,
web UI is covered.

FYI

On Thu, Jan 7, 2016 at 6:35 AM, Kostiantyn Kudriavtsev <
kudryavtsev.konstan...@gmail.com> wrote:

> hi community,
>
> do I understand correctly that the spark.ui.filters property sets up filters
> only for the job UI? Is there any way to protect the Spark web UI in the same
> way?
>


Re: adding jars - hive on spark cdh 5.4.3

2016-01-07 Thread Prem Sure
did you try the --jars property in spark-submit? If your jar is of huge size,
you can pre-load the jar on all executors in a commonly available directory
to avoid network IO.
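
(For reference, a hedged example of passing extra jars at submit time; the class and paths are placeholders.)

    spark-submit --class your.Main \
      --jars /path/to/dep1.jar,/path/to/dep2.jar \
      your-app.jar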

On Thu, Jan 7, 2016 at 4:03 PM, Ophir Etzion  wrote:

> I' trying to add jars before running a query using hive on spark on cdh
> 5.4.3.
> I've tried applying the patch in
> https://issues.apache.org/jira/browse/HIVE-12045 (manually as the patch
> is done on a different hive version) but still hasn't succeeded.
>
> did anyone manage to do ADD JAR successfully with CDH?
>
> Thanks,
> Ophir
>


Re: Date Time Regression as Feature

2016-01-07 Thread Yanbo Liang
First extracting year, month, day, time from the datetime.
Then you should decide which variables can be treated as category features
such as year/month/day and encode them to boolean form using OneHotEncoder.
At last using VectorAssembler to assemble the encoded output vector and the
other raw input into the features which can be feed into model trainer.

OneHotEncoder and VectorAssembler are feature transformers provided by
Spark ML, you can refer
https://spark.apache.org/docs/latest/ml-features.html

Thanks
Yanbo
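
A minimal sketch of that flow, assuming a DataFrame df with numeric columns "year", "month", "dayOfWeek", "hour" and a label "load", where "month" and "dayOfWeek" already hold category indices; the column names and the choice of which fields to encode are illustrative:

    import org.apache.spark.ml.feature.{OneHotEncoder, VectorAssembler}

    val monthEnc = new OneHotEncoder().setInputCol("month").setOutputCol("monthVec")
    val dowEnc   = new OneHotEncoder().setInputCol("dayOfWeek").setOutputCol("dowVec")

    val assembler = new VectorAssembler()
      .setInputCols(Array("year", "monthVec", "dowVec", "hour"))
      .setOutputCol("features")

    // the assembled "features" column plus the "load" label can then be fed to a regressor
    val prepared = assembler.transform(dowEnc.transform(monthEnc.transform(df)))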

2016-01-08 7:52 GMT+08:00 Annabel Melongo :

> Or he can also transform the whole date into a string
>
>
> On Thursday, January 7, 2016 2:25 PM, Sujit Pal 
> wrote:
>
>
> Hi Jorge,
>
> Maybe extract things like dd, mm, day of week, time of day from the
> datetime string and use them as features?
>
> -sujit
>
>
> On Thu, Jan 7, 2016 at 11:09 AM, Jorge Machado <
> jorge.w.mach...@hotmail.com> wrote:
>
> Hello all,
>
> I'm new to machine learning. I'm trying to predict some electric usage
> with a decision tree
> The data is :
> 2015-12-10-10:00, 1200
> 2015-12-11-10:00, 1150
>
> My question is : What is the best way to turn date and time into feature
> on my Vector ?
>
> Something like this :  Vector (1200, [2015,12,10,10,10] )?
> I could not find any example with value prediction where features had
> dates in it.
>
> Thanks
>
> Jorge Machado
>
> Jorge Machado
> jo...@jmachado.me
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
>
>
>


Re: Problems with reading data from parquet files in a HDFS remotely

2016-01-07 Thread Prem Sure
You may need to add a
createDataFrame (for Python, with inferSchema) call before registerTempTable.

Thanks,

Prem


On Thu, Jan 7, 2016 at 12:53 PM, Henrik Baastrup <
henrik.baast...@netscout.com> wrote:

> Hi All,
>
> I have a small Hadoop cluster where I have stored a lot of data in parquet 
> files. I have installed a Spark master service on one of the nodes and now 
> would like to query my parquet files from a Spark client. When I run the 
> following program from the spark-shell on the Spark Master node, everything 
> functions correctly:
>
> # val sqlCont = new org.apache.spark.sql.SQLContext(sc)
> # val reader = sqlCont.read
> # val dataFrame = reader.parquet("/user/hdfs/parquet-multi/BICC")
> # dataFrame.registerTempTable("BICC")
> # val recSet = sqlCont.sql("SELECT 
> protocolCode,beginTime,endTime,called,calling FROM BICC WHERE 
> endTime>=14494218 AND endTime<=14494224 AND 
> calling='6287870642893' AND p_endtime=14494224")
> # recSet.show()
>
> But when I run the Java program below, from my client, I get:
>
> Exception in thread "main" java.lang.AssertionError: assertion failed: No 
> predefined schema found, and no Parquet data files or summary files found 
> under file:/user/hdfs/parquet-multi/BICC.
>
> The exception occurs at the line: DataFrame df = 
> reader.parquet("/user/hdfs/parquet-multi/BICC");
>
> On the Master node I can see the client connect when the SparkContext is 
> instanced, as I get the following lines in the Spark log:
>
> 16/01/07 18:27:47 INFO Master: Registering app SparkTest
> 16/01/07 18:27:47 INFO Master: Registered app SparkTest with ID 
> app-20160107182747-00801
>
> If I create a local directory with the given path, my program goes in an 
> endless loop, with the following warning on the console:
>
> WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; 
> check your cluster UI to ensure that workers are registered and have 
> sufficient resources
>
> To me it seems that my SQLContext does not connect to the Spark Master, but 
> tries to work locally on the client, where the requested files do not exist.
>
> Java program:
>   SparkConf conf = new SparkConf()
>   .setAppName("SparkTest")
>   .setMaster("spark://172.27.13.57:7077");
>   JavaSparkContext sc = new JavaSparkContext(conf);
>   SQLContext sqlContext = new SQLContext(sc);
>   
>   DataFrameReader reader = sqlContext.read();
>   DataFrame df = reader.parquet("/user/hdfs/parquet-multi/BICC");
>   DataFrame filtered = df.filter("endTime>=14494218 AND 
> endTime<=14494224 AND calling='6287870642893' AND 
> p_endtime=14494224");
>   filtered.show();
>
> Are there someone there can help me?
>
> Henrik
>
>
>


Re: Question in rdd caching in memory using persist

2016-01-07 Thread Prem Sure
Are you running standalone in local mode or cluster mode? Executor and
driver existence differ based on the setup type. A snapshot of your environment UI would
be helpful to say more.

On Thu, Jan 7, 2016 at 11:51 AM,  wrote:

> Hi,
>
>
>
> After I called rdd.persist(*MEMORY_ONLY_SER*), I see the driver listed as one 
> of the ‘executors’ participating in holding the partitions of the rdd in 
> memory, the memory usage shown against the driver is 0. This I see in the 
> storage tab of the spark ui.
>
> Why is the driver shown on the ui ? Will it ever hold rdd partitions when
> caching.
>
>
>
>
>
> *-regards*
>
> *Seemanto Barua*
>
>
>
>
>
> PLEASE READ: This message is for the named person's use only. It may
> contain confidential, proprietary or legally privileged information. No
> confidentiality or privilege is waived or lost by any mistransmission. If
> you receive this message in error, please delete it and all copies from
> your system, destroy any hard copies and notify the sender. You must not,
> directly or indirectly, use, disclose, distribute, print, or copy any part
> of this message if you are not the intended recipient. Nomura Holding
> America Inc., Nomura Securities International, Inc, and their respective
> subsidiaries each reserve the right to monitor all e-mail communications
> through its networks. Any views expressed in this message are those of the
> individual sender, except where the message states otherwise and the sender
> is authorized to state the views of such entity. Unless otherwise stated,
> any pricing information in this message is indicative only, is subject to
> change and does not constitute an offer to deal at any price quoted. Any
> reference to the terms of executed transactions should be treated as
> preliminary only and subject to our formal written confirmation.
>


Spark streaming routing

2016-01-07 Thread Lin Zhao
I have a need to route the dstream through the streaming pipeline by some key, 
such that data with the same key always goes through the same executor.

There doesn't seem to be a way to do manual routing with Spark Streaming. The 
closest I can come up with is:

stream.foreachRDD {rdd =>
  rdd.groupBy(rdd.key).flatMap { line =>...}.map(...).map(...)
}

Does this do what I expect? How about between batches? Does it guarantee the 
same key goes to the same executor in all batches?

Thanks,

Lin


Re: Spark streaming routing

2016-01-07 Thread Lin Zhao
Thanks for the reply, Tathagata. Our pipeline has a rather fat state, and that's 
why we have custom failure handling that kills all executors and goes back to a 
certain point in time in the past.

On a separate but related note, I noticed that in a chained map job, the entire 
pipeline runs on the same thread.

Say I have

dstream.map(func1).map(func2).map(func3)

For the same input func1, func2 and func3 run on the same thread. Is there a 
way to configure Spark such that they run in the same executor but different 
threads? Our pipeline has a high memory footprint and needs a low CPU:memory 
ratio.


From: Tathagata Das >
Date: Thursday, January 7, 2016 at 1:56 PM
To: Lin Zhao >
Cc: user >
Subject: Re: Spark streaming routing

You cannot guarantee that each key will forever be on the same executor. That 
is a flawed approach to designing an application if you have to ensure 
fault tolerance toward executor failures.

On Thu, Jan 7, 2016 at 9:34 AM, Lin Zhao 
> wrote:
I have a need to route the dstream through the streaming pipeline by some key, 
such that data with the same key always goes through the same executor.

There doesn't seem to be a way to do manual routing with Spark Streaming. The 
closest I can come up with is:

stream.foreachRDD {rdd =>
  rdd.groupBy(rdd.key).flatMap { line =>...}.map(...).map(...)
}

Does this do what I expect? How about between batches? Does it guarantee the 
same key goes to the same executor in all batches?

Thanks,

Lin



Re: Large scale ranked recommendation

2016-01-07 Thread xenocyon
(following up a rather old thread:)

Hi Christopher,

I understand how you might use nearest neighbors for item-item
recommendations, but how do you use it for top N items per user?

Thanks!

Apu



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Large-scale-ranked-recommendation-tp10098p25913.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Predictive Modelling in sparkR

2016-01-07 Thread Chandan Verma
Hi yanbo, 

I was able to successfully perform logistic regression on my data and also 
performed the cross validation and it all worked fine.
Thanks 

Sent from my Sony Xperia™ smartphone

 Yanbo Liang wrote 

>Hi Chandan,
>
>
>Do you mean to run your own LR algorithm based on SparkR?
>
>Actually, SparkR provide the ability to run the distributed Spark MLlib LR and 
>the interface is similar with the R GLM.
>
>For your reference: 
>https://spark.apache.org/docs/latest/sparkr.html#binomial-glm-model 
>
>
>2016-01-07 2:45 GMT+08:00 Chandan Verma :
>
>Has anyone tried building a logistic regression model in SparkR? Is it 
>recommended? Does it take longer to process than what can be done in 
>plain R?
>
> 
>
>===
> DISCLAIMER: The information contained in this message (including any 
>attachments) is confidential and may be privileged. If you have received it by 
>mistake please notify the sender by return e-mail and permanently delete this 
>message and any attachments from your system. Any dissemination, use, review, 
>distribution, printing or copying of this message in whole or in part is 
>strictly prohibited. Please note that e-mails are susceptible to change. 
>CitiusTech shall not be liable for the improper or incomplete transmission of 
>the information contained in this communication nor for any delay in its 
>receipt or damage to your system. CitiusTech does not guarantee that the 
>integrity of this communication has been maintained or that this communication 
>is free of viruses, interceptions or interferences. 
>
> 
>
>






Re: [Spark-SQL] Custom aggregate function for GrouppedData

2016-01-07 Thread Abhishek Gayakwad
Thanks Michael for replying. Aggregator/UDAF is exactly what I am looking
for, but we are still on 1.4 and it's going to take time to get to 1.6.

On Wed, Jan 6, 2016 at 10:32 AM, Michael Armbrust 
wrote:

> In Spark 1.6 GroupedDataset
> 
>  has
> mapGroups, which sounds like what you are looking for.  You can also write
> a custom Aggregator
> 
>
> On Tue, Jan 5, 2016 at 8:14 PM, Abhishek Gayakwad 
> wrote:
>
>> Hello Hivemind,
>>
>> Referring to this thread -
>> https://forums.databricks.com/questions/956/how-do-i-group-my-dataset-by-a-key-or-combination.html.
>> I have learnt that we can not do much with groupped data apart from using
>> existing aggregate functions. This blog post was written in may 2015, I
>> don't know if things are changes from that point of time. I am using 1.4
>> version of spark.
>>
>> What I am trying to achieve is something very similar to collectset in
>> hive (actually unique ordered concated values.) e.g.
>>
>> 1,2
>> 1,3
>> 2,4
>> 2,5
>> 2,4
>>
>> to
>> 1, "2,3"
>> 2, "4,5"
>>
>> Currently I am achieving this by converting dataframe to RDD, do the
>> required operations and convert it back to dataframe as shown below.
>>
>> public class AvailableSizes implements Serializable {
>>
>> public DataFrame calculate(SQLContext ssc, DataFrame salesDataFrame) {
>> final JavaRDD<Row> rowJavaRDD = salesDataFrame.toJavaRDD();
>>
>> JavaPairRDD<String, Row> pairs = rowJavaRDD.mapToPair(
>> (PairFunction<Row, String, Row>) row -> {
>> final Object[] objects = {row.getAs(0), row.getAs(1), row.getAs(3)};
>> return new Tuple2<>(row.getAs(SalesColumns.STYLE.name()),
>> new GenericRowWithSchema(objects, SalesColumns.getOutputSchema()));
>> });
>>
>> JavaPairRDD<String, Row> withSizeList = pairs.reduceByKey(new Function2<Row, Row, Row>() {
>> @Override
>> public Row call(Row aRow, Row bRow) {
>> final String uniqueCommaSeparatedSizes = uniqueSizes(aRow, bRow);
>> final Object[] objects = {aRow.getAs(0), aRow.getAs(1), uniqueCommaSeparatedSizes};
>> return new GenericRowWithSchema(objects, SalesColumns.getOutputSchema());
>> }
>>
>> private String uniqueSizes(Row aRow, Row bRow) {
>> final SortedSet<String> allSizes = new TreeSet<>();
>> final List<String> aSizes = Arrays.asList(((String) aRow.getAs(String.valueOf(SalesColumns.SIZE))).split(","));
>> final List<String> bSizes = Arrays.asList(((String) bRow.getAs(String.valueOf(SalesColumns.SIZE))).split(","));
>> allSizes.addAll(aSizes);
>> allSizes.addAll(bSizes);
>> return csvFormat(allSizes);
>> }
>> });
>>
>> final JavaRDD<Row> values = withSizeList.values();
>>
>> return ssc.createDataFrame(values, SalesColumns.getOutputSchema());
>>
>> }
>>
>> public String csvFormat(Collection<String> collection) {
>> return collection.stream().map(Object::toString).collect(Collectors.joining(","));
>> }
>> }
>>
>> Please suggest if there is a better way of doing this.
>>
>> Regards,
>> Abhishek
>>
>
>


Re: Newbie question

2016-01-07 Thread Deepak Sharma
Yes, you can do it unless the method is marked static/final.
Most of the methods in SparkContext are marked static, so you definitely can't
override those; otherwise an override will usually work.

Thanks
Deepak
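
(A hedged sketch of the subclassing approach; the method picked here is purely illustrative, and whether a given method can actually be overridden depends on how it is declared in your Spark version.)

    import org.apache.spark.{SparkConf, SparkContext}

    class MySparkContext(conf: SparkConf) extends SparkContext(conf) {
      override def setJobDescription(value: String): Unit = {
        println(s"setting job description: $value")  // added behaviour goes here
        super.setJobDescription(value)
      }
    }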

On Fri, Jan 8, 2016 at 12:06 PM, yuliya Feldman  wrote:

> Hello,
>
> I am new to Spark and have a most likely basic question - can I override a
> method from SparkContext?
>
> Thanks
>



-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net


Re: Newbie question

2016-01-07 Thread censj
You can try it.  
> On Jan 8, 2016, at 14:44, yuliya Feldman wrote: 
> 
> invoked



Recommendations using Spark

2016-01-07 Thread anjali gautam
Hi,

Can anybody please guide me on how we can generate recommendations for
a user using Spark?

Regards,
Anjali Gautam


Re: Recommendations using Spark

2016-01-07 Thread dEEPU
Use the Spark MLlib KMeans algorithm to generate recommendations
On Jan 8, 2016 12:41 PM, anjali gautam  wrote:
Hi,

Can anybody please guide me how can we create generate recommendations for
a user using spark?

Regards,
Anjali Gautam


Re: Date Time Regression as Feature

2016-01-07 Thread dEEPU
Maybe you want to convert the date to a duration in the form of a number of hours/days 
and then do calculations on it.
On Jan 8, 2016 12:39 AM, Jorge Machado  wrote:
Hello all,

I'm new to machine learning. I'm trying to predict some electric usage with a 
decision tree.
The data is :
2015-12-10-10:00, 1200
2015-12-11-10:00, 1150

My question is : What is the best way to turn date and time into feature on my 
Vector ?

Something like this :  Vector (1200, [2015,12,10,10,10] )?
I could not find any example with value prediction where features had dates in 
it.

Thanks

Jorge Machado

Jorge Machado
jo...@jmachado.me


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Newbie question

2016-01-07 Thread yuliya Feldman
Hello,
I am new to Spark and have a most likely basic question - can I override a 
method from SparkContext?
Thanks

Re: Newbie question

2016-01-07 Thread yuliya Feldman
For example, to add some functionality there.
I understand I can extend SparkContext with an implicit class to add new 
methods that can be invoked on SparkContext, but I want to see if I can 
override an existing one.

 

  From: censj 
 To: yuliya Feldman  
Cc: "user@spark.apache.org" 
 Sent: Thursday, January 7, 2016 10:38 PM
 Subject: Re: Newbie question
   
Why do you want to override a method from SparkContext?
On Jan 8, 2016, at 14:36, yuliya Feldman wrote:
Hello,
I am new to Spark and have a most likely basic question - can I override a 
method from SparkContext?
Thanks



  

RE: How to split a huge rdd and broadcast it by turns?

2016-01-07 Thread LINChen
Hi kdmxen, you want to delete the broadcast variables on the executors to avoid 
the executor-lost failures, right? Have you tried the unpersist method instead? Like 
this: itemSplitBroadcast.destroy(true); => 
itemSplitBroadcast.unpersist(true); 
LIN Chen

Date: Thu, 7 Jan 2016 22:01:27 +0800
Subject: How to split a huge rdd and broadcast it by turns?
From: kdm...@gmail.com
To: user@spark.apache.org

Description:
Our Spark version is 1.4.1.
We want to join two huge RDDs, one of them with skewed data, so the RDD join 
operation may lead to memory problems. We try to split the smaller one into 
several pieces and then broadcast them in batches. On each broadcast turn, we 
collect one part of the smaller RDD to the driver, save it into a HashMap, and 
broadcast the HashMap. Each executor uses the broadcast value to do a map 
operation with the bigger RDD. We implement our skewed-data join this way.
But when we process the broadcast value in each turn, we find that we cannot 
destroy the broadcast value after processing. If we use broadcast.destroy(), the 
next turn's processing triggers errors like this: java.io.IOException: 
org.apache.spark.SparkException: Attempted to use Broadcast(6) after it was 
destroyed (destroy at xxx.java:369)
We have looked at the Spark source code and found that this problem is caused by 
the RDD dependency relationship. If rdd3 -> rdd2 -> rdd1 (the arrow shows the 
dependency), and rdd1 was produced using a broadcast variable named b1 and rdd2 
used b2, then when producing rdd3, the source code shows it needs to serialize b1 
and b2. If b1 or b2 is destroyed before rdd3 is produced, it causes the failure 
listed above.
Question:
Is there a way to let rdd3 forget its dependency so that it does not require b1 
and b2, only rdd2, during its production?
Or is there a way to deal with the skewed-join problem?
By the way, we have set a checkpoint for each turn and set spark.cleaner.ttl to 
600, but the problem is still there. If we don't destroy the broadcast variables, 
executors are lost in the 5th turn.
Our code is like this:
for (int i = 0; i < times; i++) {
    JavaPairRDD, Double> prevItemPairRdd = curItemPairRdd;
    List> itemSplit = itemZippedRdd
        .filter(new FilterByHashFunction(times, i)).collect();

    Map itemSplitMap = new HashMap();
    for (Tuple2 item : itemSplit) {
        itemSplitMap.put(item._1(), item._2());
    }
    Broadcast> itemSplitBroadcast = jsc.broadcast(itemSplitMap);

    curItemPairRdd = prevItemPairRdd
        .mapToPair(new NormalizeScoreFunction(itemSplitBroadcast))
        .persist(StorageLevel.DISK_ONLY());
    curItemPairRdd.count();

    itemSplitBroadcast.destroy(true);
    itemSplit.clear();
}

Re: Newbie question

2016-01-07 Thread yuliya Feldman
Thank you
 

  From: Deepak Sharma 
 To: yuliya Feldman  
Cc: "user@spark.apache.org" 
 Sent: Thursday, January 7, 2016 10:41 PM
 Subject: Re: Newbie question
   
Yes , you can do it unless the method is marked static/final.Most of the 
methods in SparkContext are marked static so you can't over ride them 
definitely , else over ride would work usually.
ThanksDeepak
On Fri, Jan 8, 2016 at 12:06 PM, yuliya Feldman  
wrote:

Hello,
I am new to Spark and have a most likely basic question - can I override a 
method from SparkContext?
Thanks



-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net

   

RE: Recommendations using Spark

2016-01-07 Thread Singh, Abhijeet
The question itself is very vague.

You might want to use this slide as a starting point 
http://www.slideshare.net/CasertaConcepts/analytics-week-recommendations-on-spark.

From: anjali gautam [mailto:anjali.gauta...@gmail.com]
Sent: Friday, January 08, 2016 12:42 PM
To: user@spark.apache.org
Subject: Recommendations using Spark

Hi,

Can anybody please guide me how can we create generate recommendations for a 
user using spark?

Regards,
Anjali Gautam


Spark Context not getting initialized in local mode

2016-01-07 Thread Rahul Kumar
Hi all,
I am trying to start Solr with a custom plugin which uses the Spark library. I
am trying to initialize a SparkContext in local mode. I have made a fat jar
for this plugin using the Maven Shade plugin and put it in the lib directory. *While
starting Solr it is not able to initialize the SparkContext.* It says it got a
ClassNotFoundException for AkkaRpcEnvFactory. Can anyone please help.

*It gives the following error:*

3870 [coreLoadExecutor-4-thread-1] ERROR org.apache.spark.SparkContext
 – Error initializing SparkContext.
java.lang.ClassNotFoundException:org.apache.spark.rpc.akka.AkkaRpcEnvFactory

*Here is the detailed error*

java -jar start.jar
0    [main] INFO  org.eclipse.jetty.server.Server  – jetty-8.1.10.v20130312
27   [main] INFO  org.eclipse.jetty.deploy.providers.ScanningAppProvider  – Deployment monitor /home/rahul/solr-4.7.2/example/contexts at interval 0
40   [main] INFO  org.eclipse.jetty.deploy.DeploymentManager  – Deployable added: /home/rahul/solr-4.7.2/example/contexts/solr-jetty-context.xml
1095 [main] INFO  org.eclipse.jetty.webapp.StandardDescriptorProcessor  – NO JSP Support for /solr, did not find org.apache.jasper.servlet.JspServlet
1155 [main] INFO  org.apache.solr.servlet.SolrDispatchFilter  – SolrDispatchFilter.init()
1189 [main] INFO  org.apache.solr.core.SolrResourceLoader  – JNDI not configured for solr (NoInitialContextEx)
1190 [main] INFO  org.apache.solr.core.SolrResourceLoader  – solr home defaulted to 'solr/' (could not find system property or JNDI)
1190 [main] INFO  org.apache.solr.core.SolrResourceLoader  – new SolrResourceLoader for directory: 'solr/'
1280 [main] INFO  org.apache.solr.core.ConfigSolr  – Loading container configuration from /home/rahul/solr-4.7.2/example/solr/solr.xml
1458 [main] INFO  org.apache.solr.core.CoresLocator  – Config-defined core root directory: /home/rahul/solr-4.7.2/example/solr
1465 [main] INFO  org.apache.solr.core.CoreContainer  – New CoreContainer 602710225
...
3870 [coreLoadExecutor-4-thread-1] ERROR org.apache.spark.SparkContext
 – Error initializing SparkContext.
java.lang.ClassNotFoundException: org.apache.spark.rpc.akka.AkkaRpcEnvFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at 
org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:430)
at 
org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:383)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
at org.apache.spark.rpc.RpcEnv$.getRpcEnvFactory(RpcEnv.scala:42)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:53)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:252)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:276)
at org.apache.spark.SparkContext.(SparkContext.scala:441)
at 
org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:61)
at 
com.snapdeal.search.spark.SparkLoadModel.loadModel(SparkLoadModel.java:11)
at 
com.snapdeal.search.valuesource.parser.RankingModelValueSourceParser.init(RankingModelValueSourceParser.java:29)
at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:591)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2191)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2185)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2218)
at org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:2130)
at org.apache.solr.core.SolrCore.(SolrCore.java:765)
at org.apache.solr.core.SolrCore.(SolrCore.java:630)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:562)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
3880 [coreLoadExecutor-4-thread-1] INFO  org.apache.spark.SparkContext  – Successfully stopped SparkContext





Rahul Kumar
*Software Engineer- I (Search Snapdeal)*

*M*: +91 9023542950*EXT: *14226
362-363, ASF CENTRE , UDYOG VIHAR , PHASE - IV , GURGAON 122 016 , 

Re: Newbie question

2016-01-07 Thread dEEPU
If the method is not final or static, then you can.
On Jan 8, 2016 12:07 PM, yuliya Feldman  wrote:
Hello,
I am new to Spark and have a most likely basic question - can I override a 
method from SparkContext?
Thanks


Re: Recommendations using Spark

2016-01-07 Thread Stephen Boesch
Alternating least squares takes  an RDD of (user/product/ratings) tuples
and the resulting Model provides predict(user, product) or predictProducts
methods among others.
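
For reference, a hedged MLlib sketch of that flow; the rank, iteration count, lambda, and IDs are arbitrary placeholders, and ratings is assumed to be an existing RDD[Rating]:

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    val model = ALS.train(ratings, 10 /* rank */, 10 /* iterations */, 0.01 /* lambda */)

    // score a single (user, product) pair, or pull the top-N products for a user
    val score = model.predict(42, 17)
    val topN  = model.recommendProducts(42, 10)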


Re: How to load specific Hive partition in DataFrame Spark 1.6?

2016-01-07 Thread Yin Huai
Hi, we made the change because the partitioning discovery logic was too
flexible and it introduced problems that were very confusing to users. To
make your case work, we have introduced a new data source option called
basePath. You can use

DataFrame df = hiveContext.read().format("orc").option("basePath", "
path/to/table/").load("path/to/table/entity=xyz")

So, the partitioning discovery logic will understand that the base
path is path/to/table/
and your DataFrame will have the column "entity".

You can find the doc at the end of partitioning discovery section of the
sql programming guide (
http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
).

Thanks,

Yin

On Thu, Jan 7, 2016 at 7:34 AM, unk1102  wrote:

> Hi from Spark 1.6 onwards as per this  doc
> <
> http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
> >
> We cant add specific hive partitions to DataFrame
>
> spark 1.5 the following used to work and the following dataframe will have
> entity column
>
> DataFrame df =
> hiveContext.read().format("orc").load("path/to/table/entity=xyz")
>
> But in Spark 1.6 above does not work and I have to give base path like the
> following but it does not contain entity column which I want in DataFrame
>
> DataFrame df = hiveContext.read().format("orc").load("path/to/table/")
>
> How do I load a specific Hive partition in a DataFrame? What was the driver
> behind removing this feature, which I believe was efficient? Now the above Spark
> 1.6 code loads all partitions, and if I filter for specific partitions it is
> not efficient; it hits memory and throws a GC error because thousands of
> partitions get loaded into memory instead of the specific one. Please guide.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-load-specific-Hive-partition-in-DataFrame-Spark-1-6-tp25904.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: How to load specific Hive partition in DataFrame Spark 1.6?

2016-01-07 Thread Umesh Kacha
Hi Yin, thanks much your answer solved my problem. Really appreciate it!

Regards


On Fri, Jan 8, 2016 at 1:26 AM, Yin Huai  wrote:

> Hi, we made the change because the partitioning discovery logic was too
> flexible and it introduced problems that were very confusing to users. To
> make your case work, we have introduced a new data source option called
> basePath. You can use
>
> DataFrame df = hiveContext.read().format("orc").option("basePath", "
> path/to/table/").load("path/to/table/entity=xyz")
>
> So, the partitioning discovery logic will understand that the base path is 
> path/to/table/
> and your dataframe will has the column "entity".
>
> You can find the doc at the end of partitioning discovery section of the
> sql programming guide (
> http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
> ).
>
> Thanks,
>
> Yin
>
> On Thu, Jan 7, 2016 at 7:34 AM, unk1102  wrote:
>
>> Hi from Spark 1.6 onwards as per this  doc
>> <
>> http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
>> >
>> We cant add specific hive partitions to DataFrame
>>
>> spark 1.5 the following used to work and the following dataframe will have
>> entity column
>>
>> DataFrame df =
>> hiveContext.read().format("orc").load("path/to/table/entity=xyz")
>>
>> But in Spark 1.6 above does not work and I have to give base path like the
>> following but it does not contain entity column which I want in DataFrame
>>
>> DataFrame df = hiveContext.read().format("orc").load("path/to/table/")
>>
>> How do I load specific hive partition in a dataframe? What was the driver
>> behind removing this feature which was efficient I believe now above Spark
>> 1.6 code load all partitions and if I filter for specific partitions it is
>> not efficient it hits memory and throws GC error because of thousands of
>> partitions get loaded into memory and not the specific one please guide.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-load-specific-Hive-partition-in-DataFrame-Spark-1-6-tp25904.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


Re: How to load specific Hive partition in DataFrame Spark 1.6?

2016-01-07 Thread Yin Huai
No problem! Glad it helped!

On Thu, Jan 7, 2016 at 12:05 PM, Umesh Kacha  wrote:

> Hi Yin, thanks much your answer solved my problem. Really appreciate it!
>
> Regards
>
>
> On Fri, Jan 8, 2016 at 1:26 AM, Yin Huai  wrote:
>
>> Hi, we made the change because the partitioning discovery logic was too
>> flexible and it introduced problems that were very confusing to users. To
>> make your case work, we have introduced a new data source option called
>> basePath. You can use
>>
>> DataFrame df = hiveContext.read().format("orc").option("basePath", "path/to/table/").load("path/to/table/entity=xyz")
>>
>> So, the partition discovery logic will understand that the base path
>> is path/to/table/ and your DataFrame will have the column "entity".
>>
>> You can find the doc at the end of partitioning discovery section of the
>> sql programming guide (
>> http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
>> ).
>>
>> Thanks,
>>
>> Yin
>>
>> On Thu, Jan 7, 2016 at 7:34 AM, unk1102  wrote:
>>
>>> Hi from Spark 1.6 onwards as per this  doc
>>> <
>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
>>> >
>>> We cant add specific hive partitions to DataFrame
>>>
>>> spark 1.5 the following used to work and the following dataframe will
>>> have
>>> entity column
>>>
>>> DataFrame df =
>>> hiveContext.read().format("orc").load("path/to/table/entity=xyz")
>>>
>>> But in Spark 1.6 above does not work and I have to give base path like
>>> the
>>> following but it does not contain entity column which I want in DataFrame
>>>
>>> DataFrame df = hiveContext.read().format("orc").load("path/to/table/")
>>>
>>> How do I load specific hive partition in a dataframe? What was the driver
>>> behind removing this feature which was efficient I believe now above
>>> Spark
>>> 1.6 code load all partitions and if I filter for specific partitions it
>>> is
>>> not efficient it hits memory and throws GC error because of thousands of
>>> partitions get loaded into memory and not the specific one please guide.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-load-specific-Hive-partition-in-DataFrame-Spark-1-6-tp25904.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>>
>


Date Time Regression as Feature

2016-01-07 Thread Jorge Machado
Hello all, 

I'm new to machine learning. I'm trying to predict some electric usage with a
decision tree.
The data is : 
2015-12-10-10:00, 1200
2015-12-11-10:00, 1150

My question is: what is the best way to turn the date and time into features in my
vector?

Something like this: Vector(1200, [2015, 12, 10, 10, 10])?
I could not find any example of value prediction where the features included dates.
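One common approach is to expand the timestamp into calendar components (year,
month, day, hour, day-of-week) so a tree can split on them; the sketch below is
only an illustration, and the timestamp format and feature choice are assumptions,
not something from this thread:

import java.text.SimpleDateFormat
import java.util.Calendar
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// "2015-12-10-10:00, 1200" -> LabeledPoint(1200.0, [year, month, day, hour, dayOfWeek])
def toLabeledPoint(line: String): LabeledPoint = {
  val Array(ts, usage) = line.split(",").map(_.trim)
  val cal = Calendar.getInstance()
  cal.setTime(new SimpleDateFormat("yyyy-MM-dd-HH:mm").parse(ts))
  val features = Array(
    cal.get(Calendar.YEAR),
    cal.get(Calendar.MONTH) + 1,        // Calendar months are 0-based
    cal.get(Calendar.DAY_OF_MONTH),
    cal.get(Calendar.HOUR_OF_DAY),
    cal.get(Calendar.DAY_OF_WEEK)       // cyclic signals like day-of-week are often the useful part
  ).map(_.toDouble)
  LabeledPoint(usage.toDouble, Vectors.dense(features))
}

An RDD of such LabeledPoints can then go into something like
DecisionTree.trainRegressor; a single raw epoch value tends to be much less
useful to a tree than these separate components.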

Thanks 

Jorge Machado 
jo...@jmachado.me


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



"impossible to get artifacts " error when using sbt to build 1.6.0 for scala 2.11

2016-01-07 Thread Lin Zhao
I tried to build 1.6.0 for YARN and Scala 2.11, but got an error. Any help is
appreciated.


[warn] Strategy 'first' was applied to 2 files

[info] Assembly up to date: 
/Users/lin/git/spark/network/yarn/target/scala-2.11/spark-network-yarn-1.6.0-hadoop2.7.1.jar

java.lang.IllegalStateException: impossible to get artifacts when data has not 
been loaded. IvyNode = org.slf4j#slf4j-log4j12;1.7.6

at org.apache.ivy.core.resolve.IvyNode.getArtifacts(IvyNode.java:809)

at org.apache.ivy.core.resolve.IvyNode.getSelectedArtifacts(IvyNode.java:786)

at 
org.apache.ivy.core.report.ResolveReport.setDependencies(ResolveReport.java:235)

at org.apache.ivy.core.resolve.ResolveEngine.resolve(ResolveEngine.java:235)

at org.apache.ivy.Ivy.resolve(Ivy.java:517)

at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:266)

at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:175)

at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:157)

at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:151)

at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:151)

at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:128)

at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:56)

at sbt.IvySbt$$anon$4.call(Ivy.scala:64)

at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)

at 
xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)

at xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)

at xsbt.boot.Using$.withResource(Using.scala:10)

at xsbt.boot.Using$.apply(Using.scala:9)

at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)

at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)

at xsbt.boot.Locks$.apply0(Locks.scala:31)

at xsbt.boot.Locks$.apply(Locks.scala:28)

at sbt.IvySbt.withDefaultLogger(Ivy.scala:64)

at sbt.IvySbt.withIvy(Ivy.scala:123)

at sbt.IvySbt.withIvy(Ivy.scala:120)

at sbt.IvySbt$Module.withModule(Ivy.scala:151)

at sbt.IvyActions$.updateEither(IvyActions.scala:157)

at sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1318)

Command I ran to build:

> git checkout v1.6.0
> ./dev/change-scala-version.sh 2.11
> build/sbt -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -Dscala-2.11 -Phadoop-provided assembly



Problems with reading data from parquet files in a HDFS remotely

2016-01-07 Thread Henrik Baastrup
Hi All,

I have a small Hadoop cluster where I have stored a lot of data in parquet 
files. I have installed a Spark master service on one of the nodes and now 
would like to query my parquet files from a Spark client. When I run the
following program from the spark-shell on the Spark master node, everything
works correctly:

# val sqlCont = new org.apache.spark.sql.SQLContext(sc)
# val reader = sqlCont.read
# val dataFrame = reader.parquet("/user/hdfs/parquet-multi/BICC")
# dataFrame.registerTempTable("BICC")
# val recSet = sqlCont.sql("SELECT 
protocolCode,beginTime,endTime,called,calling FROM BICC WHERE 
endTime>=14494218 AND endTime<=14494224 AND 
calling='6287870642893' AND p_endtime=14494224")
# recSet.show()  

But when I run the Java program below, from my client, I get: 

Exception in thread "main" java.lang.AssertionError: assertion failed: No 
predefined schema found, and no Parquet data files or summary files found under 
file:/user/hdfs/parquet-multi/BICC.

The exception occurs at the line: DataFrame df = 
reader.parquet("/user/hdfs/parquet-multi/BICC");

On the Master node I can see the client connect when the SparkContext is 
instanced, as I get the following lines in the Spark log:

16/01/07 18:27:47 INFO Master: Registering app SparkTest
16/01/07 18:27:47 INFO Master: Registered app SparkTest with ID 
app-20160107182747-00801

If I create a local directory with the given path, my program goes into an
endless loop, with the following warning on the console:

WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; 
check your cluster UI to ensure that workers are registered and have sufficient 
resources

To me it seems that my SQLContext does not connect to the Spark master, but
tries to work locally on the client, where the requested files do not exist.

Java program:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.DataFrameReader;
import org.apache.spark.sql.SQLContext;

// Connect to the standalone master and create a SQL context
SparkConf conf = new SparkConf()
    .setAppName("SparkTest")
    .setMaster("spark://172.27.13.57:7077");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(sc);

// Read the parquet files and apply the filter
DataFrameReader reader = sqlContext.read();
DataFrame df = reader.parquet("/user/hdfs/parquet-multi/BICC");
DataFrame filtered = df.filter(
    "endTime>=14494218 AND endTime<=14494224 AND calling='6287870642893' AND p_endtime=14494224");
filtered.show();
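One detail that may matter: the assertion message resolves the path as
file:/user/hdfs/parquet-multi/BICC, i.e. against the client's local filesystem,
which fits the suspicion above that the client works locally. A minimal
spark-shell sketch of the same read with an explicit HDFS URI (the namenode host
and port are placeholders, not from this thread):

// run on the client; replace namenode-host:8020 with the cluster's fs.defaultFS
val sqlCont = new org.apache.spark.sql.SQLContext(sc)
val df = sqlCont.read.parquet("hdfs://namenode-host:8020/user/hdfs/parquet-multi/BICC")
df.registerTempTable("BICC")
sqlCont.sql("SELECT protocolCode, beginTime, endTime, called, calling FROM BICC").show()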

Is there someone who can help me?

Henrik



Re: Spark job uses only one Worker

2016-01-07 Thread Michael Pisula
Hi,

I start the cluster using the spark-ec2 scripts, so the cluster is in
stand-alone mode.
Here is how I submit my job:
spark/bin/spark-submit --class demo.spark.StaticDataAnalysis --master
spark://:6066 --deploy-mode cluster demo/Demo-1.0-SNAPSHOT-all.jar

Cheers,
Michael

On 07.01.2016 22:41, Igor Berman wrote:
> share how you submit your job
> what cluster(yarn, standalone)
>
> On 7 January 2016 at 23:24, Michael Pisula  > wrote:
>
> Hi there,
>
> I ran a simple Batch Application on a Spark Cluster on EC2.
> Despite having 3
> Worker Nodes, I could not get the application processed on more
> than one
> node, regardless if I submitted the Application in Cluster or
> Client mode.
> I also tried manually increasing the number of partitions in the
> code, no
> effect. I also pass the master into the application.
> I verified on the nodes themselves that only one node was active
> while the
> job was running.
> I pass enough data to make the job take 6 minutes to process.
> The job is simple enough, reading data from two S3 files, joining
> records on
> a shared field, filtering out some records and writing the result
> back to
> S3.
>
> Tried all kinds of stuff, but could not make it work. I did find
> similar
> questions, but had already tried the solutions that worked in
> those cases.
> Would be really happy about any pointers.
>
> Cheers,
> Michael
>
>
>
> --
> View this message in context:
> 
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-uses-only-one-Worker-tp25909.html
> Sent from the Apache Spark User List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> 
> For additional commands, e-mail: user-h...@spark.apache.org
> 
>
>

-- 
Michael Pisula * michael.pis...@tngtech.com * +49-174-3180084
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082



Re: Spark job uses only one Worker

2016-01-07 Thread Igor Berman
read about *--total-executor-cores*
not sure why you specify port 6066 in master...usually it's 7077
verify in master ui(usually port 8080) how many cores are there(depends on
other configs, but usually workers connect to master with all their cores)

On 7 January 2016 at 23:46, Michael Pisula 
wrote:

> Hi,
>
> I start the cluster using the spark-ec2 scripts, so the cluster is in
> stand-alone mode.
> Here is how I submit my job:
> spark/bin/spark-submit --class demo.spark.StaticDataAnalysis --master
> spark://:6066 --deploy-mode cluster demo/Demo-1.0-SNAPSHOT-all.jar
>
> Cheers,
> Michael
>
>
> On 07.01.2016 22:41, Igor Berman wrote:
>
> share how you submit your job
> what cluster(yarn, standalone)
>
> On 7 January 2016 at 23:24, Michael Pisula 
> wrote:
>
>> Hi there,
>>
>> I ran a simple Batch Application on a Spark Cluster on EC2. Despite
>> having 3
>> Worker Nodes, I could not get the application processed on more than one
>> node, regardless if I submitted the Application in Cluster or Client mode.
>> I also tried manually increasing the number of partitions in the code, no
>> effect. I also pass the master into the application.
>> I verified on the nodes themselves that only one node was active while the
>> job was running.
>> I pass enough data to make the job take 6 minutes to process.
>> The job is simple enough, reading data from two S3 files, joining records
>> on
>> a shared field, filtering out some records and writing the result back to
>> S3.
>>
>> Tried all kinds of stuff, but could not make it work. I did find similar
>> questions, but had already tried the solutions that worked in those cases.
>> Would be really happy about any pointers.
>>
>> Cheers,
>> Michael
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-uses-only-one-Worker-tp25909.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
> --
> Michael Pisula * michael.pis...@tngtech.com * +49-174-3180084
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
>
>


Re: spark ui security

2016-01-07 Thread Kostiantyn Kudriavtsev
can I do it without kerberos and hadoop?
ideally using filters as for job UI

On Jan 7, 2016, at 1:22 PM, Prem Sure  wrote:

> you can refer more on https://searchcode.com/codesearch/view/97658783/
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala
> 
> spark.authenticate = true
> spark.ui.acls.enable = true
> spark.ui.view.acls = user1,user2
> spark.ui.filters = 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter
> spark.org.apache.hadoop.security.authentication.server.AuthenticationFilter.params="type=kerberos,kerberos.principal=HTTP/mybox@MYDOMAIN,kerberos.keytab=/some/keytab"
> 
> 
> 
> 
> On Thu, Jan 7, 2016 at 10:35 AM, Kostiantyn Kudriavtsev 
>  wrote:
> I’m afraid I missed where this property must be specified? I added it to 
> spark-xxx.conf which is basically configurable per job, so I assume to 
> protect WebUI the different place must be used, isn’t it?
> 
> On Jan 7, 2016, at 10:28 AM, Ted Yu  wrote:
> 
>> According to https://spark.apache.org/docs/latest/security.html#web-ui , web 
>> UI is covered.
>> 
>> FYI
>> 
>> On Thu, Jan 7, 2016 at 6:35 AM, Kostiantyn Kudriavtsev 
>>  wrote:
>> hi community,
>> 
>> do I understand correctly that spark.ui.filters property sets up filters 
>> only for jobui interface? is it any way to protect spark web ui in the same 
>> way?
>> 
> 
> 



Re: Spark streaming routing

2016-01-07 Thread Tathagata Das
You cannot guarantee that each key will forever be on the same executor.
That is a flawed approach to designing an application if you have to
ensure fault tolerance toward executor failures.
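For completeness, a sketch (not from this thread) of the closest approximation:
partitioning by key inside each batch puts all records for a key into the same
partition for that batch, but which executor hosts that partition is still the
scheduler's choice, so it is not stable across batches or failures:

import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.dstream.DStream

// `stream` stands for the keyed DStream from the question; the types are illustrative
def handle(stream: DStream[(String, String)]): Unit = {
  stream.foreachRDD { rdd =>
    rdd.partitionBy(new HashPartitioner(8))     // same key -> same partition, within this batch
       .mapPartitions { records =>
         // records that share a key arrive together here, but the hosting
         // executor can change between batches or after a failure
         records.map { case (key, value) => (key, value.length) }
       }
       .count()                                 // an action to trigger the batch
  }
}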

On Thu, Jan 7, 2016 at 9:34 AM, Lin Zhao  wrote:

> I have a need to route the DStream through the streaming pipeline by some
> key, such that data with the same key always goes through the same
> executor.
>
> There doesn't seem to be a way to do manual routing with Spark Streaming.
> The closest I can come up with is:
>
> stream.foreachRDD {rdd =>
>   rdd.groupBy(_.key).flatMap { line => … }.map(…).map(…)
> }
>
> Does this do what I expect? How about between batches? Does it guarantee
> the same key goes to the same executor in all batches?
>
> Thanks,
>
> Lin
>


Re: spark ui security

2016-01-07 Thread Ted Yu
Without kerberos you don't have true security.

Cheers

On Thu, Jan 7, 2016 at 1:56 PM, Kostiantyn Kudriavtsev <
kudryavtsev.konstan...@gmail.com> wrote:

> can I do it without kerberos and hadoop?
> ideally using filters as for job UI
>
> On Jan 7, 2016, at 1:22 PM, Prem Sure  wrote:
>
> you can refer more on https://searchcode.com/codesearch/view/97658783/
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala
>
> spark.authenticate = true
> spark.ui.acls.enable = true
> spark.ui.view.acls = user1,user2
> spark.ui.filters =
> org.apache.hadoop.security.authentication.server.AuthenticationFilter
>
> spark.org.apache.hadoop.security.authentication.server.AuthenticationFilter.params="type=kerberos,kerberos.principal=HTTP/mybox@MYDOMAIN
> ,kerberos.keytab=/some/keytab"
>
>
>
>
> On Thu, Jan 7, 2016 at 10:35 AM, Kostiantyn Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
>> I’m afraid I missed where this property must be specified? I added it to
>> spark-xxx.conf which is basically configurable per job, so I assume to
>> protect WebUI the different place must be used, isn’t it?
>>
>> On Jan 7, 2016, at 10:28 AM, Ted Yu  wrote:
>>
>> According to https://spark.apache.org/docs/latest/security.html#web-ui ,
>> web UI is covered.
>>
>> FYI
>>
>> On Thu, Jan 7, 2016 at 6:35 AM, Kostiantyn Kudriavtsev <
>> kudryavtsev.konstan...@gmail.com> wrote:
>>
>>> hi community,
>>>
>>> do I understand correctly that spark.ui.filters property sets up
>>> filters only for jobui interface? is it any way to protect spark web ui in
>>> the same *way?*
>>>
>>
>>
>>
>
>


RE: Question in rdd caching in memory using persist

2016-01-07 Thread seemanto.barua
Attached is a screenshot of the Storage tab details for the cached RDD. The
host highlighted at the end of the list is the driver machine.



-regards
Seemanto Barua


From: Barua, Seemanto (US)
Sent: Thursday, January 07, 2016 12:43 PM
To: 'premsure...@gmail.com'
Cc: 'user@spark.apache.org'
Subject: Re: Question in rdd caching in memory using persist

I have a standalone cluster. spark version is 1.3.1


From: Prem Sure [mailto:premsure...@gmail.com]
Sent: Thursday, January 07, 2016 12:32 PM
To: Barua, Seemanto (US)
Cc: spark users >
Subject: Re: Question in rdd caching in memory using persist

Are you running standalone - local mode or cluster mode? Executor and driver
existence differ based on the setup type. A snapshot of your environment UI
would be helpful to say more.

On Thu, Jan 7, 2016 at 11:51 AM, 
> wrote:
Hi,


After I called rdd.persist(MEMORY_ONLY_SER), I see the driver listed as one of
the ‘executors’ participating in holding the partitions of the RDD in memory,
but the memory usage shown against the driver is 0. I see this in the Storage
tab of the Spark UI.
Why is the driver shown on the UI? Will it ever hold RDD partitions when
caching?


-regards
Seemanto Barua









Re: Spark job uses only one Worker

2016-01-07 Thread Michael Pisula
I had tried several parameters, including --total-executor-cores, no effect.
As for the port, I tried 7077, but if I remember correctly I got some
kind of error that suggested to try 6066, with which it worked just fine
(apart from this issue here).

Each worker has two cores. I also tried increasing cores, again no
effect. I was able to increase the number of cores the job was using on
one worker, but it would not use any other worker (and it would not
start if the number of cores the job wanted was higher than the number
available on one worker).

On 07.01.2016 22:51, Igor Berman wrote:
> read about *--total-executor-cores*
> not sure why you specify port 6066 in master...usually it's 7077
> verify in master ui(usually port 8080) how many cores are
> there(depends on other configs, but usually workers connect to master
> with all their cores)
>
> On 7 January 2016 at 23:46, Michael Pisula  > wrote:
>
> Hi,
>
> I start the cluster using the spark-ec2 scripts, so the cluster is
> in stand-alone mode.
> Here is how I submit my job:
> spark/bin/spark-submit --class demo.spark.StaticDataAnalysis
> --master spark://:6066 --deploy-mode cluster
> demo/Demo-1.0-SNAPSHOT-all.jar
>
> Cheers,
> Michael
>
>
> On 07.01.2016 22:41, Igor Berman wrote:
>> share how you submit your job
>> what cluster(yarn, standalone)
>>
>> On 7 January 2016 at 23:24, Michael Pisula
>> >
>> wrote:
>>
>> Hi there,
>>
>> I ran a simple Batch Application on a Spark Cluster on EC2.
>> Despite having 3
>> Worker Nodes, I could not get the application processed on
>> more than one
>> node, regardless if I submitted the Application in Cluster or
>> Client mode.
>> I also tried manually increasing the number of partitions in
>> the code, no
>> effect. I also pass the master into the application.
>> I verified on the nodes themselves that only one node was
>> active while the
>> job was running.
>> I pass enough data to make the job take 6 minutes to process.
>> The job is simple enough, reading data from two S3 files,
>> joining records on
>> a shared field, filtering out some records and writing the
>> result back to
>> S3.
>>
>> Tried all kinds of stuff, but could not make it work. I did
>> find similar
>> questions, but had already tried the solutions that worked in
>> those cases.
>> Would be really happy about any pointers.
>>
>> Cheers,
>> Michael
>>
>>
>>
>> --
>> View this message in context:
>> 
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-uses-only-one-Worker-tp25909.html
>> Sent from the Apache Spark User List mailing list archive at
>> Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> 
>> For additional commands, e-mail: user-h...@spark.apache.org
>> 
>>
>>
>
> -- 
> Michael Pisula * michael.pis...@tngtech.com 
>  * +49-174-3180084 
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
>
>

-- 
Michael Pisula * michael.pis...@tngtech.com * +49-174-3180084
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082



Re: spark ui security

2016-01-07 Thread Kostiantyn Kudriavtsev
I know, but I only need to hide/protect the web UI, at least with the servlet/filter API
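Not from this thread, but for reference: spark.ui.filters accepts any standard
javax.servlet Filter, so a small HTTP Basic auth filter can guard a UI without
Kerberos. A sketch (the class name and hard-coded credentials are purely
illustrative; java.util.Base64 needs Java 8):

import javax.servlet._
import javax.servlet.http.{HttpServletRequest, HttpServletResponse}
import java.util.Base64

class BasicAuthUIFilter extends Filter {
  // Hard-coded for illustration only -- load real credentials from a safe place
  private val expected =
    "Basic " + Base64.getEncoder.encodeToString("admin:secret".getBytes("UTF-8"))

  def init(config: FilterConfig): Unit = {}

  def doFilter(req: ServletRequest, res: ServletResponse, chain: FilterChain): Unit = {
    val request  = req.asInstanceOf[HttpServletRequest]
    val response = res.asInstanceOf[HttpServletResponse]
    if (expected == request.getHeader("Authorization")) {
      chain.doFilter(req, res)  // credentials match, let the request through
    } else {
      response.setHeader("WWW-Authenticate", "Basic realm=\"Spark UI\"")
      response.sendError(HttpServletResponse.SC_UNAUTHORIZED)
    }
  }

  def destroy(): Unit = {}
}

It is registered with spark.ui.filters=<fully qualified filter class name> on the
process whose UI should be protected, and the compiled class has to be on that
process's classpath.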

On Jan 7, 2016, at 4:59 PM, Ted Yu  wrote:

> Without kerberos you don't have true security.
> 
> Cheers
> 
> On Thu, Jan 7, 2016 at 1:56 PM, Kostiantyn Kudriavtsev 
>  wrote:
> can I do it without kerberos and hadoop?
> ideally using filters as for job UI
> 
> On Jan 7, 2016, at 1:22 PM, Prem Sure  wrote:
> 
>> you can refer more on https://searchcode.com/codesearch/view/97658783/
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala
>> 
>> spark.authenticate = true
>> spark.ui.acls.enable = true
>> spark.ui.view.acls = user1,user2
>> spark.ui.filters = 
>> org.apache.hadoop.security.authentication.server.AuthenticationFilter
>> spark.org.apache.hadoop.security.authentication.server.AuthenticationFilter.params="type=kerberos,kerberos.principal=HTTP/mybox@MYDOMAIN,kerberos.keytab=/some/keytab"
>> 
>> 
>> 
>> 
>> On Thu, Jan 7, 2016 at 10:35 AM, Kostiantyn Kudriavtsev 
>>  wrote:
>> I’m afraid I missed where this property must be specified? I added it to 
>> spark-xxx.conf which is basically configurable per job, so I assume to 
>> protect WebUI the different place must be used, isn’t it?
>> 
>> On Jan 7, 2016, at 10:28 AM, Ted Yu  wrote:
>> 
>>> According to https://spark.apache.org/docs/latest/security.html#web-ui , 
>>> web UI is covered.
>>> 
>>> FYI
>>> 
>>> On Thu, Jan 7, 2016 at 6:35 AM, Kostiantyn Kudriavtsev 
>>>  wrote:
>>> hi community,
>>> 
>>> do I understand correctly that spark.ui.filters property sets up filters 
>>> only for jobui interface? is it any way to protect spark web ui in the same 
>>> way?
>>> 
>> 
>> 
> 
> 



Re: Spark job uses only one Worker

2016-01-07 Thread Igor Berman
Do you see in the master UI that the workers are connected to the master, and
that before you run your app there are 2 available cores per worker in the
master UI? I understand that there are 2 cores on each worker - the question is
whether they got registered with the master.

Regarding the port it's very strange; please post what the problem is when
connecting to 7077.

Use *--total-executor-cores 4* in your submit.

If you can, post the master UI screen after you have submitted your app.
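For concreteness, the submit command from earlier in the thread with both
suggestions applied would look roughly like this (master host elided as in the
original; whether port 7077 accepts --deploy-mode cluster is exactly what the
error message should tell us):

spark/bin/spark-submit --class demo.spark.StaticDataAnalysis \
  --master spark://<master-host>:7077 --deploy-mode cluster \
  --total-executor-cores 4 demo/Demo-1.0-SNAPSHOT-all.jar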


On 8 January 2016 at 00:02, Michael Pisula 
wrote:

> I had tried several parameters, including --total-executor-cores, no
> effect.
> As for the port, I tried 7077, but if I remember correctly I got some kind
> of error that suggested to try 6066, with which it worked just fine (apart
> from this issue here).
>
> Each worker has two cores. I also tried increasing cores, again no effect.
> I was able to increase the number of cores the job was using on one worker,
> but it would not use any other worker (and it would not start if the number
> of cores the job wanted was higher than the number available on one worker).
>
>
> On 07.01.2016 22:51, Igor Berman wrote:
>
> read about *--total-executor-cores*
> not sure why you specify port 6066 in master...usually it's 7077
> verify in master ui(usually port 8080) how many cores are there(depends on
> other configs, but usually workers connect to master with all their cores)
>
> On 7 January 2016 at 23:46, Michael Pisula 
> wrote:
>
>> Hi,
>>
>> I start the cluster using the spark-ec2 scripts, so the cluster is in
>> stand-alone mode.
>> Here is how I submit my job:
>> spark/bin/spark-submit --class demo.spark.StaticDataAnalysis --master
>> spark://:6066 --deploy-mode cluster demo/Demo-1.0-SNAPSHOT-all.jar
>>
>> Cheers,
>> Michael
>>
>>
>> On 07.01.2016 22:41, Igor Berman wrote:
>>
>> share how you submit your job
>> what cluster(yarn, standalone)
>>
>> On 7 January 2016 at 23:24, Michael Pisula < 
>> michael.pis...@tngtech.com> wrote:
>>
>>> Hi there,
>>>
>>> I ran a simple Batch Application on a Spark Cluster on EC2. Despite
>>> having 3
>>> Worker Nodes, I could not get the application processed on more than one
>>> node, regardless if I submitted the Application in Cluster or Client
>>> mode.
>>> I also tried manually increasing the number of partitions in the code, no
>>> effect. I also pass the master into the application.
>>> I verified on the nodes themselves that only one node was active while
>>> the
>>> job was running.
>>> I pass enough data to make the job take 6 minutes to process.
>>> The job is simple enough, reading data from two S3 files, joining
>>> records on
>>> a shared field, filtering out some records and writing the result back to
>>> S3.
>>>
>>> Tried all kinds of stuff, but could not make it work. I did find similar
>>> questions, but had already tried the solutions that worked in those
>>> cases.
>>> Would be really happy about any pointers.
>>>
>>> Cheers,
>>> Michael
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> 
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-uses-only-one-Worker-tp25909.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> -
>>> To unsubscribe, e-mail: 
>>> user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: 
>>> user-h...@spark.apache.org
>>>
>>>
>>
>> --
>> Michael Pisula * michael.pis...@tngtech.com * +49-174-3180084
>> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>> Sitz: Unterföhring * Amtsgericht München * HRB 135082
>>
>>
>
> --
> Michael Pisula * michael.pis...@tngtech.com * +49-174-3180084
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
>
>


Re: Spark job uses only one Worker

2016-01-07 Thread Michael Pisula
All the workers were connected, I even saw the job being processed on
different workers, so that was working fine.
I will fire up the cluster again tomorrow and post the results of
connecting to 7077 and using --total-executor-cores 4.

Thanks for the help

On 07.01.2016 23:10, Igor Berman wrote:
> do you see in master ui that workers connected to master & before you
> are running your app there are 2 available cores in master ui per each
> worker?
> I understand that there are 2 cores on each worker - the question is
> do they got registered under master
>
> regarding port it's very strange, please post what is problem
> connecting to 7077
>
> use *--total-executor-cores 4 in your submit*
> if you can post master ui screen after you submitted your app
>
>
> On 8 January 2016 at 00:02, Michael Pisula  > wrote:
>
> I had tried several parameters, including --total-executor-cores,
> no effect.
> As for the port, I tried 7077, but if I remember correctly I got
> some kind of error that suggested to try 6066, with which it
> worked just fine (apart from this issue here).
>
> Each worker has two cores. I also tried increasing cores, again no
> effect. I was able to increase the number of cores the job was
> using on one worker, but it would not use any other worker (and it
> would not start if the number of cores the job wanted was higher
> than the number available on one worker).
>
>
> On 07.01.2016 22:51, Igor Berman wrote:
>> read about *--total-executor-cores*
>> not sure why you specify port 6066 in master...usually it's 7077
>> verify in master ui(usually port 8080) how many cores are
>> there(depends on other configs, but usually workers connect to
>> master with all their cores)
>>
>> On 7 January 2016 at 23:46, Michael Pisula
>> >
>> wrote:
>>
>> Hi,
>>
>> I start the cluster using the spark-ec2 scripts, so the
>> cluster is in stand-alone mode.
>> Here is how I submit my job:
>> spark/bin/spark-submit --class demo.spark.StaticDataAnalysis
>> --master spark://:6066 --deploy-mode cluster
>> demo/Demo-1.0-SNAPSHOT-all.jar
>>
>> Cheers,
>> Michael
>>
>>
>> On 07.01.2016 22:41, Igor Berman wrote:
>>> share how you submit your job
>>> what cluster(yarn, standalone)
>>>
>>> On 7 January 2016 at 23:24, Michael Pisula
>>> >> > wrote:
>>>
>>> Hi there,
>>>
>>> I ran a simple Batch Application on a Spark Cluster on
>>> EC2. Despite having 3
>>> Worker Nodes, I could not get the application processed
>>> on more than one
>>> node, regardless if I submitted the Application in
>>> Cluster or Client mode.
>>> I also tried manually increasing the number of
>>> partitions in the code, no
>>> effect. I also pass the master into the application.
>>> I verified on the nodes themselves that only one node
>>> was active while the
>>> job was running.
>>> I pass enough data to make the job take 6 minutes to
>>> process.
>>> The job is simple enough, reading data from two S3
>>> files, joining records on
>>> a shared field, filtering out some records and writing
>>> the result back to
>>> S3.
>>>
>>> Tried all kinds of stuff, but could not make it work. I
>>> did find similar
>>> questions, but had already tried the solutions that
>>> worked in those cases.
>>> Would be really happy about any pointers.
>>>
>>> Cheers,
>>> Michael
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> 
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-uses-only-one-Worker-tp25909.html
>>> Sent from the Apache Spark User List mailing list
>>> archive at Nabble.com.
>>>
>>> 
>>> -
>>> To unsubscribe, e-mail:
>>> user-unsubscr...@spark.apache.org
>>> 
>>> For additional commands, e-mail:
>>> user-h...@spark.apache.org
>>> 
>>>
>>>
>>
>> -- 
>> Michael Pisula * michael.pis...@tngtech.com 
>>  * +49-174-3180084 
>> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>> 

Re: Spark job uses only one Worker

2016-01-07 Thread Igor Berman
share how you submit your job
what cluster(yarn, standalone)

On 7 January 2016 at 23:24, Michael Pisula 
wrote:

> Hi there,
>
> I ran a simple Batch Application on a Spark Cluster on EC2. Despite having
> 3
> Worker Nodes, I could not get the application processed on more than one
> node, regardless if I submitted the Application in Cluster or Client mode.
> I also tried manually increasing the number of partitions in the code, no
> effect. I also pass the master into the application.
> I verified on the nodes themselves that only one node was active while the
> job was running.
> I pass enough data to make the job take 6 minutes to process.
> The job is simple enough, reading data from two S3 files, joining records
> on
> a shared field, filtering out some records and writing the result back to
> S3.
>
> Tried all kinds of stuff, but could not make it work. I did find similar
> questions, but had already tried the solutions that worked in those cases.
> Would be really happy about any pointers.
>
> Cheers,
> Michael
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-uses-only-one-Worker-tp25909.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


adding jars - hive on spark cdh 5.4.3

2016-01-07 Thread Ophir Etzion
I'm trying to add jars before running a query using Hive on Spark on CDH
5.4.3.
I've tried applying the patch in
https://issues.apache.org/jira/browse/HIVE-12045 (manually, as the patch is
against a different Hive version) but still haven't succeeded.

did anyone manage to do ADD JAR successfully with CDH?

Thanks,
Ophir


Spark job uses only one Worker

2016-01-07 Thread Michael Pisula
Hi there,

I ran a simple Batch Application on a Spark Cluster on EC2. Despite having 3
Worker Nodes, I could not get the application processed on more than one
node, regardless if I submitted the Application in Cluster or Client mode.
I also tried manually increasing the number of partitions in the code, no
effect. I also pass the master into the application.
I verified on the nodes themselves that only one node was active while the
job was running.
I pass enough data to make the job take 6 minutes to process.
The job is simple enough, reading data from two S3 files, joining records on
a shared field, filtering out some records and writing the result back to
S3.

Tried all kinds of stuff, but could not make it work. I did find similar
questions, but had already tried the solutions that worked in those cases.
Would be really happy about any pointers.

Cheers,
Michael



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-uses-only-one-Worker-tp25909.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



SparkContext SyntaxError: invalid syntax

2016-01-07 Thread weineran
Hello,

When I try to submit a python job using spark-submit (using --master yarn
--deploy-mode cluster), I get the following error:

Traceback (most recent call last):
  File "loss_rate_by_probe.py", line 15, in ?
from pyspark import SparkContext
  File
"/scratch5/hadoop/yarn/local/usercache//filecache/18/spark-assembly-1.3.1-hadoop2.4.0.jar/pyspark/__init__.py",
line 41, in ?
  File
"/scratch5/hadoop/yarn/local/usercache//filecache/18/spark-assembly-1.3.1-hadoop2.4.0.jar/pyspark/context.py",
line 219
with SparkContext._lock:
^
SyntaxError: invalid syntax

This is very similar to this post from 2014, but unlike that person I am using
Python 2.7.8.

Here is what I'm using:
Spark 1.3.1
Hadoop 2.4.0.2.1.5.0-695
Python 2.7.8

Another clue:  I also installed Spark 1.6.0 and tried to submit the same
job.  I got a similar error:

Traceback (most recent call last):
  File "loss_rate_by_probe.py", line 15, in ?
from pyspark import SparkContext
  File
"/scratch5/hadoop/yarn/local/usercache//appcache/application_1450370639491_0119/container_1450370639491_0119_01_01/pyspark.zip/pyspark/__init__.py",
line 61
indent = ' ' * (min(len(m) for m in indents) if indents else 0)
  ^
SyntaxError: invalid syntax
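One observation, not stated in the thread: the "in ?" frames in both tracebacks
are what Python prints before 2.6, which suggests the interpreter YARN launches
on the cluster nodes is older than the 2.7.8 used on the submitting machine. A
hedged sketch of pointing PySpark at a specific interpreter (the path is a
placeholder):

spark-submit --master yarn --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/path/to/python2.7 \
  --conf spark.executorEnv.PYSPARK_PYTHON=/path/to/python2.7 \
  loss_rate_by_probe.py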

Any thoughts?

Andrew



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-SyntaxError-invalid-syntax-tp25910.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org