Re: Can not set spark dynamic resource allocation

2016-05-27 Thread Cui, Weifeng
Sorry for the late reply.


<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/local/output/logs/nm-log-dir</value>
</property>

We do not use file://  in the settings, so that should not be the problem. Any 
other guesses?

Weifeng



On 5/20/16, 2:40 PM, "David Newberger" <david.newber...@wandcorp.com> wrote:

>Hi All,
>
>The error you are seeing looks really similar to Spark-13514 to me. I could be 
>wrong though
>
>https://issues.apache.org/jira/browse/SPARK-13514
>
>Can you check yarn.nodemanager.local-dirs  in your YARN configuration for 
>"file://"
>
>
>Cheers!
>David Newberger
>

RE: Can not set spark dynamic resource allocation

2016-05-20 Thread David Newberger
Hi All,

The error you are seeing looks really similar to SPARK-13514 to me. I could be 
wrong, though.

https://issues.apache.org/jira/browse/SPARK-13514

Can you check yarn.nodemanager.local-dirs in your YARN configuration for 
"file://"?


Cheers!
David Newberger
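(If it helps, the check above can be scripted. The sketch below is only illustrative: the sample yarn-site.xml fragment and paths are invented, not taken from the cluster in this thread. It flags any yarn.nodemanager.local-dirs entry that carries an explicit file:// scheme, which is the pattern SPARK-13514 describes.)

```python
import xml.etree.ElementTree as ET

def local_dirs_use_file_scheme(yarn_site_xml: str) -> bool:
    """Return True if yarn.nodemanager.local-dirs contains a file:// URI."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        if name == "yarn.nodemanager.local-dirs":
            # Any comma-separated entry with an explicit file:// scheme
            # is the pattern SPARK-13514 trips over.
            return any(d.strip().startswith("file://")
                       for d in value.split(","))
    return False

# Hypothetical yarn-site.xml fragment, for illustration only:
sample = """
<configuration>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///local/output/nm-local-dir</value>
  </property>
</configuration>
"""
print(local_dirs_use_file_scheme(sample))  # True for this sample
```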


Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Cui, Weifeng
Sorry, here is the node-manager log. application_1463692924309_0002 is my test. 
Hope this will help.
http://pastebin.com/0BPEcgcW




Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Marcelo Vanzin
Hi Weifeng,

That's the Spark event log, not the YARN application log. You get the
latter using the "yarn logs" command.
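(For reference, the aggregated container logs come from the `yarn logs` CLI. A minimal way to build and run that command from a script might look like the sketch below; the application ID is the one mentioned in this thread, and actually running it of course requires a cluster node with log aggregation enabled.)

```python
def yarn_logs_command(application_id: str) -> list:
    """Build the argument list for fetching aggregated YARN container logs."""
    return ["yarn", "logs", "-applicationId", application_id]

cmd = yarn_logs_command("application_1463692924309_0002")
print(" ".join(cmd))  # yarn logs -applicationId application_1463692924309_0002
# On a cluster node you would then run it, e.g. with
# subprocess.run(cmd, check=True)
```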


Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Cui, Weifeng
Here is the application log for this spark job.
http://pastebin.com/2UJS9L4e

Thanks,
Weifeng


From: "Aulakh, Sahib" <aula...@a9.com>
Date: Friday, May 20, 2016 at 12:43 PM
To: Ted Yu <yuzhih...@gmail.com>
Cc: Rodrick Brown <rodr...@orchardplatform.com>, Cui Weifeng <weife...@a9.com>, 
user <user@spark.apache.org>, "Zhao, Jun" <junz...@a9.com>
Subject: Re: Can not set spark dynamic resource allocation

Yes it is yarn. We have configured spark shuffle service w yarn node manager 
but something must be off.

We will send u app log on paste bin.

Sent from my iPhone

On May 20, 2016, at 12:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:
Since yarn-site.xml was cited, I assume the cluster runs YARN.

On Fri, May 20, 2016 at 12:30 PM, Rodrick Brown 
<rodr...@orchardplatform.com> wrote:
Is this Yarn or Mesos? For the latter you need to start an external shuffle 
service.
Get Outlook for iOS <https://aka.ms/o0ukef>



On Fri, May 20, 2016 at 11:48 AM -0700, "Cui, Weifeng" 
<weife...@a9.com> wrote:

Hi guys,



Our team has a Hadoop 2.6.0 cluster with Spark 1.6.1. We want to enable dynamic 
resource allocation for Spark, and we followed the link below. After the 
changes, all Spark jobs failed.
https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation

This test was on a test cluster which has 1 master machine (running namenode, 
resourcemanager and hive server), 1 worker machine (running datanode and 
nodemanager) and 1 machine as client (running spark shell).



What I updated in config:



1. Update in spark-defaults.conf

spark.dynamicAllocation.enabled  true
spark.shuffle.service.enabled    true



2. Update yarn-site.xml


<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

<property>
  <name>spark.shuffle.service.enabled</name>
  <value>true</value>
</property>


3. Copy spark-1.6.1-yarn-shuffle.jar to yarn.application.classpath 
($HADOOP_HOME/share/hadoop/yarn/*) in python code

4. Restart namenode, datanode, resourcemanager, nodemanager... restart everything

5. The config is updated on all machines (resourcemanager and nodemanager): we 
update the config in one place and copy it to all machines.
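(A common pitfall with spark-defaults.conf is losing the whitespace between a key and its value, in which case the setting is not read as intended. The sketch below is only illustrative, with inputs invented from the steps above; it parses both files and reports anything that does not match the intended dynamic-allocation setup.)

```python
import xml.etree.ElementTree as ET

def parse_spark_defaults(text: str) -> dict:
    """Parse spark-defaults.conf lines: key and value split on whitespace."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:  # a key fused to its value yields 1 part: dropped
            conf[parts[0]] = parts[1].strip()
    return conf

def yarn_site_props(xml_text: str) -> dict:
    """Collect name->value pairs from a yarn-site.xml document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name", ""): p.findtext("value", "")
            for p in root.iter("property")}

def check_dynamic_allocation(defaults: dict, yarn: dict) -> list:
    """Return a list of problems; an empty list means the basics look right."""
    problems = []
    if defaults.get("spark.dynamicAllocation.enabled") != "true":
        problems.append("spark.dynamicAllocation.enabled is not true")
    if defaults.get("spark.shuffle.service.enabled") != "true":
        problems.append("spark.shuffle.service.enabled is not true "
                        "(watch for a missing separator before 'true')")
    aux = yarn.get("yarn.nodemanager.aux-services", "")
    if "spark_shuffle" not in [s.strip() for s in aux.split(",")]:
        problems.append("spark_shuffle missing from yarn.nodemanager.aux-services")
    if yarn.get("yarn.nodemanager.aux-services.spark_shuffle.class") != \
            "org.apache.spark.network.yarn.YarnShuffleService":
        problems.append("spark_shuffle class is not YarnShuffleService")
    return problems

# Invented sample inputs; note the fused key/value on the second line.
defaults = parse_spark_defaults(
    "spark.dynamicAllocation.enabled true\n"
    "spark.shuffle.service.enabledtrue\n")
yarn = yarn_site_props("""
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
</configuration>
""")
for p in check_dynamic_allocation(defaults, yarn):
    print(p)
```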



What I tested:



1. I started a Scala spark shell and checked its environment variables; 
spark.dynamicAllocation.enabled is true.

2. I used the following code:

scala > val line = 
sc.textFile("/spark-events/application_1463681113470_0006")

line: org.apache.spark.rdd.RDD[String] = 
/spark-events/application_1463681113470_0006 MapPartitionsRDD[1] at textFile at 
<console>:27

scala > line.count # This command just stuck here



3. In the beginning there is only 1 executor (this is for the driver); after 
line.count, I could see 3 executors, then it dropped to 1.

4. Several jobs were launched and all of them failed. Tasks (for all stages): 
Succeeded/Total: 0/2 (4 failed)



Error messages:



I found the following messages in spark web UI. I found this in spark.log on 
nodemanager machine as well.


ExecutorLostFailure (executor 1 exited caused by one of the running tasks) 
Reason: Container marked as failed: container_1463692924309_0002_01_02 on 
host: xxx.com. Exit status: 1. 
Diagnostics: Exception from container-launch.
Container id: container_1463692924309_0002_01_02
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1



Thanks a lot for help. We can provide more information if needed.



Thanks,
Weifeng











NOTICE TO RECIPIENTS: This communication is confidential and intended for the 
use of the addressee only. If you are not an intended recipient of this 
communication, please delete it immediately and notify the sender by return 

Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Aulakh, Sahib
Yes, it is YARN. We have configured the Spark shuffle service with the YARN 
node manager, but something must be off.

We will send you the app log on Pastebin.

Sent from my iPhone


Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Ted Yu
Since yarn-site.xml was cited, I assume the cluster runs YARN.


Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Rodrick Brown
Is this Yarn or Mesos? For the latter you need to start an external shuffle 
service.

Get Outlook for iOS




On Fri, May 20, 2016 at 11:48 AM -0700, "Cui, Weifeng"  wrote:






















Hi guys,


 


Our team has a hadoop 2.6.0 cluster with Spark 1.6.1. We want to set dynamic 
resource allocation for spark and we followed the following link. After the 
changes, all spark jobs failed.



https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation


This test was on a test cluster which has 1 master machine (running namenode, 
resourcemanager and hive server), 1 worker machine (running datanode and 
nodemanager) and 1 machine as client(
 running spark shell).


 


What I updated in config :


 


1. Update in spark-defaults.conf


spark.dynamicAllocation.enabled true


spark.shuffle.service.enabled    true



 


2. Update yarn-site.xml





 yarn.nodemanager.aux-services

  mapreduce_shuffle,spark_shuffle







yarn.nodemanager.aux-services.spark_shuffle.class

org.apache.spark.network.yarn.YarnShuffleService







spark.shuffle.service.enabled

 true

 


3. Copy  spark-1.6.1-yarn-shuffle.jar to yarn.application.classpath 
($HADOOP_HOME/share/hadoop/yarn/*) in python code


4. Restart namenode, datanode, resourcemanager, nodemanger... retart everything


5. The config will update in all machines, resourcemanager and nodemanager. We 
update the config in one place and copy to all machines.


 


What I tested:

1. I started a Scala spark shell and checked its environment variables; spark.dynamicAllocation.enabled is true.

2. I ran the following code:

scala> val line = sc.textFile("/spark-events/application_1463681113470_0006")
line: org.apache.spark.rdd.RDD[String] = /spark-events/application_1463681113470_0006 MapPartitionsRDD[1] at textFile at <console>:27

scala> line.count   // this command just got stuck here

3. In the beginning there was only 1 executor (this one is for the driver); after line.count I could see 3 executors, which then dropped back to 1.

4. Several jobs were launched and all of them failed. Tasks (for all stages): Succeeded/Total: 0/2 (4 failed)


 


Error messages:

I found the following messages in the Spark web UI, and the same messages in spark.log on the nodemanager machine.


 


ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
Reason: Container marked as failed: container_1463692924309_0002_01_02 on
host: xxx.com. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1463692924309_0002_01_02
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1
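The stack trace above only shows that the container launch failed with exit
code 1; the underlying cause normally sits in the container's own logs, which
can be retrieved with the `yarn logs` command once log aggregation is enabled
and the application has finished. A small illustrative wrapper (our setup does
not include this helper; it just shows the command line):

```python
# Hypothetical helper: build and run `yarn logs` to retrieve the
# aggregated container logs for a failed application.
import subprocess

def yarn_logs_cmd(app_id):
    """Command line for fetching aggregated YARN logs for app_id."""
    return ["yarn", "logs", "-applicationId", app_id]

def fetch_yarn_logs(app_id):
    """Run `yarn logs`; requires log aggregation and a finished app."""
    out = subprocess.run(yarn_logs_cmd(app_id),
                         capture_output=True, text=True)
    return out.stdout

# e.g. fetch_yarn_logs("application_1463692924309_0002")
```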



 


Thanks a lot for the help. We can provide more information if needed.

Thanks,
Weifeng

Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Ted Yu
Can you retrieve the log for application_1463681113470_0006 and pastebin it?

Thanks
