Re: Can not set spark dynamic resource allocation

2016-05-27 Thread Cui, Weifeng
Sorry for the late reply.


<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/local/output/logs/nm-log-dir</value>
</property>

We do not use file://  in the settings, so that should not be the problem. Any 
other guesses?

Weifeng



On 5/20/16, 2:40 PM, "David Newberger" <david.newber...@wandcorp.com> wrote:

>Hi All,
>
>The error you are seeing looks really similar to Spark-13514 to me. I could be 
>wrong though
>
>https://issues.apache.org/jira/browse/SPARK-13514
>
>Can you check yarn.nodemanager.local-dirs  in your YARN configuration for 
>"file://"
>
>
>Cheers!
>David Newberger
>

RE: Can not set spark dynamic resource allocation

2016-05-20 Thread David Newberger
Hi All,

The error you are seeing looks really similar to SPARK-13514 to me. I could be 
wrong, though.

https://issues.apache.org/jira/browse/SPARK-13514

Can you check yarn.nodemanager.local-dirs in your YARN configuration for 
"file://"?


Cheers!
David Newberger
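(If it helps, the check above can be scripted. The sketch below is only illustrative: the sample yarn-site.xml fragment and paths are invented, not taken from the cluster in this thread. It flags any yarn.nodemanager.local-dirs entry that carries an explicit file:// scheme, which is the pattern SPARK-13514 describes.)

```python
import xml.etree.ElementTree as ET

def local_dirs_use_file_scheme(yarn_site_xml: str) -> bool:
    """Return True if yarn.nodemanager.local-dirs contains a file:// URI."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        if name == "yarn.nodemanager.local-dirs":
            # Any comma-separated entry with an explicit file:// scheme
            # is the pattern SPARK-13514 trips over.
            return any(d.strip().startswith("file://")
                       for d in value.split(","))
    return False

# Hypothetical yarn-site.xml fragment, for illustration only:
sample = """
<configuration>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///local/output/nm-local-dir</value>
  </property>
</configuration>
"""
print(local_dirs_use_file_scheme(sample))  # True for this sample
```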


Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Cui, Weifeng
Sorry, here is the node-manager log. application_1463692924309_0002 is my test. 
Hope this will help.
http://pastebin.com/0BPEcgcW




Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Marcelo Vanzin
Hi Weifeng,

That's the Spark event log, not the YARN application log. You get the
latter using the "yarn logs" command.
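(For reference, the aggregated container logs come from the `yarn logs` CLI. A minimal way to build and run that command from a script might look like the sketch below; the application ID is the one mentioned in this thread, and actually running it of course requires a cluster node with log aggregation enabled.)

```python
def yarn_logs_command(application_id: str) -> list:
    """Build the argument list for fetching aggregated YARN container logs."""
    return ["yarn", "logs", "-applicationId", application_id]

cmd = yarn_logs_command("application_1463692924309_0002")
print(" ".join(cmd))  # yarn logs -applicationId application_1463692924309_0002
# On a cluster node you would then run it, e.g. with
# subprocess.run(cmd, check=True)
```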


Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Cui, Weifeng
Here is the application log for this spark job.
http://pastebin.com/2UJS9L4e

Thanks,
Weifeng


From: "Aulakh, Sahib" <aula...@a9.com>
Date: Friday, May 20, 2016 at 12:43 PM
To: Ted Yu <yuzhih...@gmail.com>
Cc: Rodrick Brown <rodr...@orchardplatform.com>, Cui Weifeng <weife...@a9.com>, 
user <user@spark.apache.org>, "Zhao, Jun" <junz...@a9.com>
Subject: Re: Can not set spark dynamic resource allocation

Yes it is yarn. We have configured spark shuffle service w yarn node manager 
but something must be off.

We will send u app log on paste bin.

Sent from my iPhone

On May 20, 2016, at 12:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:
Since yarn-site.xml was cited, I assume the cluster runs YARN.

On Fri, May 20, 2016 at 12:30 PM, Rodrick Brown 
<rodr...@orchardplatform.com> wrote:
Is this Yarn or Mesos? For the latter you need to start an external shuffle 
service.
Get Outlook for iOS <https://aka.ms/o0ukef>



On Fri, May 20, 2016 at 11:48 AM -0700, "Cui, Weifeng" 
<weife...@a9.com> wrote:

Hi guys,



Our team has a Hadoop 2.6.0 cluster with Spark 1.6.1. We want to enable dynamic 
resource allocation for Spark, and we followed the link below. After the 
changes, all Spark jobs failed.
https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation

This test was on a test cluster which has 1 master machine (running namenode, 
resourcemanager and hive server), 1 worker machine (running datanode and 
nodemanager) and 1 machine as client (running spark shell).



What I updated in config:



1. Update in spark-defaults.conf

spark.dynamicAllocation.enabled  true
spark.shuffle.service.enabled    true



2. Update yarn-site.xml


<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

<property>
  <name>spark.shuffle.service.enabled</name>
  <value>true</value>
</property>


3. Copy spark-1.6.1-yarn-shuffle.jar to yarn.application.classpath 
($HADOOP_HOME/share/hadoop/yarn/*) in python code

4. Restart namenode, datanode, resourcemanager, nodemanager... restart everything

5. The config is updated on all machines (resourcemanager and nodemanager): we 
update the config in one place and copy it to all machines.
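(A common pitfall with spark-defaults.conf is losing the whitespace between a key and its value, in which case the setting is not read as intended. The sketch below is only illustrative, with inputs invented from the steps above; it parses both files and reports anything that does not match the intended dynamic-allocation setup.)

```python
import xml.etree.ElementTree as ET

def parse_spark_defaults(text: str) -> dict:
    """Parse spark-defaults.conf lines: key and value split on whitespace."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:  # a key fused to its value yields 1 part: dropped
            conf[parts[0]] = parts[1].strip()
    return conf

def yarn_site_props(xml_text: str) -> dict:
    """Collect name->value pairs from a yarn-site.xml document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name", ""): p.findtext("value", "")
            for p in root.iter("property")}

def check_dynamic_allocation(defaults: dict, yarn: dict) -> list:
    """Return a list of problems; an empty list means the basics look right."""
    problems = []
    if defaults.get("spark.dynamicAllocation.enabled") != "true":
        problems.append("spark.dynamicAllocation.enabled is not true")
    if defaults.get("spark.shuffle.service.enabled") != "true":
        problems.append("spark.shuffle.service.enabled is not true "
                        "(watch for a missing separator before 'true')")
    aux = yarn.get("yarn.nodemanager.aux-services", "")
    if "spark_shuffle" not in [s.strip() for s in aux.split(",")]:
        problems.append("spark_shuffle missing from yarn.nodemanager.aux-services")
    if yarn.get("yarn.nodemanager.aux-services.spark_shuffle.class") != \
            "org.apache.spark.network.yarn.YarnShuffleService":
        problems.append("spark_shuffle class is not YarnShuffleService")
    return problems

# Invented sample inputs; note the fused key/value on the second line.
defaults = parse_spark_defaults(
    "spark.dynamicAllocation.enabled true\n"
    "spark.shuffle.service.enabledtrue\n")
yarn = yarn_site_props("""
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
</configuration>
""")
for p in check_dynamic_allocation(defaults, yarn):
    print(p)
```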



What I tested:



1. I started a Scala spark shell and checked its environment variables; 
spark.dynamicAllocation.enabled is true.

2. I used the following code:

scala > val line = 
sc.textFile("/spark-events/application_1463681113470_0006")

line: org.apache.spark.rdd.RDD[String] = 
/spark-events/application_1463681113470_0006 MapPartitionsRDD[1] at textFile at 
<console>:27

scala > line.count # This command just stuck here



3. In the beginning there is only 1 executor (this is for the driver); after 
line.count, I could see 3 executors, then it dropped to 1.

4. Several jobs were launched and all of them failed. Tasks (for all stages): 
Succeeded/Total: 0/2 (4 failed)



Error messages:



I found the following messages in spark web UI. I found this in spark.log on 
nodemanager machine as well.


ExecutorLostFailure (executor 1 exited caused by one of the running tasks) 
Reason: Container marked as failed: container_1463692924309_0002_01_02 on 
host: xxx.com. Exit status: 1. 
Diagnostics: Exception from container-launch.
Container id: container_1463692924309_0002_01_02
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1



Thanks a lot for help. We can provide more information if needed.



Thanks,
Weifeng











NOTICE TO RECIPIENTS: This communication is confidential and intended for the 
use of the addressee only. If you are not an intended recipient of this 
communication, please delete it immediately and notify the sender by return 

Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Aulakh, Sahib
Yes, it is YARN. We have configured the Spark shuffle service with the YARN 
node manager, but something must be off.

We will send you the app log on Pastebin.

Sent from my iPhone


Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Ted Yu
Since yarn-site.xml was cited, I assume the cluster runs YARN.


Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Rodrick Brown
Is this Yarn or Mesos? For the latter you need to start an external shuffle 
service.

Get Outlook for iOS




On Fri, May 20, 2016 at 11:48 AM -0700, "Cui, Weifeng"  wrote:






















Hi guys,


 


Our team has a hadoop 2.6.0 cluster with Spark 1.6.1. We want to set dynamic 
resource allocation for spark and we followed the following link. After the 
changes, all spark jobs failed.



https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation


This test was on a test cluster which has 1 master machine (running namenode, 
resourcemanager and hive server), 1 worker machine (running datanode and 
nodemanager) and 1 machine as client(
 running spark shell).


 


What I updated in config :


 


1. Update in spark-defaults.conf


spark.dynamicAllocation.enabled true


spark.shuffle.service.enabled    true



 


2. Update yarn-site.xml





 yarn.nodemanager.aux-services

  mapreduce_shuffle,spark_shuffle







yarn.nodemanager.aux-services.spark_shuffle.class

org.apache.spark.network.yarn.YarnShuffleService







spark.shuffle.service.enabled

 true

 


3. Copy  spark-1.6.1-yarn-shuffle.jar to yarn.application.classpath 
($HADOOP_HOME/share/hadoop/yarn/*) in python code


4. Restart namenode, datanode, resourcemanager, nodemanger... retart everything


5. The config will update in all machines, resourcemanager and nodemanager. We 
update the config in one place and copy to all machines.


 


What I tested:

1. I started a Scala spark shell and checked its environment variables; spark.dynamicAllocation.enabled is true.

2. I ran the following code:

scala> val line = sc.textFile("/spark-events/application_1463681113470_0006")
line: org.apache.spark.rdd.RDD[String] = /spark-events/application_1463681113470_0006 MapPartitionsRDD[1] at textFile at <console>:27

scala> line.count   // this command just got stuck here

3. In the beginning there was only 1 executor (this one is for the driver); after line.count I could see 3 executors, which then dropped back to 1.

4. Several jobs were launched and all of them failed. Tasks (for all stages): Succeeded/Total: 0/2 (4 failed)


 


Error messages:

I found the following messages in the Spark web UI, and the same messages in spark.log on the nodemanager machine.


 


ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
Reason: Container marked as failed: container_1463692924309_0002_01_02 on
host: xxx.com. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1463692924309_0002_01_02
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1
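The stack trace above only shows that the container launch failed with exit
code 1; the underlying cause normally sits in the container's own logs, which
can be retrieved with the `yarn logs` command once log aggregation is enabled
and the application has finished. A small illustrative wrapper (our setup does
not include this helper; it just shows the command line):

```python
# Hypothetical helper: build and run `yarn logs` to retrieve the
# aggregated container logs for a failed application.
import subprocess

def yarn_logs_cmd(app_id):
    """Command line for fetching aggregated YARN logs for app_id."""
    return ["yarn", "logs", "-applicationId", app_id]

def fetch_yarn_logs(app_id):
    """Run `yarn logs`; requires log aggregation and a finished app."""
    out = subprocess.run(yarn_logs_cmd(app_id),
                         capture_output=True, text=True)
    return out.stdout

# e.g. fetch_yarn_logs("application_1463692924309_0002")
```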



 


Thanks a lot for the help. We can provide more information if needed.

Thanks,
Weifeng

Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Ted Yu
Can you retrieve the log for application_1463681113470_0006 and pastebin it?

Thanks
