[ https://issues.apache.org/jira/browse/SPARK-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187424#comment-15187424 ]

Thomas Graves commented on SPARK-13723:
---------------------------------------

Warnings that scroll by when spark-submit starts are really pretty useless 
unless the user is explicitly looking for something.  Way too many things get 
printed there for one to be noticed.

spark-submit --help doesn't describe the current behavior of --num-executors 
when dynamic allocation is on. That is probably a separate bug.
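For reference, the current entry reads roughly like this (exact wording may 
vary by version):

  --num-executors NUM         Number of executors to launch (Default: 2).

Nothing there says that setting the flag turns dynamic allocation off.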

If it's already misunderstood, which I know it is because I've had to explain 
it to multiple people, then I don't see an argument for not changing the 
behavior.

It really comes down to what would be the best experience for users.  If 
there are arguments one way or the other, I could be swayed.

I also think it's a bit confusing to look at the configs, see that the 
dynamic allocation config is on, and find it isn't being used because 
--num-executors was specified.
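To make that concrete: a cluster might ship with dynamic allocation on in 
spark-defaults.conf,

  spark.dynamicAllocation.enabled   true
  spark.shuffle.service.enabled     true

and then a user runs

  spark-submit --num-executors 10 ...

They get a fixed 10 executors, and dynamic allocation is silently off even 
though every config they can see says it's enabled.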

One reason not to change this is if we think Spark isn't ready.  For 
instance, Spark has some known issues with scalability, so with dynamic 
allocation users could end up with thousands of executors instead of a few or 
tens, and we could hit internal Spark issues or need more memory for the AM 
by default.  If that makes the user experience worse, that would be a reason 
not to do it.
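For concreteness, the proposal in the description below would make 
--num-executors N with dynamic allocation on translate to roughly:

  spark.dynamicAllocation.initialExecutors = N
  spark.dynamicAllocation.maxExecutors     = N

so a job starts at N executors, can shrink below N as they go idle, but is 
capped rather than pinned there.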



> YARN - Change behavior of --num-executors when 
> spark.dynamicAllocation.enabled true
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-13723
>                 URL: https://issues.apache.org/jira/browse/SPARK-13723
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 2.0.0
>            Reporter: Thomas Graves
>            Priority: Minor
>
> I think we should change the behavior when --num-executors is specified 
> while dynamic allocation is enabled. Currently, if --num-executors is 
> specified, dynamic allocation is disabled and a static number of executors 
> is used.
> I would rather see the default behavior changed in the 2.x line. If the 
> dynamic allocation config is on, then --num-executors sets both the maximum 
> and the initial number of executors. This would allow users to easily cap 
> their usage while still letting idle executors be freed. It would also let 
> users doing ML start out with a set number of executors; if they are 
> actually caching the data, those executors wouldn't be freed up. So you 
> would get behavior very similar to having dynamic allocation off.
> Part of the reason for this is that using a static number generally wastes 
> resources, especially with people doing ad hoc things in spark-shell. It 
> also has a big effect when people are doing MapReduce/ETL type workloads. 
> The problem is that people are used to specifying --num-executors, so if we 
> turn dynamic allocation on by default in a cluster config it's just 
> overridden.
> We should also update the spark-submit --help description for 
> --num-executors.
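For illustration of that last point, the updated --help entry might read 
something like (wording illustrative, not final):

  --num-executors NUM         Number of executors to launch (Default: 2).
                              If dynamic allocation is enabled, this sets
                              the initial and maximum number of executors.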


