[jira] [Updated] (SPARK-18738) Some Spark SQL queries has poor performance on HDFS Erasure Coding feature when enabling dynamic allocation.

Sean Owen (JIRA) Tue, 20 Dec 2016 05:47:04 -0800

     [ https://issues.apache.org/jira/browse/SPARK-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sean Owen updated SPARK-18738:
------------------------------

Don't set Fix version.

Naively, in the EC case there are fewer replicas of the data, right? Is it
surprising that this limits the possibilities for node-local reads?
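
To make the question above concrete, here is a toy model only, not a description
of how HDFS reports block locations or how Spark's scheduler picks hosts. It
assumes each block's copies sit on distinct nodes and that a task is equally
likely to be offered any node. The 12-node cluster size and the single-copy
figure for EC are hypothetical inputs chosen for illustration.

```python
from fractions import Fraction


def node_local_fraction(copies_per_block: int, cluster_nodes: int) -> Fraction:
    """Fraction of nodes holding a readable copy of a given block.

    Toy assumption: copies are on distinct nodes, and every node is equally
    likely to be offered to the task that reads the block.
    """
    if not 0 < copies_per_block <= cluster_nodes:
        raise ValueError("copies_per_block must be in 1..cluster_nodes")
    return Fraction(copies_per_block, cluster_nodes)


# Hypothetical 12-node cluster.
replicated = node_local_fraction(copies_per_block=3, cluster_nodes=12)
# Assumption: with EC, each piece of data is readable from one node only.
erasure_coded = node_local_fraction(copies_per_block=1, cluster_nodes=12)

# How much smaller the node-local opportunity is under this model.
ratio = erasure_coded / replicated
print(replicated, erasure_coded, ratio)
```

Under these assumptions the chance of a node-local read falls in proportion to
the number of copies. Whether that is what actually happens in this issue is
exactly the open question.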

> Some Spark SQL queries has poor performance on HDFS Erasure Coding feature 
> when enabling dynamic allocation.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18738
>                 URL: https://issues.apache.org/jira/browse/SPARK-18738
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2
>            Reporter: Lifeng Wang
>             Fix For: 2.2.0
>
>
> We run TPCx-BB with the Spark SQL engine on a local cluster, using Spark 2.0.3
> trunk and Hadoop 3.0 alpha 2 trunk. We run the Spark SQL queries with the same
> data size on both Erasure Coding (EC) and 3-replication. The test results show
> that some queries perform much worse on EC than on 3-replication. After an
> initial investigation, we found that Spark uses only one third as many
> executors to run queries on EC as it does on 3-replication.
> Taking query 30 as an example: our cluster can launch 108 executors in total.
> When we run the query against the 3-replication database, Spark starts all 108
> executors to execute the query. When we run the query against the Erasure
> Coding database, Spark launches 108 executors and then kills 72 of them
> because they are idle, so in the end only 36 executors execute the query,
> which leads to poor performance.
> This issue only occurs when dynamic allocation is enabled. When we disable
> dynamic allocation, the Spark SQL query on EC performs about the same as on
> 3-replication.
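
For anyone trying to reproduce the two cases in the report above, the following
is a minimal sketch of the settings involved. The helper `spark_submit_conf` is
hypothetical and only builds `--conf` arguments for `spark-submit`. The
property names are standard Spark configuration keys, but the values (108
executors taken from the report, a 60s idle timeout) are illustrative; the
reporter's actual configuration is not given.

```python
from typing import List


def spark_submit_conf(dynamic_allocation: bool,
                      max_executors: int = 108,
                      idle_timeout: str = "60s") -> List[str]:
    """Build --conf arguments for the two cases compared in the report.

    Hypothetical helper: values are illustrative, not the reporter's settings.
    """
    if dynamic_allocation:
        settings = {
            "spark.dynamicAllocation.enabled": "true",
            # Dynamic allocation normally needs the external shuffle service.
            "spark.shuffle.service.enabled": "true",
            "spark.dynamicAllocation.maxExecutors": str(max_executors),
            # Executors idle for longer than this are released.
            "spark.dynamicAllocation.executorIdleTimeout": idle_timeout,
        }
    else:
        settings = {
            "spark.dynamicAllocation.enabled": "false",
            # Fixed executor count for the whole application.
            "spark.executor.instances": str(max_executors),
        }
    args: List[str] = []
    for key, value in settings.items():
        args += ["--conf", "{}={}".format(key, value)]
    return args


print(" ".join(spark_submit_conf(dynamic_allocation=True)))
print(" ".join(spark_submit_conf(dynamic_allocation=False)))
```

Raising `idle_timeout` in the dynamic case would be one way to check whether
the 72 executors are released only because they sit idle at the start of the
query; that is a suggestion for an experiment, not a confirmed fix.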


