[ https://issues.apache.org/jira/browse/SPARK-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-18738:
------------------------------
Don't set Fix version.
Naively, in the EC case, there are fewer replicas of the data, right? Is it
surprising that this limits the possibilities for node-local reads?
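
A rough back-of-the-envelope version of that intuition, as a sketch only. The
proportional model and the function name are my assumptions, not anything
measured here: it supposes each EC data block lives on a single DataNode and
that the number of executors that stay busy scales with the number of nodes
holding a local copy of a block. Under those assumptions the numbers happen to
line up with the 108 and 36 executors in the report quoted below.

```python
def estimated_busy_executors(total_executors, copies_per_block, baseline_copies=3):
    """Naive model (an assumption): busy executors scale with local copies per block."""
    if copies_per_block <= 0 or baseline_copies <= 0:
        raise ValueError("copy counts must be positive")
    # Cap at the baseline so extra copies never yield more than the full cluster.
    effective_copies = min(copies_per_block, baseline_copies)
    return total_executors * effective_copies // baseline_copies


# 108 is the cluster size from the report; 3 vs. 1 copies is the assumed difference.
replicated = estimated_busy_executors(108, copies_per_block=3)
erasure_coded = estimated_busy_executors(108, copies_per_block=1)
print(replicated, erasure_coded, replicated - erasure_coded)
```

This does not show that locality is the cause. It only shows that the reported
one-third ratio is what a purely locality-driven explanation would predict.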
> Some Spark SQL queries has poor performance on HDFS Erasure Coding feature
> when enabling dynamic allocation.
> ------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-18738
> URL: https://issues.apache.org/jira/browse/SPARK-18738
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2
> Reporter: Lifeng Wang
> Fix For: 2.2.0
>
>
> We ran TPCx-BB with the Spark SQL engine on a local cluster using Spark 2.0.3
> trunk and Hadoop 3.0 alpha 2 trunk. We ran the Spark SQL queries with the same
> data size on both Erasure Coding (EC) and 3-replication. The test results show
> that some queries perform much worse on EC than on 3-replication. After an
> initial investigation, we found that Spark uses only one third as many
> executors to execute queries on EC as it does on 3-replication.
> Take query 30 as an example. Our cluster can launch 108 executors in total.
> When we run the query against the 3-replication database, Spark starts all 108
> executors to execute it. When we run the query against the Erasure Coding
> database, Spark launches 108 executors and then kills 72 of them because
> they’re idle, so in the end only 36 executors execute the query, which
> leads to poor performance.
> This issue only happens when the dynamic allocation mechanism is enabled. When
> we disable dynamic allocation, the Spark SQL query on EC performs similarly to
> the same query on 3-replication.
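
For anyone comparing the two runs described above, here is a minimal sketch of
how the settings might be toggled. The Spark property names are real, but the
helper functions are hypothetical and the values are assumptions (108 is taken
from the report). The reporter's actual configuration is not given in this
thread.

```python
def dynamic_allocation_conf(enabled, executors=108):
    """Build Spark properties for a run with or without dynamic allocation."""
    conf = {"spark.dynamicAllocation.enabled": str(enabled).lower()}
    if enabled:
        # Dynamic allocation relies on the external shuffle service.
        conf["spark.shuffle.service.enabled"] = "true"
        conf["spark.dynamicAllocation.maxExecutors"] = str(executors)
    else:
        # With dynamic allocation off, request a fixed number of executors.
        conf["spark.executor.instances"] = str(executors)
    return conf


def to_submit_args(conf):
    """Render the properties as spark-submit --conf arguments, sorted by key."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_submit_args(dynamic_allocation_conf(enabled=True))))
print(" ".join(to_submit_args(dynamic_allocation_conf(enabled=False))))
```

Running the same query against the EC and 3-replication databases under each
set of arguments should make it easier to see whether the executor count
differs only in the dynamic allocation case.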