[ https://issues.apache.org/jira/browse/SPARK-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lifeng Wang updated SPARK-18738:
--------------------------------
Description:
We run TPCx-BB with the Spark SQL engine on a local cluster using Spark 2.0.3
trunk and Hadoop 3.0 alpha 2 trunk. We run the same Spark SQL queries, with the
same data size, on both Erasure Coding (EC) and 3-replication. The test results
show that some queries perform much worse on EC than on 3-replication. After
initial investigation, we found that Spark starts only one third as many
executors to execute queries on EC as on 3-replication.
Take query 30 as an example: our cluster can launch up to 108 executors in
total. When we run the query against the 3-replication database, Spark starts
all 108 executors to execute it. When we run the query against the Erasure
Coding database, Spark launches 108 executors but then kills 72 of them because
they are idle, leaving only 36 executors to execute the query, which leads to
poor performance.
This issue only happens when the dynamic allocation mechanism is enabled. When
we disable dynamic allocation, Spark SQL queries on EC perform similarly to
those on 3-replication.
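The executor loss described above can be illustrated with a simplified model of how dynamic allocation releases idle executors. This is a minimal sketch, not Spark's actual ExecutorAllocationManager: it assumes an executor is released once it has run no tasks for `spark.dynamicAllocation.executorIdleTimeout` (60 s by default) and that `spark.dynamicAllocation.minExecutors` acts as a floor. The `Executor` class, the `executors_to_remove` function, and the assumption that tasks land on only 36 of the 108 executors are hypothetical, chosen only to reproduce the counts reported for query 30; the sketch does not model why the EC run keeps fewer executors busy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Executor:
    executor_id: int
    # Time (in seconds) at which the executor became idle;
    # None means it is currently running tasks.
    idle_since: Optional[float]


def executors_to_remove(
    executors: List[Executor],
    now: float,
    idle_timeout_s: float = 60.0,  # assumed to mirror executorIdleTimeout
    min_executors: int = 0,        # assumed to mirror minExecutors
) -> List[int]:
    """Hypothetical helper: return ids of executors idle for at least
    idle_timeout_s, without dropping below min_executors."""
    expired = [
        e.executor_id
        for e in executors
        if e.idle_since is not None and now - e.idle_since >= idle_timeout_s
    ]
    max_removable = max(0, len(executors) - min_executors)
    return expired[:max_removable]


TOTAL = 108
BUSY = 36  # hypothetical: assume tasks were only scheduled on 36 executors
executors = [Executor(i, None if i < BUSY else 0.0) for i in range(TOTAL)]

removed = executors_to_remove(executors, now=60.0)
print(len(removed), TOTAL - len(removed))  # prints: 72 36
```

With every executor kept busy (as in the 3-replication run), `expired` would be empty and all 108 executors would survive, which matches the observation that disabling dynamic allocation, or keeping all executors occupied, avoids the slowdown.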
> Some Spark SQL queries have poor performance on HDFS Erasure Coding feature
> when enabling dynamic allocation.
> ------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-18738
> URL: https://issues.apache.org/jira/browse/SPARK-18738
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2
> Reporter: Lifeng Wang
> Fix For: 2.2.0
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)