[ 
https://issues.apache.org/jira/browse/SPARK-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-10087:
-----------------------------
    Description: 
In some cases, when spark.shuffle.reduceLocality.enabled is enabled, we are 
scheduling all reducers to the same executor (even though the cluster has plenty 
of resources). Setting spark.shuffle.reduceLocality.enabled to false resolves 
the problem. 
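
The workaround can be applied at submit time without touching application code. 
A sketch (the application class and jar names here are placeholders, not part of 
this report):
{code}
spark-submit \
  --conf spark.shuffle.reduceLocality.enabled=false \
  --class com.example.MyApp myapp.jar
{code}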

Comments of https://github.com/apache/spark/pull/8280 provide more details of 
the symptom of this issue.

The query I was using is
{code:sql}
select
  i_brand_id,
  i_brand,
  i_manufact_id,
  i_manufact,
  sum(ss_ext_sales_price) ext_price
from
  store_sales
  join item on (store_sales.ss_item_sk = item.i_item_sk)
  join customer on (store_sales.ss_customer_sk = customer.c_customer_sk)
  join customer_address on (customer.c_current_addr_sk = 
customer_address.ca_address_sk)
  join store on (store_sales.ss_store_sk = store.s_store_sk)
  join date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
where
  --ss_date between '1999-11-01' and '1999-11-30'
  ss_sold_date_sk between 2451484 and 2451513
  and d_moy = 11
  and d_year = 1999
  and i_manager_id = 7
  and substr(ca_zip, 1, 5) <> substr(s_zip, 1, 5)
group by
  i_brand,
  i_brand_id,
  i_manufact_id,
  i_manufact
order by
  ext_price desc,
  i_brand,
  i_brand_id,
  i_manufact_id,
  i_manufact
limit 100
{code}
The dataset is TPC-DS at scale factor 1500. To reproduce the problem, you can 
simply join store_sales with customer and make sure only one mapper reads the 
customer data.
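
A minimal reproduction along those lines might look like the following (a sketch 
assuming the TPC-DS tables above are already registered; when the customer side 
is small enough to be read by a single mapper, all reduce tasks end up on that 
mapper's executor):
{code:sql}
-- Single shuffle join; the customer table is read by one mapper,
-- so reduce locality preferences collapse onto one executor.
select count(*)
from
  store_sales
  join customer on (store_sales.ss_customer_sk = customer.c_customer_sk)
{code}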
 

  was:
In some cases, when spark.shuffle.reduceLocality.enabled is enabled, we are 
scheduling all reducers to the same executor (even though the cluster has plenty 
of resources). Setting spark.shuffle.reduceLocality.enabled to false resolves 
the problem. 

Here is a little bit more information. For one of my queries, all 200 reducers 
were scheduled to the same executor and every reducer had about 800 KB of input.
 


> In some cases, all reducers are scheduled to the same executor
> --------------------------------------------------------------
>
>                 Key: SPARK-10087
>                 URL: https://issues.apache.org/jira/browse/SPARK-10087
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.5.0
>            Reporter: Yin Huai
>            Priority: Critical
>
> In some cases, when spark.shuffle.reduceLocality.enabled is enabled, we are 
> scheduling all reducers to the same executor (even though the cluster has 
> plenty of resources). Setting spark.shuffle.reduceLocality.enabled to false 
> resolves the problem. 
> Comments of https://github.com/apache/spark/pull/8280 provide more details of 
> the symptom of this issue.
> The query I was using is
> {code:sql}
> select
>   i_brand_id,
>   i_brand,
>   i_manufact_id,
>   i_manufact,
>   sum(ss_ext_sales_price) ext_price
> from
>   store_sales
>   join item on (store_sales.ss_item_sk = item.i_item_sk)
>   join customer on (store_sales.ss_customer_sk = customer.c_customer_sk)
>   join customer_address on (customer.c_current_addr_sk = 
> customer_address.ca_address_sk)
>   join store on (store_sales.ss_store_sk = store.s_store_sk)
>   join date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> where
>   --ss_date between '1999-11-01' and '1999-11-30'
>   ss_sold_date_sk between 2451484 and 2451513
>   and d_moy = 11
>   and d_year = 1999
>   and i_manager_id = 7
>   and substr(ca_zip, 1, 5) <> substr(s_zip, 1, 5)
> group by
>   i_brand,
>   i_brand_id,
>   i_manufact_id,
>   i_manufact
> order by
>   ext_price desc,
>   i_brand,
>   i_brand_id,
>   i_manufact_id,
>   i_manufact
> limit 100
> {code}
> The dataset is TPC-DS at scale factor 1500. To reproduce the problem, you can 
> simply join store_sales with customer and make sure only one mapper reads the 
> customer data.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
