This question is conflating a few different concepts. I think the main
question is whether Spark will have a shuffle implementation that
streams data rather than persisting it to disk/cache as a buffer.
Spark currently decouples the shuffle write from the read using
disk/OS cache as a buffer. The two benefits of this approach are
that it allows intra-query fault tolerance and it makes it easier to
elastically scale and reschedule work within a job. We consider these
to be design requirements (think about jobs that run for several hours
on hundreds of machines). Impala, and similar systems like Dremel and
F1, do not offer fault tolerance within a query at present. They also
require gang scheduling the entire set of resources that will exist
for the duration of a query.
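
To make the "decoupled" point concrete, here is a toy sketch (all names
invented; this is not Spark's actual shuffle code) of a pull-based shuffle
where map output is materialized into a buffer standing in for the disk/OS
cache. Because the buffer persists between stages, a failed reduce task can
be rescheduled and simply re-read the buffered map output, without
re-running the map stage:

```python
from collections import defaultdict

# Simulated disk/OS-cache buffer sitting between the map and reduce stages,
# keyed by (map_id, reduce_partition).
shuffle_files = {}

def run_map_task(map_id, records, num_reducers):
    """Partition this map task's output and persist it to the buffer."""
    by_partition = defaultdict(list)
    for key, value in records:
        by_partition[hash(key) % num_reducers].append((key, value))
    for partition, recs in by_partition.items():
        shuffle_files[(map_id, partition)] = recs

def run_reduce_task(partition, num_maps):
    """Pull this partition's records from every map output and aggregate."""
    totals = defaultdict(int)
    for map_id in range(num_maps):
        for key, value in shuffle_files.get((map_id, partition), []):
            totals[key] += value
    return dict(totals)

# The write side of the shuffle runs first and persists its output...
run_map_task(0, [("a", 1), ("b", 2)], num_reducers=2)
run_map_task(1, [("a", 3), ("c", 4)], num_reducers=2)
# ...then reduce tasks pull independently; any one of them can fail and
# retry against the same buffered data, with no map work redone.
result = {}
for partition in range(2):
    result.update(run_reduce_task(partition, num_maps=2))
print(dict(sorted(result.items())))  # {'a': 4, 'b': 2, 'c': 4}
```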

A secondary question is whether our shuffle should have a barrier or
not. Spark's shuffle currently has a hard barrier between map and
reduce stages. We haven't seen really strong evidence that removing
the barrier is a net win. It can help the performance of a single job
(modestly), but in a multi-tenant workload, it leads to poor
utilization since you have a lot of reduce tasks that are taking up
slots waiting for mappers to finish. Many large scale users of
Map/Reduce disable this feature in production clusters for that
reason. Thus, we haven't seen compelling evidence for removing the
barrier at this point, given the complexity of doing so.
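
A back-of-the-envelope illustration of the utilization argument (the
numbers here are invented, and this deliberately ignores the modest
single-job win from overlapping fetch with map work): if reduce tasks
launch early, as with Hadoop MapReduce's "slowstart", they hold cluster
slots while idly waiting for map output that other tenants could be using.

```python
# Invented workload parameters, in arbitrary time units.
map_finish_time = 10      # all map tasks complete at t=10
reduce_work = 5           # actual reduce computation per task
num_reducers = 4
slowstart_launch = 2      # without a barrier, reducers grab slots at t=2

# With a hard barrier: each reduce slot is occupied only while working.
occupied_with_barrier = num_reducers * reduce_work

# Without a barrier: each slot is held from launch until the work,
# which cannot finish before the maps do, completes.
occupied_without_barrier = num_reducers * (
    (map_finish_time - slowstart_launch) + reduce_work)

idle_slot_time = occupied_without_barrier - occupied_with_barrier
print(idle_slot_time)  # 32 slot-time-units spent waiting on mappers
```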

It is possible that future versions of Spark will support push-based
shuffles, potentially in a mode that removes some of Spark's fault
tolerance properties. But there are many other things we can still
optimize about the shuffle that would likely come before this.

- Patrick

On Wed, Jan 7, 2015 at 6:01 PM, 曹雪林 <xuelincao2...@gmail.com> wrote:
> Hi,
>
>       I've heard a lot of complaints about Spark's "pull" style shuffle. Is
> there any plan to support "push" style shuffle in the near future?
>
>       Currently, the shuffle phase must be completed before the next stage
> starts. Meanwhile, it is said that in Impala, the shuffled data is
> "streamed" to the next stage's handler, which greatly saves time. Will
> Spark support this mechanism one day?
>
> Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
