Hi Liquan,

Sparrow is not currently integrated into the Spark distribution, so if you'd like to use Spark with Sparrow, you need to use a forked version of Spark (https://github.com/kayousterhout/spark/tree/sparrow). The fork is based on an older release, so some work will be needed to bring it up to date with the latest version of Spark; I can help with this.
Unfortunately, there are also a few practical limitations to using Sparrow with Spark that may or may not matter for your target workload. Sparrow distributes scheduling over many Sparrow schedulers, each associated with its own Spark driver (this is where Sparrow's improvements come from -- there is no longer a single driver acting as the bottleneck for your application, and all of the schedulers/drivers share the same slots for scheduling tasks). As a result, data stored in Spark's block manager on one Spark driver (and created as part of a job scheduled by the associated Sparrow scheduler) cannot be accessed by other Spark drivers. If you're storing data in Tachyon or have a workload where different jobs have disjoint working sets, this won't be an issue. (There's a small sketch of this driver isolation below the quoted message.)

-Kay

On Fri, Jun 20, 2014 at 5:47 PM, Liquan Pei <liquan...@gmail.com> wrote:
> Hi
>
> What is the current status of Sparrow integration with Spark? I would like
> to integrate Sparrow with Spark 1.0 on a 100 node cluster. Any suggestions?
>
> Thanks a lot for your help!
> Liquan
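P.S. To make the block-manager isolation concrete, here's a minimal sketch using only the standard Spark API (the object name, app name, and data are placeholders I made up; nothing here is Sparrow-specific):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// One of many drivers, each paired with its own Sparrow scheduler.
object DriverA {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("driver-a"))

    // The cached blocks of this RDD are registered only with *this*
    // driver's block manager. A job submitted through a different
    // Sparrow scheduler runs under a different driver, which has no
    // record of these blocks, so it would recompute the data from
    // scratch rather than reuse the cache.
    val workingSet = sc.parallelize(1 to 1000000).map(_ * 2)
    workingSet.persist(StorageLevel.MEMORY_ONLY)
    println(workingSet.count())

    sc.stop()
  }
}

With a shared store like Tachyon, a second driver could read the same data by path instead of relying on the first driver's cache, which is why that setup sidesteps the problem.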