Re: Largest input data set observed for Spark.

Reynold Xin Thu, 20 Mar 2014 11:29:28 -0700

I'm not really at liberty to discuss details of the job. It involves some
expensive aggregated statistics, and took 10 hours to complete (mostly
bottlenecked by network and I/O).

On Thu, Mar 20, 2014 at 11:12 AM, Surendranauth Hiraman <[email protected]> wrote:

> Reynold,
>
> How complex was that job (I guess in terms of number of transforms and
> actions) and how long did that take to process?
>
> -Suren
>
>
>
> On Thu, Mar 20, 2014 at 2:08 PM, Reynold Xin <[email protected]> wrote:
>
> > Actually we just ran a job with 70TB+ compressed data on 28 worker nodes -
> > I didn't count the size of the uncompressed data, but I am guessing it is
> > somewhere between 200TB and 700TB.
> >
> >
> >
> > On Thu, Mar 20, 2014 at 12:23 AM, Usman Ghani <[email protected]> wrote:
> >
> > > All,
> > > What is the largest input data set y'all have come across that has been
> > > successfully processed in production using Spark? Ballpark?
> > >
> >
>
>
>
> --
>
> SUREN HIRAMAN, VP TECHNOLOGY
> Velos
> Accelerating Machine Learning
>
> 440 NINTH AVENUE, 11TH FLOOR
> NEW YORK, NY 10001
> O: (917) 525-2466 ext. 105
> F: 646.349.4063
> E: suren.hiraman@v <[email protected]>elos.io
> W: www.velos.io
>
