I'm not really at liberty to discuss details of the job. It involves some
expensive aggregate statistics, and took 10 hours to complete (mostly
bottlenecked by network & I/O).
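
For context, a shuffle-heavy aggregation job of the sort described above
might look roughly like the Scala sketch below. This is only an
illustration: the paths, field layout, and names are made up, not details
of the actual job.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch of an aggregate-statistics job; all paths and field
    // positions here are hypothetical.
    object AggregateStats {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("aggregate-stats"))

        // Compressed text input. Gzip is not splittable, so the scan runs
        // one task per file and tends to be I/O-bound.
        val records = sc.textFile("hdfs:///data/events/*.gz")

        // Per-key (sum, count) -> mean. reduceByKey combines map-side
        // before the shuffle, so network volume scales with the number of
        // distinct keys rather than the number of input rows.
        val means = records
          .map(_.split('\t'))
          .collect { case f if f.length >= 2 => (f(0), (f(1).toDouble, 1L)) }
          .reduceByKey { case ((s1, n1), (s2, n2)) => (s1 + s2, n1 + n2) }
          .mapValues { case (sum, n) => sum / n }

        means.saveAsTextFile("hdfs:///out/event-means")
        sc.stop()
      }
    }

Even with map-side combining, the shuffle and the final write are network
and disk bound, which is consistent with the bottleneck mentioned above.
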
On Thu, Mar 20, 2014 at 11:12 AM, Surendranauth Hiraman <suren.hira...@velos.io> wrote:

> Reynold,
>
> How complex was that job (I guess in terms of number of transforms and
> actions) and how long did that take to process?
>
> -Suren
>
>
>
> On Thu, Mar 20, 2014 at 2:08 PM, Reynold Xin <r...@databricks.com> wrote:
>
> > Actually we just ran a job with 70TB+ compressed data on 28 worker nodes -
> > I didn't count the size of the uncompressed data, but I am guessing it is
> > somewhere between 200TB and 700TB.
> >
> >
> >
> > On Thu, Mar 20, 2014 at 12:23 AM, Usman Ghani <us...@platfora.com> wrote:
> >
> > > All,
> > > What is the largest input data set y'all have come across that has been
> > > successfully processed in production using Spark? Ballpark?
> > >
> >
>
>
>
> --
>
> SUREN HIRAMAN, VP TECHNOLOGY
> Velos
> Accelerating Machine Learning
>
> 440 NINTH AVENUE, 11TH FLOOR
> NEW YORK, NY 10001
> O: (917) 525-2466 ext. 105
> F: 646.349.4063
> E: suren.hiraman@velos.io
> W: www.velos.io
>
