I'm not really at liberty to discuss the details of the job. It involved some expensive aggregated statistics and took 10 hours to complete (mostly bottlenecked by network and I/O).

On Thu, Mar 20, 2014 at 11:12 AM, Surendranauth Hiraman <suren.hira...@velos.io> wrote:

> Reynold,
>
> How complex was that job (I guess in terms of number of transforms and
> actions) and how long did that take to process?
>
> -Suren
>
> On Thu, Mar 20, 2014 at 2:08 PM, Reynold Xin <r...@databricks.com> wrote:
>
> > Actually we just ran a job with 70TB+ compressed data on 28 worker nodes -
> > I didn't count the size of the uncompressed data, but I am guessing it is
> > somewhere between 200TB to 700TB.
> >
> > On Thu, Mar 20, 2014 at 12:23 AM, Usman Ghani <us...@platfora.com> wrote:
> >
> > > All,
> > > What is the largest input data set y'all have come across that has been
> > > successfully processed in production using spark. Ball park?
>
> --
> SUREN HIRAMAN, VP TECHNOLOGY
> Velos
> Accelerating Machine Learning
>
> 440 NINTH AVENUE, 11TH FLOOR
> NEW YORK, NY 10001
> O: (917) 525-2466 ext. 105
> F: 646.349.4063
> E: suren.hiraman@v <suren.hira...@sociocast.com>elos.io
> W: www.velos.io