Actually, we just ran a job with 70TB+ of compressed data on 28 worker nodes. I didn't count the size of the uncompressed data, but I'd guess it is somewhere between 200TB and 700TB.
On Thu, Mar 20, 2014 at 12:23 AM, Usman Ghani <us...@platfora.com> wrote:
> All,
> What is the largest input data set y'all have come across that has been
> successfully processed in production using spark. Ball park?