Re: Breaking the previous large-scale sort record with Spark

arthur.hk.c...@gmail.com Fri, 10 Oct 2014 08:48:03 -0700

Wonderful!!

On 11 Oct, 2014, at 12:00 am, Nan Zhu <zhunanmcg...@gmail.com> wrote:


> Great! Congratulations!
> 
> -- 
> Nan Zhu
> On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote:
> 
>> Brilliant stuff! Congrats all :-)
>> This is indeed really heartening news !
>> 
>> Regards,
>> Mridul
>> 
>> 
>> On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia <matei.zaha...@gmail.com> 
>> wrote:
>>> Hi folks,
>>> 
>>> I interrupt your regularly scheduled user / dev list to bring you some 
>>> pretty cool news for the project, which is that we've been able to use 
>>> Spark to break MapReduce's 100 TB and 1 PB sort records, sorting data 3x 
>>> faster on 10x fewer nodes. There's a detailed writeup at 
>>> http://databricks.com/blog/2014/10/10/spark-breaks-previous-large-scale-sort-record.html.
>>> Summary: while Hadoop MapReduce held last year's 100 TB world record by 
>>> sorting 100 TB in 72 minutes on 2100 nodes, we sorted it in 23 minutes on 
>>> 206 nodes; and we also scaled up to sort 1 PB in 234 minutes.
>>> 
>>> I want to thank Reynold Xin for leading this effort over the past few 
>>> weeks, along with Parviz Deyhim, Xiangrui Meng, Aaron Davidson and Ali 
>>> Ghodsi. In addition, we'd really like to thank Amazon's EC2 team for 
>>> providing the machines to make this possible. Finally, this result would, of
>>> course, not be possible without the many, many other contributions, testing
>>> and feature requests from throughout the community.
>>> 
>>> For an engine to scale from these multi-hour petabyte batch jobs down to 
>>> 100-millisecond streaming and interactive queries is quite uncommon, and 
>>> it's thanks to all of you folks that we are able to make this happen.
>>> 
>>> Matei
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>> 
>
