
RE: Which strategy is used for broadcast variables?

2015-03-11 Thread Mosharaf Chowdhury

Hi Tom, That's an outdated document from four or five years ago. Spark currently uses a BitTorrent-like mechanism that has been tuned for datacenter environments. Mosharaf -----Original Message----- From: Tom thubregt...@gmail.com Sent: 3/11/2015 4:58 PM To: user@spark.apache.org
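For readers unfamiliar with the idea, the following is a minimal sketch of what "BitTorrent-like" generally means: the payload is cut into blocks, and a receiver rebuilds it by pulling each block from any peer that already holds it. This is an illustration only, not Spark's implementation; the function names, the block size, and the two in-memory "peers" are all hypothetical.

```python
import random
from typing import Dict, List


def split_into_blocks(payload: bytes, block_size: int) -> List[bytes]:
    """Cut the payload into fixed-size blocks (the last one may be shorter)."""
    return [payload[i:i + block_size] for i in range(0, len(payload), block_size)]


def fetch_all_blocks(num_blocks: int, peers: List[Dict[int, bytes]], seed: int = 0) -> bytes:
    """Rebuild the payload by pulling each block from any peer that holds it.

    Each peer is modelled as a dict mapping block index -> block bytes
    (a hypothetical stand-in for a remote machine).
    """
    rng = random.Random(seed)
    order = list(range(num_blocks))
    rng.shuffle(order)  # request blocks in a random order rather than sequentially
    received: Dict[int, bytes] = {}
    for index in order:
        holders = [peer for peer in peers if index in peer]
        if not holders:
            raise LookupError(f"no peer holds block {index}")
        received[index] = rng.choice(holders)[index]
    # Reassemble in index order, regardless of the order blocks arrived in.
    return b"".join(received[i] for i in range(num_blocks))


payload = bytes(range(256)) * 4
blocks = split_into_blocks(payload, block_size=100)

# Two hypothetical peers, each holding only part of the data.
peer_a = {i: b for i, b in enumerate(blocks) if i % 2 == 0}
peer_b = {i: b for i, b in enumerate(blocks) if i % 2 == 1}

rebuilt = fetch_all_blocks(len(blocks), [peer_a, peer_b])
```

Because no single peer has to serve the whole payload, the load is spread across the machines that already hold blocks instead of concentrating on the original sender.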

Re: Which strategy is used for broadcast variables?

2015-03-11 Thread Mosharaf Chowdhury

The current broadcast algorithm in Spark approximates the one described in Section 5 of this paper: http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf. It is expected to scale sub-linearly, i.e., O(log N), where N is the number of machines in your cluster. We evaluated up to 100
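To see where an O(log N) bound can come from, here is a rough sketch under a strongly simplified assumption: in every round, each machine that already holds the data forwards a full copy to one machine that does not. This idealized doubling model is not the algorithm from the paper or Spark's code, and the function name is hypothetical.

```python
def broadcast_rounds(num_workers: int) -> int:
    """Count rounds until every worker holds the data in an idealized doubling model.

    Assumption: one machine (the driver) starts with the only copy, and in
    each round every holder sends a copy to one machine that lacks it.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    total = num_workers + 1  # workers plus the driver
    holders = 1
    rounds = 0
    while holders < total:
        holders = min(total, holders * 2)  # the set of holders at most doubles
        rounds += 1
    return rounds


for n in (1, 10, 100, 1000):
    print(n, broadcast_rounds(n))
```

Under this model, 100 workers are covered in 7 rounds and 1000 workers in 10, whereas a single sender serving each worker in turn would need one transfer per worker.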

Re: Which strategy is used for broadcast variables?

2015-03-11 Thread Tom Hubregtsen

Those results look very good for the larger workloads (100MB and 1GB). Were you also able to run experiments for smaller amounts of data? For instance, broadcasting a single variable to the entire cluster? In the paper you state that HDFS-based mechanisms performed well only for small amounts of

Re: Which strategy is used for broadcast variables?

2015-03-11 Thread Tom Hubregtsen

Thanks, Mosharaf, for the quick response! Could you give me some pointers to an explanation of this strategy, or elaborate a bit more on it? Which parts are involved, and in which way? Where are the time penalties, and how scalable is this implementation? Thanks again, Tom On 11 March 2015 at
