The current broadcast algorithm in Spark approximates the one described in Section 5 of this paper <http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf>. It is expected to scale sub-linearly, i.e., as O(log N), where N is the number of machines in your cluster. We evaluated it on up to 100 machines, and it does follow O(log N) scaling.
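
To give some intuition for where O(log N) comes from, here is a rough sketch in Python. It is not Spark's implementation: it assumes an idealised model in which every machine that already holds the data uploads one full copy to one new machine per round, so the number of holders doubles each round. The function name rounds_to_broadcast is made up for this illustration.

```python
import math


def rounds_to_broadcast(num_machines: int) -> int:
    """Count rounds until all machines hold the data.

    Idealised assumption (not Spark's actual protocol): in each round,
    every current holder sends one full copy to one machine that does
    not have it yet, so the holder count doubles per round.
    """
    total = num_machines + 1  # the machines plus the driver
    holders = 1               # only the driver has the data at the start
    rounds = 0
    while holders < total:
        holders = min(total, holders * 2)
        rounds += 1
    return rounds


# Compare the simulated round count with ceil(log2(N + 1)).
for n in (1, 10, 100, 1000):
    print(n, rounds_to_broadcast(n), math.ceil(math.log2(n + 1)))
```

In this model, going from 100 to 1000 machines adds only a few rounds, which is the behaviour the O(log N) claim describes. A real BitTorrent-like scheme splits the data into blocks and lets machines exchange them concurrently, so actual timings depend on block size and network conditions.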
--
Mosharaf Chowdhury
http://www.mosharaf.com/

On Wed, Mar 11, 2015 at 3:11 PM, Tom Hubregtsen <thubregt...@gmail.com> wrote:

> Thanks Mosharaf, for the quick response! Can you maybe give me some
> pointers to an explanation of this strategy? Or elaborate a bit more on it?
> Which parts are involved in which way? Where are the time penalties and how
> scalable is this implementation?
>
> Thanks again,
>
> Tom
>
> On 11 March 2015 at 16:01, Mosharaf Chowdhury <mosharafka...@gmail.com>
> wrote:
>
>> Hi Tom,
>>
>> That's an outdated document from 4/5 years ago.
>>
>> Spark currently uses a BitTorrent like mechanism that's been tuned for
>> datacenter environments.
>>
>> Mosharaf
>> ------------------------------
>> From: Tom <thubregt...@gmail.com>
>> Sent: 3/11/2015 4:58 PM
>> To: user@spark.apache.org
>> Subject: Which strategy is used for broadcast variables?
>>
>> In "Performance and Scalability of Broadcast in Spark" by Mosharaf
>> Chowdhury I read that Spark uses HDFS for its broadcast variables. This
>> seems highly inefficient. In the same paper alternatives are proposed,
>> among which "Bittorent Broadcast (BTB)". While studying "Learning Spark,"
>> page 105, second paragraph about Broadcast Variables, I read "The value is
>> sent to each node only once, using an efficient, BitTorrent-like
>> communication mechanism."
>>
>> - Is the book talking about the proposed BTB from the paper?
>>
>> - Is this currently the default?
>>
>> - If not, what is?
>>
>> Thanks,
>>
>> Tom
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Which-strategy-is-used-for-broadcast-variables-tp22004.html