RE: Which strategy is used for broadcast variables?

2015-03-11 Thread Mosharaf Chowdhury
In "Performance and Scalability of Broadcast in Spark" by Mosharaf Chowdhury, I read that Spark uses HDFS for its broadcast variables. This seems highly inefficient. In the same paper, alternatives are proposed, among which BitTorrent

Re: Which strategy is used for broadcast variables?

2015-03-11 Thread Mosharaf Chowdhury
machines, and it does follow O(log N) scaling. -- Mosharaf Chowdhury http://www.mosharaf.com/ On Wed, Mar 11, 2015 at 3:11 PM, Tom Hubregtsen thubregt...@gmail.com wrote: Thanks Mosharaf, for the quick response! Can you maybe give me some pointers to an explanation of this strategy

Re: How Broadcast variable scale?

2015-02-23 Thread Mosharaf Chowdhury
Hi Guillermo, The current broadcast algorithm in Spark approximates the one described in Section 5 of this paper: http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf. It is expected to scale sub-linearly, i.e., O(log N), where N is the number of machines in your cluster. We
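
To give a rough feel for where O(log N) scaling can come from, here is a minimal sketch of an idealized model. It is not Spark's implementation or the algorithm from the paper. It assumes that in each round every machine already holding the data sends a full copy to exactly one machine that lacks it, so the number of holders doubles per round. The function name broadcast_rounds and the round-based model are assumptions made for illustration only.

```python
import math


def broadcast_rounds(num_machines: int) -> int:
    """Count rounds until all machines hold the data in an idealized doubling model.

    Assumption (not from the thread): one source starts with the data, and in
    each round every current holder sends a full copy to one non-holder.
    """
    if num_machines < 0:
        raise ValueError("num_machines must be non-negative")
    total = num_machines + 1  # the receiving machines plus the single source
    holders = 1               # only the source holds the data at the start
    rounds = 0
    while holders < total:
        # Each holder serves one new machine, so holders at most double.
        holders = min(total, holders * 2)
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (1, 7, 100, 1000):
        # The simulated count matches ceil(log2(N + 1)) for these values.
        print(n, broadcast_rounds(n), math.ceil(math.log2(n + 1)))
```

In this toy model, going from 100 to 1000 machines adds only three rounds (7 to 10), which is the kind of sub-linear growth the message describes. Real behavior also depends on block sizes, network bandwidth, and peer selection, none of which are modeled here.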

Re: Running the BroadcastTest.scala with TorrentBroadcastFactory in a standalone cluster

2014-07-03 Thread Mosharaf Chowdhury
, Mosharaf -- Mosharaf Chowdhury http://www.mosharaf.com/ On Thu, Jul 3, 2014 at 7:48 AM, jackxucs jackx...@gmail.com wrote: Hello, I am running the BroadcastTest example in a standalone cluster using spark-submit. I have 8 host machines and made Host1 the master. Host2 to Host8 act as 7

Re: sync master with slaves with bittorrent?

2014-05-19 Thread Mosharaf Chowdhury
Good catch. In that case, using BitTornado/murder would be better. -- Mosharaf Chowdhury http://www.mosharaf.com/ On Mon, May 19, 2014 at 11:17 AM, Aaron Davidson ilike...@gmail.com wrote: On the ec2 machines, you can update the slaves from the master using something like ~/spark-ec2/copy