@spark.apache.org
Subject: Which strategy is used for broadcast variables?
In "Performance and Scalability of Broadcast in Spark" by Mosharaf Chowdhury,
I read that Spark uses HDFS for its broadcast variables. This seems highly
inefficient. In the same paper, alternatives are proposed, among which
BitTorrent
machines, and it does follow O(log N) scaling.
--
Mosharaf Chowdhury
http://www.mosharaf.com/
On Wed, Mar 11, 2015 at 3:11 PM, Tom Hubregtsen thubregt...@gmail.com
wrote:
Thanks Mosharaf, for the quick response! Can you maybe give me some
pointers to an explanation of this strategy
Hi Guillermo,
The current broadcast algorithm in Spark approximates the one described in
Section 5 of this paper
http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf.
It is expected to scale sub-linearly; i.e., O(log N), where N is the number
of machines in your cluster.
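
As a rough illustration of where the O(log N) figure comes from (this is a
toy model, not Spark's actual implementation), assume an idealized
BitTorrent-like scheme in which every machine that already holds the data
sends a full copy to one machine that lacks it in each round. The function
name broadcast_rounds and the round-based model are assumptions made only
for this sketch; real torrent-style broadcast splits the data into blocks
and overlaps transfers.

```python
import math


def broadcast_rounds(num_machines):
    """Count rounds until all machines hold the data in the toy model.

    Assumption: in each round, every machine that has the data sends it
    to exactly one machine that does not, so the holder count doubles.
    """
    have = 1    # only the source holds the data at the start
    rounds = 0
    while have < num_machines:
        have = min(num_machines, have * 2)  # holders double each round
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Compare the simulated round count with ceil(log2(N)).
    for n in (2, 8, 64, 1024):
        print(n, broadcast_rounds(n), math.ceil(math.log2(n)))
```

In this model the number of rounds grows with log2(N), whereas a source
that sends to each machine in turn would need N - 1 transfers.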
We
,
Mosharaf
--
Mosharaf Chowdhury
http://www.mosharaf.com/
On Thu, Jul 3, 2014 at 7:48 AM, jackxucs jackx...@gmail.com wrote:
Hello,
I am running the BroadcastTest example in a standalone cluster using
spark-submit. I have 8 host machines and made Host1 the master. Host2 to
Host8 act as 7
Good catch. In that case, using BitTornado/murder would be better.
--
Mosharaf Chowdhury
http://www.mosharaf.com/
On Mon, May 19, 2014 at 11:17 AM, Aaron Davidson ilike...@gmail.com wrote:
On the ec2 machines, you can update the slaves from the master using
something like ~/spark-ec2/copy