Maybe a firewall issue is making it slow for your nodes to connect through the
IP addresses they're configured with. I see a 10-second pause between "Updated
info of block broadcast_84_piece1" and "ensureFreeSpace(4194304) called" (where
the executor actually receives the block). HTTP broadcast used only HTTP fetches
from the executors to the driver, but TorrentBroadcast has connections between
the executors themselves, and between executors and the driver, over a
different port. Where are you running your driver app and nodes?
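
One quick way to test the firewall theory is to time a plain TCP connection
from one node to another. Below is a minimal Python sketch; connect_time is a
hypothetical helper (not part of Spark), and the host and port you pass are
assumptions to be replaced with whatever address and port your own executor
logs report. A connection that takes seconds, or times out, would suggest the
network rather than TorrentBroadcast itself.

```python
import socket
import time


def connect_time(host, port, timeout=15.0):
    """Return the seconds needed to open a TCP connection, or None on failure.

    host/port are placeholders: use the address and port that the remote
    executor or driver advertises in your own logs.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None
```

Run it from an executor machine against another executor and against the
driver, for example connect_time("other-node", 12345) with your real host and
port, and compare the timings with the 10-second gap in the logs.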

Matei

On Oct 7, 2014, at 11:42 AM, Davies Liu <dav...@databricks.com> wrote:

> Could you create a JIRA for it? Maybe it's a regression after
> https://issues.apache.org/jira/browse/SPARK-3119.
> 
> We would appreciate it if you could tell us how to reproduce it.
> 
> On Mon, Oct 6, 2014 at 1:27 AM, Guillaume Pitel
> <guillaume.pi...@exensa.com> wrote:
>> Hi,
>> 
>> I've had no answer to this on u...@spark.apache.org, so I'm posting it on dev
>> before filing a JIRA (in case the problem or solution is already identified).
>> 
>> We've had some performance issues since switching to 1.1.0, and we finally
>> found the cause: TorrentBroadcast seems to be very slow in our setting
>> (and it became the default with 1.1.0).
>> 
>> The logs for a 4 MB variable with TorrentBroadcast (15 s):
>> 
>> 14/10/01 15:47:13 INFO storage.MemoryStore: Block broadcast_84_piece1 stored
>> as bytes in memory (estimated size 171.6 KB, free 7.2 GB)
>> 14/10/01 15:47:13 INFO storage.BlockManagerMaster: Updated info of block
>> broadcast_84_piece1
>> 14/10/01 15:47:23 INFO storage.MemoryStore: ensureFreeSpace(4194304) called
>> with curMem=1401611984, maxMem=9168696115
>> 14/10/01 15:47:23 INFO storage.MemoryStore: Block broadcast_84_piece0 stored
>> as bytes in memory (estimated size 4.0 MB, free 7.2 GB)
>> 14/10/01 15:47:23 INFO storage.BlockManagerMaster: Updated info of block
>> broadcast_84_piece0
>> 14/10/01 15:47:23 INFO broadcast.TorrentBroadcast: Reading broadcast
>> variable 84 took 15.202260006 s
>> 14/10/01 15:47:23 INFO storage.MemoryStore: ensureFreeSpace(4371392) called
>> with curMem=1405806288, maxMem=9168696115
>> 14/10/01 15:47:23 INFO storage.MemoryStore: Block broadcast_84 stored as
>> values in memory (estimated size 4.2 MB, free 7.2 GB)
>> 
>> (Notice that a 10 s lag occurs after the "Updated info of block
>> broadcast_..." line and before the MemoryStore log.)
>> 
>> And with HttpBroadcast (0.3s):
>> 
>> 14/10/01 16:05:58 INFO broadcast.HttpBroadcast: Started reading broadcast
>> variable 147
>> 14/10/01 16:05:58 INFO storage.MemoryStore: ensureFreeSpace(4369376) called
>> with curMem=1373493232, maxMem=9168696115
>> 14/10/01 16:05:58 INFO storage.MemoryStore: Block broadcast_147 stored as
>> values in memory (estimated size 4.2 MB, free 7.3 GB)
>> 14/10/01 16:05:58 INFO broadcast.HttpBroadcast: Reading broadcast variable
>> 147 took 0.320907112 s
>> 14/10/01 16:05:58 INFO storage.BlockManager: Found block broadcast_147
>> locally
>> 
>> Since Torrent is supposed to perform much better than Http, we suspect a
>> configuration error on our side, but are unable to pin it down. Does
>> anyone have any idea what is causing the problem?
>> 
>> For now we're sticking with the HttpBroadcast workaround.
>> 
>> Guillaume
>> --
>> Guillaume PITEL, Président
>> +33(0)626 222 431
>> 
>> eXenSa S.A.S.
>> 41, rue Périer - 92120 Montrouge - FRANCE
>> Tel +33(0)184 163 677 / Fax +33(0)972 283 705
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
> 

