Thanks for the feedback. For 1, there is an open patch: 
https://github.com/apache/spark/pull/2659. For 2, broadcast blocks actually use 
MEMORY_AND_DISK storage, so they will spill to disk if you have low memory, but 
they're faster to access otherwise.
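
To illustrate the chunking idea behind point 1, here is a minimal Python sketch. It is not taken from the patch linked above, and the helper names `split_into_blocks` and `reassemble` are hypothetical. The 4 MiB block size is an assumption based on the `ensureFreeSpace(4194304)` line in the quoted logs. On the JVM the point of splitting is that no single array approaches the VM's array size limit; Python has no such limit, so this only shows the splitting logic.

```python
from typing import Iterator, List

# Assumed block size: 4 MiB, matching ensureFreeSpace(4194304) in the quoted logs.
BLOCK_SIZE = 4 * 1024 * 1024


def split_into_blocks(payload: bytes, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of payload, each at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    view = memoryview(payload)  # slice without copying the whole payload
    for start in range(0, len(view), block_size):
        yield bytes(view[start:start + block_size])


def reassemble(blocks: List[bytes]) -> bytes:
    """Concatenate blocks in order to recover the original payload."""
    return b"".join(blocks)


if __name__ == "__main__":
    data = bytes(10 * 1024 * 1024)  # 10 MiB of zeros as a stand-in payload
    pieces = list(split_into_blocks(data))
    print(len(pieces), [len(p) for p in pieces])
    assert reassemble(pieces) == data
```

Each piece can be stored and fetched on its own, so a receiver never has to hold more than one block-sized buffer per transfer.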

Matei

On Oct 9, 2014, at 12:11 PM, Guillaume Pitel <guillaume.pi...@exensa.com> wrote:

> Hi,
> 
> Thanks to your answer, we've found the problem. It was reverse IP 
> resolution on the drivers we used (the local bind9 was misconfigured). 
> Apparently, not being able to reverse-resolve the IP addresses of the nodes 
> caused the 10s delay.
> 
> We've hit two other, secondary problems with TorrentBroadcast though, in case 
> you're interested:
> 
> 1 - Broadcasting a variable of about 2GB (1.8GB exactly) triggers a 
> "java.lang.OutOfMemoryError: Requested array size exceeds VM limit", which is 
> not the case with HttpBroadcast (I guess HttpBroadcast splits the serialized 
> variable into small chunks).
> 2 - Memory use of Torrent seems to be higher than that of Http (i.e. switching 
> from Http to Torrent triggers several OOMs).
> 
> Additionally, a question: while HttpBroadcast stores the broadcast pieces on 
> disk (in spark.local.dir/spark-... ), TorrentBroadcast seems not to use disk 
> as backing storage. Does this mean that HttpBroadcast can handle broadcasts 
> that do not fit in memory? If so, it's too bad that this design choice wasn't 
> used for Torrent.
> 
> That being said, hats off to the people in charge of the broadcast unloading 
> wrt the lineage, this stuff works great!
> 
> Guillaume
> 
> 
>> Maybe there is a firewall issue that makes it slow for your nodes to connect 
>> through the IP addresses they're configured with. I see there's this 10 
>> second pause between "Updated info of block broadcast_84_piece1" and 
>> "ensureFreeSpace(4194304) called" (where it actually receives the block). 
>> HTTP broadcast used only HTTP fetches from the executors to the driver, but 
>> TorrentBroadcast has connections between the executors themselves and 
>> between executors and the driver over a different port. Where are you 
>> running your driver app and nodes?
>> 
>> Matei
>> 
>> On Oct 7, 2014, at 11:42 AM, Davies Liu <dav...@databricks.com> wrote:
>> 
>>> Could you create a JIRA for it? Maybe it's a regression after
>>> https://issues.apache.org/jira/browse/SPARK-3119.
>>> 
>>> We would appreciate it if you could tell us how to reproduce it.
>>> 
>>> On Mon, Oct 6, 2014 at 1:27 AM, Guillaume Pitel
>>> <guillaume.pi...@exensa.com> wrote:
>>>> Hi,
>>>> 
>>>> I've had no answer to this on u...@spark.apache.org, so I'm posting it on dev
>>>> before filing a JIRA (in case the problem or solution is already 
>>>> identified).
>>>> 
>>>> We've had some performance issues since switching to 1.1.0, and we finally
>>>> found the origin: TorrentBroadcast seems to be very slow in our setting
>>>> (and it became the default with 1.1.0).
>>>> 
>>>> The logs of a 4MB variable with TorrentBroadcast (15s):
>>>> 
>>>> 14/10/01 15:47:13 INFO storage.MemoryStore: Block broadcast_84_piece1 stored as bytes in memory (estimated size 171.6 KB, free 7.2 GB)
>>>> 14/10/01 15:47:13 INFO storage.BlockManagerMaster: Updated info of block broadcast_84_piece1
>>>> 14/10/01 15:47:23 INFO storage.MemoryStore: ensureFreeSpace(4194304) called with curMem=1401611984, maxMem=9168696115
>>>> 14/10/01 15:47:23 INFO storage.MemoryStore: Block broadcast_84_piece0 stored as bytes in memory (estimated size 4.0 MB, free 7.2 GB)
>>>> 14/10/01 15:47:23 INFO storage.BlockManagerMaster: Updated info of block broadcast_84_piece0
>>>> 14/10/01 15:47:23 INFO broadcast.TorrentBroadcast: Reading broadcast variable 84 took 15.202260006 s
>>>> 14/10/01 15:47:23 INFO storage.MemoryStore: ensureFreeSpace(4371392) called with curMem=1405806288, maxMem=9168696115
>>>> 14/10/01 15:47:23 INFO storage.MemoryStore: Block broadcast_84 stored as values in memory (estimated size 4.2 MB, free 7.2 GB)
>>>> 
>>>> (Notice that a 10s lag happens after the "Updated info of block
>>>> broadcast_..." line and before the next MemoryStore log line.)
>>>> 
>>>> And with HttpBroadcast (0.3s):
>>>> 
>>>> 14/10/01 16:05:58 INFO broadcast.HttpBroadcast: Started reading broadcast variable 147
>>>> 14/10/01 16:05:58 INFO storage.MemoryStore: ensureFreeSpace(4369376) called with curMem=1373493232, maxMem=9168696115
>>>> 14/10/01 16:05:58 INFO storage.MemoryStore: Block broadcast_147 stored as values in memory (estimated size 4.2 MB, free 7.3 GB)
>>>> 14/10/01 16:05:58 INFO broadcast.HttpBroadcast: Reading broadcast variable 147 took 0.320907112 s
>>>> 14/10/01 16:05:58 INFO storage.BlockManager: Found block broadcast_147 locally
>>>> 
>>>> Since Torrent is supposed to perform much better than Http, we suspect a
>>>> configuration error on our side, but are unable to pin it down. Does
>>>> anyone have an idea of what causes the problem?
>>>> 
>>>> For now we're sticking with the HttpBroadcast workaround.
>>>> 
>>>> Guillaume
>>>> --
>>>> Guillaume PITEL, Président
>>>> +33(0)626 222 431
>>>> 
>>>> eXenSa S.A.S.
>>>> 41, rue Périer - 92120 Montrouge - FRANCE
>>>> Tel +33(0)184 163 677 / Fax +33(0)972 283 705
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>> 
