Re: sync master with slaves with bittorrent?

Mosharaf Chowdhury Mon, 19 May 2014 11:27:17 -0700

Good catch. In that case, using BitTornado/murder would be better.


--
Mosharaf Chowdhury
http://www.mosharaf.com/


On Mon, May 19, 2014 at 11:17 AM, Aaron Davidson <ilike...@gmail.com> wrote:

> On the ec2 machines, you can update the slaves from the master using
> something like "~/spark-ec2/copy-dir ~/spark".
>
> Spark's TorrentBroadcast relies on the Block Manager to distribute blocks,
> making it relatively hard to extract.
>
>
> On Mon, May 19, 2014 at 12:36 AM, Daniel Mahler <dmah...@gmail.com> wrote:
>
>> btw is there a command or script to update the slaves from the master?
>>
>> thanks
>> Daniel
>>
>>
>> On Mon, May 19, 2014 at 1:48 AM, Andrew Ash <and...@andrewash.com> wrote:
>>
>>> If the codebase for Spark's broadcast is pretty self-contained, you
>>> could consider creating a small bootstrap sent out via the doubling rsync
>>> strategy that Mosharaf outlined above (called "Tree D=2" in the paper) that
>>> then pulled the larger
>>>
>>> Mosharaf, do you have a sense of whether the gains from using Cornet vs
>>> Tree D=2 with rsync outweighs the overhead of using a 2-phase broadcast
>>> mechanism?
>>>
>>> Andrew
>>>
>>>
>>> On Sun, May 18, 2014 at 11:32 PM, Aaron Davidson <ilike...@gmail.com>wrote:
>>>
>>>> One issue with using Spark itself is that this rsync is required to get
>>>> Spark to work...
>>>>
>>>> Also note that a similar strategy is used for *updating* the spark
>>>> cluster on ec2, where the "diff" aspect is much more important, as you
>>>> might only make a small change on the driver node (recompile or
>>>> reconfigure) and can get a fast sync.
>>>>
>>>>
>>>> On Sun, May 18, 2014 at 11:22 PM, Mosharaf Chowdhury <
>>>> mosharafka...@gmail.com> wrote:
>>>>
>>>>> What twitter calls murder, unless it has changed since then, is just a
>>>>> BitTornado wrapper. In 2011, We did some comparison on the performance of
>>>>> murder and the TorrentBroadcast we have right now for Spark's own 
>>>>> broadcast
>>>>> (Section 7.1 in
>>>>> http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf).
>>>>> Spark's implementation was 4.5X faster than murder.
>>>>>
>>>>> The only issue with using TorrentBroadcast to deploy code/VM is
>>>>> writing a wrapper around it to read from disk, but it shouldn't be too
>>>>> complicated. If someone picks it up, I can give some pointers on how to
>>>>> proceed (I've thought about doing it myself forever, but never ended up
>>>>> actually taking the time; right now I don't have enough free cycles 
>>>>> either)
>>>>>
>>>>> Otherwise, murder/BitTornado would be better than the current strategy
>>>>> we have.
>>>>>
>>>>> A third option would be to use rsync; but instead of rsync-ing to
>>>>> every slave from the master, one can simply rsync from the master first to
>>>>> one slave; then use the two sources (master and the first slave) to rsync
>>>>> to two more; then four and so on. Might be a simpler solution without many
>>>>> changes.
>>>>>
>>>>> --
>>>>> Mosharaf Chowdhury
>>>>> http://www.mosharaf.com/
>>>>>
>>>>>
>>>>> On Sun, May 18, 2014 at 11:07 PM, Andrew Ash <and...@andrewash.com>wrote:
>>>>>
>>>>>> My first thought would be to use libtorrent for this setup, and it
>>>>>> turns out that both Twitter and Facebook do code deploys with a 
>>>>>> bittorrent
>>>>>> setup.  Twitter even released their code as open source:
>>>>>>
>>>>>>
>>>>>> https://blog.twitter.com/2010/murder-fast-datacenter-code-deploys-using-bittorrent
>>>>>>
>>>>>>
>>>>>> http://arstechnica.com/business/2012/04/exclusive-a-behind-the-scenes-look-at-facebook-release-engineering/
>>>>>>
>>>>>>
>>>>>> On Sun, May 18, 2014 at 10:44 PM, Daniel Mahler <dmah...@gmail.com>wrote:
>>>>>>
>>>>>>> I am not an expert in this space either. I thought the initial rsync
>>>>>>> during launch is really just a straight copy that did not need the tree
>>>>>>> diff. So it seemed like having the slaves do the copying among it each
>>>>>>> other would be better than having the master copy to everyone directly.
>>>>>>> That made me think of bittorrent, though there may well be other systems
>>>>>>> that do this.
>>>>>>> From the launches I did today it seems that it is taking around 1
>>>>>>> minute per slave to launch a cluster, which can be a problem for 
>>>>>>> clusters
>>>>>>> with 10s or 100s of slaves, particularly since on ec2  that time has to 
>>>>>>> be
>>>>>>> paid for.
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 18, 2014 at 11:54 PM, Aaron Davidson <ilike...@gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Out of curiosity, do you have a library in mind that would make it
>>>>>>>> easy to setup a bit torrent network and distribute files in an rsync 
>>>>>>>> (i.e.,
>>>>>>>> apply a diff to a tree, ideally) fashion? I'm not familiar with this 
>>>>>>>> space,
>>>>>>>> but we do want to minimize the complexity of our standard ec2 launch
>>>>>>>> scripts to reduce the chance of something breaking.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, May 18, 2014 at 9:22 PM, Daniel Mahler 
>>>>>>>> <dmah...@gmail.com>wrote:
>>>>>>>>
>>>>>>>>> I am launching a rather large cluster on ec2.
>>>>>>>>> It seems like the launch is taking forever on
>>>>>>>>> ....
>>>>>>>>> Setting up spark
>>>>>>>>> RSYNC'ing /root/spark to slaves...
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> It seems that bittorrent might be a faster way to replicate
>>>>>>>>> the sizeable spark directory to the slaves
>>>>>>>>> particularly if there is a lot of not very powerful slaves.
>>>>>>>>>
>>>>>>>>> Just a thought ...
>>>>>>>>>
>>>>>>>>> cheers
>>>>>>>>> Daniel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: sync master with slaves with bittorrent?

Reply via email to