Good catch. In that case, using BitTornado/murder would be better.
--
Mosharaf Chowdhury
http://www.mosharaf.com/


On Mon, May 19, 2014 at 11:17 AM, Aaron Davidson <ilike...@gmail.com> wrote:

> On the ec2 machines, you can update the slaves from the master using
> something like "~/spark-ec2/copy-dir ~/spark".
>
> Spark's TorrentBroadcast relies on the Block Manager to distribute blocks,
> making it relatively hard to extract.
>
>
> On Mon, May 19, 2014 at 12:36 AM, Daniel Mahler <dmah...@gmail.com> wrote:
>
>> btw is there a command or script to update the slaves from the master?
>>
>> thanks
>> Daniel
>>
>>
>> On Mon, May 19, 2014 at 1:48 AM, Andrew Ash <and...@andrewash.com> wrote:
>>
>>> If the codebase for Spark's broadcast is pretty self-contained, you
>>> could consider creating a small bootstrap sent out via the doubling rsync
>>> strategy that Mosharaf outlined above (called "Tree D=2" in the paper) that
>>> then pulled the larger
>>>
>>> Mosharaf, do you have a sense of whether the gains from using Cornet vs
>>> Tree D=2 with rsync outweigh the overhead of using a 2-phase broadcast
>>> mechanism?
>>>
>>> Andrew
>>>
>>>
>>> On Sun, May 18, 2014 at 11:32 PM, Aaron Davidson <ilike...@gmail.com> wrote:
>>>
>>>> One issue with using Spark itself is that this rsync is required to get
>>>> Spark to work...
>>>>
>>>> Also note that a similar strategy is used for *updating* the Spark
>>>> cluster on ec2, where the "diff" aspect is much more important, as you
>>>> might only make a small change on the driver node (recompile or
>>>> reconfigure) and can get a fast sync.
>>>>
>>>>
>>>> On Sun, May 18, 2014 at 11:22 PM, Mosharaf Chowdhury <
>>>> mosharafka...@gmail.com> wrote:
>>>>
>>>>> What Twitter calls murder, unless it has changed since then, is just a
>>>>> BitTornado wrapper. In 2011, we did some comparisons of the performance of
>>>>> murder and the TorrentBroadcast we have right now for Spark's own
>>>>> broadcast (Section 7.1 in
>>>>> http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf).
>>>>> Spark's implementation was 4.5X faster than murder.
>>>>>
>>>>> The only issue with using TorrentBroadcast to deploy code/VM is
>>>>> writing a wrapper around it to read from disk, but it shouldn't be too
>>>>> complicated. If someone picks it up, I can give some pointers on how to
>>>>> proceed (I've thought about doing it myself forever, but never ended up
>>>>> actually taking the time; right now I don't have enough free cycles
>>>>> either).
>>>>>
>>>>> Otherwise, murder/BitTornado would be better than the current strategy
>>>>> we have.
>>>>>
>>>>> A third option would be to use rsync, but instead of rsync-ing to
>>>>> every slave from the master, one can simply rsync from the master first to
>>>>> one slave, then use the two sources (master and the first slave) to rsync
>>>>> to two more, then four, and so on. Might be a simpler solution without many
>>>>> changes.
>>>>>
>>>>> --
>>>>> Mosharaf Chowdhury
>>>>> http://www.mosharaf.com/
>>>>>
>>>>>
>>>>> On Sun, May 18, 2014 at 11:07 PM, Andrew Ash <and...@andrewash.com> wrote:
>>>>>
>>>>>> My first thought would be to use libtorrent for this setup, and it
>>>>>> turns out that both Twitter and Facebook do code deploys with a
>>>>>> BitTorrent setup. Twitter even released their code as open source:
>>>>>>
>>>>>>
>>>>>> https://blog.twitter.com/2010/murder-fast-datacenter-code-deploys-using-bittorrent
>>>>>>
>>>>>>
>>>>>> http://arstechnica.com/business/2012/04/exclusive-a-behind-the-scenes-look-at-facebook-release-engineering/
>>>>>>
>>>>>>
>>>>>> On Sun, May 18, 2014 at 10:44 PM, Daniel Mahler <dmah...@gmail.com> wrote:
>>>>>>
>>>>>>> I am not an expert in this space either. I thought the initial rsync
>>>>>>> during launch is really just a straight copy that did not need the tree
>>>>>>> diff. So it seemed like having the slaves do the copying among each
>>>>>>> other would be better than having the master copy to everyone directly.
>>>>>>> That made me think of BitTorrent, though there may well be other systems
>>>>>>> that do this.
>>>>>>> From the launches I did today, it seems that it is taking around 1
>>>>>>> minute per slave to launch a cluster, which can be a problem for
>>>>>>> clusters with 10s or 100s of slaves, particularly since on ec2 that
>>>>>>> time has to be paid for.
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 18, 2014 at 11:54 PM, Aaron Davidson <ilike...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Out of curiosity, do you have a library in mind that would make it
>>>>>>>> easy to set up a BitTorrent network and distribute files in an rsync
>>>>>>>> (i.e., apply a diff to a tree, ideally) fashion? I'm not familiar with
>>>>>>>> this space, but we do want to minimize the complexity of our standard
>>>>>>>> ec2 launch scripts to reduce the chance of something breaking.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, May 18, 2014 at 9:22 PM, Daniel Mahler <dmah...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I am launching a rather large cluster on ec2.
>>>>>>>>> It seems like the launch is taking forever on
>>>>>>>>> ....
>>>>>>>>> Setting up spark
>>>>>>>>> RSYNC'ing /root/spark to slaves...
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> It seems that BitTorrent might be a faster way to replicate
>>>>>>>>> the sizeable Spark directory to the slaves,
>>>>>>>>> particularly if there are a lot of not very powerful slaves.
>>>>>>>>>
>>>>>>>>> Just a thought ...
>>>>>>>>>
>>>>>>>>> cheers
>>>>>>>>> Daniel
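Mosharaf's third option in the thread (rsync from the master to one slave, then from both copies to two more, then four, and so on; the "Tree D=2" scheme Andrew mentions) can be sketched as a copy schedule. This is a minimal Python sketch, not code from spark-ec2: the function names `doubling_rsync_schedule`, `rounds_needed` and `estimated_minutes` are hypothetical, and the sketch only plans which host copies to which. The actual transfer (for example running `rsync -a` on each source host) is left out. The time estimate assumes, loosely following Daniel's rough figure of about 1 minute per slave, that one copy takes about one minute regardless of which host is the source.

```python
from collections import deque


def doubling_rsync_schedule(master, slaves):
    """Plan a doubling ("Tree D=2") copy as a list of rounds.

    Each round is a list of (source, destination) host pairs. Every host
    that already holds a full copy pushes to one new slave per round, so
    the number of sources doubles after each round.
    """
    have = [master]          # hosts that hold a full copy
    pending = deque(slaves)  # slaves still waiting for a copy
    rounds = []
    while pending:
        batch = []
        for src in have:
            if not pending:
                break
            batch.append((src, pending.popleft()))
        # A destination only becomes a source once its round has finished.
        have.extend(dst for _, dst in batch)
        rounds.append(batch)
    return rounds


def rounds_needed(num_slaves):
    # After r rounds, 2**r hosts (master included) hold a copy, so we need
    # the smallest r with 2**r - 1 >= num_slaves, i.e. num_slaves.bit_length().
    return num_slaves.bit_length()


def estimated_minutes(num_slaves, minutes_per_copy=1.0):
    # Assumption: one copy takes about minutes_per_copy regardless of source,
    # and the baseline is the master copying to one slave at a time.
    sequential = num_slaves * minutes_per_copy
    doubling = rounds_needed(num_slaves) * minutes_per_copy
    return sequential, doubling


if __name__ == "__main__":
    slaves = [f"slave{i}" for i in range(1, 8)]
    for i, batch in enumerate(doubling_rsync_schedule("master", slaves), start=1):
        print(f"round {i}: {batch}")
    print(estimated_minutes(100))  # (sequential, doubling) in minutes
```

Under these assumptions, 100 slaves need 7 doubling rounds instead of 100 one-at-a-time copies from the master, and the master only ever serves one slave per round. Whether that holds in practice depends on each source's bandwidth, which the sketch does not model.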