Nice, thanks!

On Thu, Feb 25, 2016 at 1:51 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
> For what it is worth, I finally wrote a blog post about this:
> http://thelastpickle.com/blog/2016/02/25/removing-a-disk-mapping-from-cassandra.html
>
> If you are not done yet, every step is detailed in there.
>
> C*heers,
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-02-19 10:04 GMT+01:00 Alain RODRIGUEZ <arodr...@gmail.com>:
>
>>> Alain, thanks for sharing! I'm confused why you do so many repetitive
>>> rsyncs. Just being cautious or is there another reason? Also, why do you
>>> have --delete-before when you're copying data to a temp (assumed empty)
>>> directory?
>>
>>> Since they are immutable I do a first sync, while everything is up and
>>> running, to the new location, which runs really long. Meanwhile new ones
>>> are created and I sync them again online, with far fewer files to copy
>>> now. After that I shut down the node and my last rsync now has to copy
>>> only a few files, which is quite fast, so the downtime for that node is
>>> within minutes.
>>
>> Jan's guess is right, except for the "immutable" part: compaction can
>> make big files go away, replaced by bigger ones you'll have to stream again.
>>
>> Here is a detailed explanation of why I did it this way.
>>
>> More precisely, let's say we have 10 files of 100 GB on the disk to
>> remove (let's call it 'old-dir').
>>
>> I run a first rsync to an empty folder indeed (let's call this
>> 'tmp-dir'), on the disk that will remain after the operation. Let's say
>> this takes about 10 hours. This can be run on all nodes in parallel though.
>>
>> So I now have 10 files of 100 GB in tmp-dir. But meanwhile one
>> compaction triggered and old-dir now has 6 files of 100 GB and 1 of 350 GB.
>>
>> At this point I disable compaction and stop running ones.
>>
>> My second rsync has to remove from tmp-dir the 4 files that were
>> compacted away; that's why I use '--delete-before'. As tmp-dir needs to
>> mirror old-dir, this is fine. This new operation takes 3.5 hours, also
>> runnable in parallel. (Keep in mind C* won't compact anything for those
>> 3.5 hours; that's why I did not stop compaction before the first rsync.
>> In my case the dataset was 2 TB.)
>>
>> At this point I have 950 GB in tmp-dir, but meanwhile clients continued
>> to write to the disk, let's say 50 GB more.
>>
>> The 3rd rsync will take 0.5 hour; no compaction ran, so I just have to
>> add the diff to tmp-dir. Still runnable in parallel.
>>
>> Then the script stops the node, so it should be run sequentially, and it
>> performs 2 more rsyncs. The first one takes the diff between the end of
>> the 3rd rsync and the moment you stop the node; that should be a few
>> seconds, minutes maybe, depending on how fast you ran the script after
>> the 3rd rsync ended. The second rsync in the script is a 'useless' one.
>> I just like to control things: I run it and expect it to say that there
>> is no diff. It is just a way to stop the script if for some reason data
>> is still being appended to old-dir.
>>
>> Then I just move all the files from tmp-dir to new-dir (the proper data
>> dir remaining after the operation). This is an instant operation: files
>> are not really moved, since they are already on that disk; a move within
>> the same filesystem is just a rename.
>>
>> I finally unmount and rm -rf old-dir.
>>
>> So the full op takes 10 h + 3.5 h + 0.5 h + (number of nodes * 0.1 h)
>> and nodes are down for about 5-10 min.
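For illustration, here is a minimal bash sketch of the parallel part of the phased approach described above (the three online rsync passes). The directory paths and the nodetool calls used to pause compaction are assumptions for the sake of the example, not taken from the actual script linked later in the thread.

#!/usr/bin/env bash
# Phased copy of the disk being removed, run while the node stays up.
# OLD_DIR and TMP_DIR are hypothetical paths.
set -euo pipefail

OLD_DIR=/var/data/cassandra/old-dir   # data dir on the disk to remove (assumed)
TMP_DIR=/var/data/cassandra/tmp-dir   # empty staging dir on the remaining disk (assumed)

# 1st rsync: node fully up and compacting, longest pass (~10 h in the example).
rsync -a "$OLD_DIR"/ "$TMP_DIR"/

# Pause compaction so SSTables stop appearing and disappearing under us.
nodetool disableautocompaction
nodetool stop -- COMPACTION

# 2nd rsync: --delete-before removes from tmp-dir the SSTables that were
# compacted away, so tmp-dir mirrors old-dir again (~3.5 h in the example).
rsync -a --delete-before "$OLD_DIR"/ "$TMP_DIR"/

# 3rd rsync: only the writes that arrived meanwhile (~0.5 h in the example).
rsync -a --delete-before "$OLD_DIR"/ "$TMP_DIR"/

All three passes can run on every node at the same time; only the stop-the-node cutover that follows has to be done one node at a time.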
>>
>> VS
>>
>> Straightforward way (stop node, move, start node): 10 h * number of
>> nodes, as this needs to be sequential. Plus each node is down for 10
>> hours, so you have to repair them, as that is longer than the hinted
>> handoff window...
>>
>> Branton, I did not go through your process, but I guess you will be able
>> to review it by yourself after reading the above (typically, repair is
>> not needed if you use the strategy I describe above, as the node is down
>> for 5-10 minutes). Also, I am not sure how "rsync -azvuiP
>> /var/data/cassandra/data2/ /var/data/cassandra/data/" will behave; my
>> guess is this is going to do a copy, so this might be very long. My
>> script performs an instant move, and as the next command is 'rm -Rf
>> /var/data/cassandra/data2' I see no reason to copy rather than move the
>> files.
>>
>> Your solution would probably work, but with big constraints from an
>> operational point of view (very long operation + repair needed).
>>
>> Hope this long email will be useful; maybe I should blog about this. Let
>> me know if the process above makes sense or if some things might be
>> improved.
>>
>> C*heers,
>> -----------------
>> Alain Rodriguez
>> France
>>
>> The Last Pickle
>> http://www.thelastpickle.com
>>
>> 2016-02-19 7:19 GMT+01:00 Branton Davis <branton.da...@spanning.com>:
>>
>>> Jan, thanks! That makes perfect sense to run a second time before
>>> stopping cassandra. I'll add that in when I do the production cluster.
>>>
>>> On Fri, Feb 19, 2016 at 12:16 AM, Jan Kesten <j.kes...@enercast.de> wrote:
>>>
>>>> Hi Branton,
>>>>
>>>> Two cents from me - I didn't look through the script, but for the
>>>> rsyncs I do pretty much the same when moving them. Since they are
>>>> immutable I do a first sync, while everything is up and running, to the
>>>> new location, which runs really long. Meanwhile new ones are created
>>>> and I sync them again online, with far fewer files to copy now. After
>>>> that I shut down the node and my last rsync now has to copy only a few
>>>> files, which is quite fast, so the downtime for that node is within
>>>> minutes.
>>>>
>>>> Jan
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 18.02.2016 at 22:12, Branton Davis <branton.da...@spanning.com> wrote:
>>>>
>>>> Alain, thanks for sharing! I'm confused why you do so many repetitive
>>>> rsyncs. Just being cautious or is there another reason? Also, why do you
>>>> have --delete-before when you're copying data to a temp (assumed empty)
>>>> directory?
>>>>
>>>> On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>>>
>>>>> I did the process a few weeks ago and ended up writing a runbook and a
>>>>> script. I have anonymised it and shared it, FWIW:
>>>>>
>>>>> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk
>>>>>
>>>>> It is basic bash. I tried to have the shortest downtime possible,
>>>>> making this a bit more complex, but it allows you to do a lot in
>>>>> parallel and only do a fast operation sequentially, reducing overall
>>>>> operation time.
>>>>>
>>>>> This worked fine for me, yet I might have made some errors while
>>>>> making it configurable through variables. Be sure to be around if you
>>>>> decide to run this. Also, I automated this further by using knife
>>>>> (Chef); I hate to repeat ops, so this is something you might want to
>>>>> consider.
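For illustration, a rough bash sketch of the short sequential cutover such a script performs per node (the final rsyncs, the instant move, and the cleanup). Directory names, the service command, and the no-diff sanity check are assumptions made for this sketch, not the contents of the actual remove_disk script.

#!/usr/bin/env bash
# Per-node cutover: run sequentially, one node at a time.
# OLD_DIR/TMP_DIR/NEW_DIR and the service command are hypothetical.
set -euo pipefail

OLD_DIR=/var/data/cassandra/old-dir
TMP_DIR=/var/data/cassandra/tmp-dir
NEW_DIR=/var/data/cassandra/new-dir

nodetool drain                 # flush memtables and stop accepting writes
sudo service cassandra stop

# Catch the few files written between the 3rd online rsync and the stop.
rsync -a --delete-before "$OLD_DIR"/ "$TMP_DIR"/

# "Useless" control rsync: it should report no pending changes. If it does,
# something is still writing to old-dir and we abort.
if [ -n "$(rsync -a --delete-before --itemize-changes --dry-run "$OLD_DIR"/ "$TMP_DIR"/)" ]; then
    echo "old-dir is still changing, aborting" >&2
    exit 1
fi

# Instant move: tmp-dir and new-dir sit on the same filesystem, so each mv
# is a rename, not a copy. Files land in the matching keyspace/table dirs.
cd "$TMP_DIR"
find . -type f | while read -r f; do
    mkdir -p "$NEW_DIR/$(dirname "$f")"
    mv "$f" "$NEW_DIR/$f"
done

sudo umount "$OLD_DIR"
rm -rf "$OLD_DIR"

sudo service cassandra start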
>>>>>
>>>>> Hope this is useful,
>>>>>
>>>>> C*heers,
>>>>> -----------------
>>>>> Alain Rodriguez
>>>>> France
>>>>>
>>>>> The Last Pickle
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal <anis...@gmail.com>:
>>>>>
>>>>>> Hey Branton,
>>>>>>
>>>>>> Please do let us know if you face any problems doing this.
>>>>>>
>>>>>> Thanks
>>>>>> anishek
>>>>>>
>>>>>> On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis <branton.da...@spanning.com> wrote:
>>>>>>
>>>>>>> We're about to do the same thing. It shouldn't be necessary to shut
>>>>>>> down the entire cluster, right?
>>>>>>>
>>>>>>> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli <rc...@eventbrite.com> wrote:
>>>>>>>
>>>>>>>> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal <anis...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> To accomplish this, can I just copy the data from disk1 to disk2
>>>>>>>>> within the relevant cassandra home location folders, change the
>>>>>>>>> cassandra.yaml configuration and restart the node? Before starting
>>>>>>>>> I will shut down the cluster.
>>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>> =Rob
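For completeness, a rough sketch of that straightforward approach on a single node (stop, copy, repoint cassandra.yaml, start). The paths reuse the ones mentioned earlier in the thread; the service command and the yaml snippet are assumptions.

#!/usr/bin/env bash
# Straightforward per-node variant confirmed above: stop the node, copy the
# data from the disk being removed into the remaining data directory, point
# cassandra.yaml at that directory only, then restart. A full-cluster
# shutdown should not be required; one node at a time is enough.
set -euo pipefail

sudo service cassandra stop

# Merge everything from the old disk into the data directory that stays.
# -a preserves ownership, permissions and timestamps.
rsync -a /var/data/cassandra/data2/ /var/data/cassandra/data/

# Then remove the old path from data_file_directories in cassandra.yaml so
# only the remaining directory is listed, e.g.:
#   data_file_directories:
#       - /var/data/cassandra/data

sudo service cassandra start

# As noted earlier in the thread, the node is down for the whole copy with
# this approach, so a repair is needed if that exceeds the hint window.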