Hi Mike,

When you say that CPF does quite a few journal writes, does that take place in the associated triggers database, i.e., should the triggers database be carefully architected as well?

Tim

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Sunday, July 15, 2012 9:38 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Migrating content to another database

All else equal, RAID-5 is likely to be the biggest problem. RAID-5/6 are pretty bad for write performance, which will impact journal writes and large merges. CPF does quite a few journal writes, and merges just happen.

Here's a test you might try: back up your old forests to the new device, and then restore them to a set of newly-created forests on the same device. Force a merge on all three, and see how long it takes to finish. You can compare with the old devices, which will give you a good idea of just how much your new device configuration would impact performance.

All things may not be equal, so it may be an improvement despite my knee-jerk reaction.

-- Mike

On 15 Jul 2012, at 11:51 , Tim Meagher wrote:

> Hi Michael,
>
> Thanks for following up on this. My database currently has approximately 8M documents taking up over 460 GB of space split across 3 forests on 3 separate 500 GB devices. I retain documents at various phases of production processed by more than 1 CPF domain (input, intermediate which takes a while to produce, and final formats). The plan is to go to a single 1.5T local device in a RAID 5 configuration. I was considering just copying the forests over to the new device to simplify the content migration, but my numbers don't work with your recommendations for performance. Sounds like I should only have one or two destination forests, but without re-architecting my dataflow I way exceed the max number of GBs per database. At what point does performance degradation become noticeable?
>
> Thank you!
>
> Tim Meagher
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Sunday, July 15, 2012 2:15 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Migrating content to another database
>
> I know I am late with this reply, but I wanted to plug https://github.com/mblakele/task-rebalancer for this job. The downside is that it requires 5.0, or some patches for 4.2. But the 5.0 upgrade is very easy: if you are already planning on a move to 4.2, consider going to 5.0 instead. You can defer reindexing until you are ready.
>
> Assuming you can use it, the task-rebalancer should be faster than XQSync - as long as you allocate enough task server threads. The project also provides an example module to 'evacuate' a forest. You could modify it to evacuate your old forests, which would populate your new one(s).
>
> Along those lines, I wouldn't let a single forest grow without bounds and multiple devices are good for performance. Generally speaking I try to make sure the database has:
>
> * 1 forest per 2 CPU cores
> * no more than 200-GB each
> * no more than 32M documents each
> * no more than 2 forests per filesystem
> * no more than 1 forest per spindle
>
> Some of these rules depend on the situation, too. With positions enabled, for example, I would try for something closer to 8M documents or 100-GB, whichever comes first. With RAID-1 or RAID-10 I would count drive-pairs as spindles. With RAID-5 or RAID-6 I would count RAID groups as spindles.
>
> -- Mike
>
> On 10 Jul 2012, at 19:15 , Tim Meagher wrote:
>
>> Hi Folks,
>>
>> I have over 150 GB of content in one database that is currently spread unevenly across 3 forests on 3 separate devices. I need to migrate this content to a new database which uses one device with more than enough space for all of the content.
>> Since there is only one device, I'm wondering if there is any advantage or disadvantage to using multiple forests. I think I should be able to simply copy the content by creating 3 forests in the new database and copying the forests over, but I'd like to know if this is not an optimal solution, in which case I will need to be a little more resourceful about copying the content over. Perhaps xqsync?
>>
>> Tim Meagher
>>
>> P.S. Moving content from ML 4.1 to ML 4.2. Sorry, not yet at 5!
>>
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
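
For anyone reading this thread later, here is a rough sketch of how the rules of thumb from Mike's 2:15 PM message could be checked against Tim's numbers. It is only an illustration in Python, not anything from MarkLogic or from the task-rebalancer project. The function name forest_bounds is made up, the 8 CPU cores are a hypothetical value (the thread never states a core count), and treating "1 forest per 2 CPU cores" as an upper bound is an assumption. The single RAID-5 array is counted as one filesystem and one spindle, following Mike's note about RAID groups.

```python
import math

# Rules of thumb quoted from the thread; guidance, not hard limits.
MAX_GB_PER_FOREST = 200
MAX_DOCS_PER_FOREST = 32_000_000
MAX_FORESTS_PER_FILESYSTEM = 2
MAX_FORESTS_PER_SPINDLE = 1
CORES_PER_FOREST = 2  # assumption: read as "at most 1 forest per 2 cores"


def forest_bounds(total_gb, total_docs, cpu_cores, filesystems, spindles):
    """Return (forests needed by content size, forests the hardware supports)."""
    # Enough forests that none exceeds the size or document-count guideline.
    needed = max(
        math.ceil(total_gb / MAX_GB_PER_FOREST),
        math.ceil(total_docs / MAX_DOCS_PER_FOREST),
        1,
    )
    # The tightest of the CPU, filesystem, and spindle guidelines.
    supported = min(
        cpu_cores // CORES_PER_FOREST,
        filesystems * MAX_FORESTS_PER_FILESYSTEM,
        spindles * MAX_FORESTS_PER_SPINDLE,
    )
    return needed, supported


# Tim's figures: 460 GB, 8M documents, one RAID-5 device.
# cpu_cores=8 is a hypothetical value, not taken from the thread.
needed, supported = forest_bounds(460, 8_000_000, 8, 1, 1)
print(needed, supported)  # prints "3 1"
```

With these inputs the content calls for at least 3 forests while the single RAID group supports only 1, which is the mismatch Tim describes when he says his numbers don't work with the recommendations.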
