All else equal, RAID-5 is likely to be the biggest problem. RAID-5/6 are pretty bad for write performance, which will impact journal writes and large merges. CPF does quite a few journal writes, and merges just happen.
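For a rough sense of why parity RAID hurts writes, here is a back-of-the-envelope sketch in Python. It assumes the commonly cited write penalties for small random writes (2 back-end I/Os per write for RAID-10, 4 for RAID-5, 6 for RAID-6). The drive count and per-drive IOPS are hypothetical, not figures from this thread, and large sequential writes such as merges may fare better than this model suggests.

```python
# Commonly cited back-end I/Os per small random front-end write.
# These are rule-of-thumb values, assumed here for illustration.
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def effective_write_iops(drives, iops_per_drive, level):
    """Estimate random-write IOPS for an array of identical drives."""
    raw_iops = drives * iops_per_drive
    return raw_iops / WRITE_PENALTY[level]


if __name__ == "__main__":
    # Hypothetical array: 4 drives at 150 IOPS each.
    for level in ("raid10", "raid5", "raid6"):
        print(level, effective_write_iops(4, 150, level))
```

This is only an estimate under the stated assumptions; controller caches and full-stripe writes can change the picture, so measuring on the real device remains the better guide.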
Here's a test you might try: back up your old forests to the new device, and then restore them to a set of newly-created forests on the same device. Force a merge on all three, and see how long it takes to finish. You can compare with the old devices, which will give you a good idea of just how much your new device configuration would impact performance. All things may not be equal, so it may be an improvement despite my knee-jerk reaction.

-- Mike

On 15 Jul 2012, at 11:51 , Tim Meagher wrote:

> Hi Michael,
>
> Thanks for following up on this. My database currently has approximately 8M
> documents taking up over 460 GB of space split across 3 forests on 3 separate
> 500 GB devices. I retain documents at various phases of production
> processed by more than 1 CPF domain (input, intermediate which takes a while
> to produce, and final formats). The plan is to go to a single 1.5 TB local
> device in a RAID 5 configuration. I was considering just copying the
> forests over to the new device to simplify the content migration, but my
> numbers don't work with your recommendations for performance. Sounds like I
> should only have one or two destination forests, but without re-architecting
> my dataflow I way exceed the max number of GBs per database. At what point
> does performance degradation become noticeable?
>
> Thank you!
>
> Tim Meagher
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael
> Blakeley
> Sent: Sunday, July 15, 2012 2:15 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Migrating content to another database
>
> I know I am late with this reply, but I wanted to plug
> https://github.com/mblakele/task-rebalancer for this job. The downside is
> that it requires 5.0, or some patches for 4.2. But the 5.0 upgrade is very
> easy: if you are already planning on a move to 4.2, consider going to 5.0
> instead. You can defer reindexing until you are ready.
>
> Assuming you can use it, the task-rebalancer should be faster than XQSync -
> as long as you allocate enough task server threads. The project also
> provides an example module to 'evacuate' a forest. You could modify it to
> evacuate your old forests, which would populate your new one(s).
>
> Along those lines, I wouldn't let a single forest grow without bounds and
> multiple devices are good for performance. Generally speaking I try to make
> sure the database has:
>
> * 1 forest per 2 CPU cores
> * no more than 200-GB each
> * no more than 32M documents each
> * no more than 2 forests per filesystem
> * no more than 1 forest per spindle
>
> Some of these rules depend on the situation, too. With positions enabled,
> for example, I would try for something closer to 8M documents or 100-GB,
> whichever comes first. With RAID-1 or RAID-10 I would count drive-pairs as
> spindles. With RAID-5 or RAID-6 I would count RAID groups as spindles.
>
> -- Mike
>
> On 10 Jul 2012, at 19:15 , Tim Meagher wrote:
>
>> Hi Folks,
>>
>> I have over 150 GB of content in one database that is currently spread
>> unevenly across 3 forests on 3 separate devices. I need to migrate
>> this content to a new database which uses one device with more than
>> enough space for all of the content. Since there is only one device,
>> I'm wondering if there is any advantage or disadvantage to using
>> multiple forests. I think I should be able to simply copy the content
>> by creating 3 forests in the new database and copying the forests
>> over, but I'd like to know if this is not an optimal solution in which
>> case I will need to be a little more resourceful about copying the content
>> over. Perhaps xqsync?
>>
>> Tim Meagher
>>
>> P.S. Moving content from ML 4.1 to ML 4.2. Sorry, not yet at 5!
>>
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
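The forest sizing rules of thumb quoted in this thread can be turned into a quick sanity check. The Python sketch below is one possible reading, not an official MarkLogic formula: it treats the size and document caps as a lower bound on the forest count, and the CPU, filesystem, and spindle rules as an upper bound. The function name and the 8-core figure are hypothetical; the 460 GB, 8M document, and device counts come from the messages above.

```python
import math


def forest_bounds(total_gb, total_docs, cpu_cores, filesystems, spindles,
                  max_gb_per_forest=200, max_docs_per_forest=32_000_000):
    """Return (min_forests, max_forests) implied by the rules of thumb.

    Assumption: "1 forest per 2 CPU cores" is read as an upper bound.
    """
    # Size and document-count caps force at least this many forests.
    min_forests = max(1,
                      math.ceil(total_gb / max_gb_per_forest),
                      math.ceil(total_docs / max_docs_per_forest))
    # Hardware rules allow at most this many forests.
    max_forests = min(cpu_cores // 2, 2 * filesystems, spindles)
    return min_forests, max_forests


if __name__ == "__main__":
    # Old layout: 3 devices, each assumed to be its own filesystem and spindle.
    print(forest_bounds(460, 8_000_000, cpu_cores=8, filesystems=3, spindles=3))
    # New layout: one RAID 5 group, counted as a single spindle.
    print(forest_bounds(460, 8_000_000, cpu_cores=8, filesystems=1, spindles=1))
    # Same new layout with the tighter caps suggested when positions are enabled.
    print(forest_bounds(460, 8_000_000, 8, 1, 1,
                        max_gb_per_forest=100, max_docs_per_forest=8_000_000))
```

When the lower bound exceeds the upper bound, as it does for the single RAID 5 device, the rules cannot all be satisfied at once, which is the conflict Tim describes.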
