Hi Michael,

Thanks for following up on this. My database currently has approximately 8M documents taking up over 460 GB of space, split across 3 forests on 3 separate 500 GB devices. I retain documents at various phases of production, processed by more than one CPF domain (input, intermediate, which takes a while to produce, and final formats). The plan is to go to a single 1.5 TB local device in a RAID 5 configuration. I was considering just copying the forests over to the new device to simplify the content migration, but my numbers don't work with your recommendations for performance. It sounds like I should only have one or two destination forests, but without re-architecting my dataflow I would far exceed the maximum number of GB per database. At what point does performance degradation become noticeable?
Thank you!

Tim Meagher

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Sunday, July 15, 2012 2:15 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Migrating content to another database

I know I am late with this reply, but I wanted to plug https://github.com/mblakele/task-rebalancer for this job. The downside is that it requires 5.0, or some patches for 4.2. But the 5.0 upgrade is very easy: if you are already planning on a move to 4.2, consider going to 5.0 instead. You can defer reindexing until you are ready.

Assuming you can use it, the task-rebalancer should be faster than XQSync, as long as you allocate enough task server threads. The project also provides an example module to 'evacuate' a forest. You could modify it to evacuate your old forests, which would populate your new one(s).

Along those lines, I wouldn't let a single forest grow without bounds, and multiple devices are good for performance. Generally speaking I try to make sure the database has:

* 1 forest per 2 CPU cores
* no more than 200 GB each
* no more than 32M documents each
* no more than 2 forests per filesystem
* no more than 1 forest per spindle

Some of these rules depend on the situation, too. With positions enabled, for example, I would try for something closer to 8M documents or 100 GB, whichever comes first. With RAID-1 or RAID-10 I would count drive-pairs as spindles. With RAID-5 or RAID-6 I would count RAID groups as spindles.

-- Mike

On 10 Jul 2012, at 19:15, Tim Meagher wrote:

> Hi Folks,
>
> I have over 150 GB of content in one database that is currently spread
> unevenly across 3 forests on 3 separate devices. I need to migrate
> this content to a new database which uses one device with more than
> enough space for all of the content. Since there is only one device,
> I'm wondering if there is any advantage or disadvantage to using
> multiple forests. I think I should be able to simply copy the content
> by creating 3 forests in the new database and copying the forests
> over, but I'd like to know if this is not an optimal solution, in which
> case I will need to be a little more resourceful about copying the
> content over. Perhaps xqsync?
>
> Tim Meagher
>
> P.S. Moving content from ML 4.1 to ML 4.2. Sorry, not yet at 5!
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
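
To make the mismatch in the numbers concrete, here is a rough Python sketch that applies the rules of thumb quoted in this thread to the sizes mentioned (about 460 GB and 8M documents on one RAID-5 device). The names `Guidelines` and `forest_bounds` are hypothetical and exist only for illustration. The sketch assumes the single RAID-5 device counts as one filesystem and one spindle, following the "RAID groups as spindles" rule, and it leaves out the CPU-core rule because no core count is given in the thread.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Guidelines:
    """Rules of thumb quoted in the thread (defaults: positions not enabled)."""
    max_gb_per_forest: float = 200.0
    max_docs_per_forest: int = 32_000_000
    max_forests_per_filesystem: int = 2
    max_forests_per_spindle: int = 1


def forest_bounds(total_gb, total_docs, filesystems, spindles, g=Guidelines()):
    """Return (forests needed by size/doc limits, forests allowed by hardware)."""
    # Forests needed so that no forest exceeds the size or document limit.
    needed = max(
        math.ceil(total_gb / g.max_gb_per_forest),
        math.ceil(total_docs / g.max_docs_per_forest),
    )
    # Forests allowed by the per-filesystem and per-spindle limits.
    allowed = min(
        filesystems * g.max_forests_per_filesystem,
        spindles * g.max_forests_per_spindle,
    )
    return needed, allowed


# Assumption: one RAID-5 group is treated as 1 filesystem and 1 spindle.
needed, allowed = forest_bounds(460, 8_000_000, filesystems=1, spindles=1)
print(f"default limits: need {needed} forests, hardware allows {allowed}")

# Same data with the tighter limits suggested when positions are enabled.
with_positions = Guidelines(max_gb_per_forest=100.0, max_docs_per_forest=8_000_000)
needed_pos, allowed_pos = forest_bounds(460, 8_000_000, 1, 1, with_positions)
print(f"positions enabled: need {needed_pos} forests, hardware allows {allowed_pos}")
```

Under these assumptions the size limit asks for at least 3 forests (5 with positions enabled), while the single RAID group allows only 1, which is the conflict described at the top of the thread. The sketch only restates the guidelines as arithmetic; it says nothing about where degradation actually becomes noticeable.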
