Hi Michael,

Thanks for following up on this.  My database currently has approximately 8M
documents taking up over 460 GB of space split across 3 forests on 3 separate
500 GB devices.  I retain documents at various phases of production,
processed by more than one CPF domain (input; intermediate, which takes a
while to produce; and final formats).  The plan is to go to a single 1.5 TB
local device in a RAID 5 configuration.  I was considering just copying the
forests over to the new device to simplify the content migration, but my
numbers don't work with your recommendations for performance.  It sounds like
I should have only one or two destination forests, but without re-architecting
my dataflow I would far exceed the recommended maximum size per forest.  At
what point does performance degradation become noticeable?

Thank you!

Tim Meagher

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Michael
Blakeley
Sent: Sunday, July 15, 2012 2:15 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Migrating content to another database

I know I am late with this reply, but I wanted to plug
https://github.com/mblakele/task-rebalancer for this job. The downside is
that it requires 5.0, or some patches for 4.2. But the 5.0 upgrade is very
easy: if you are already planning on a move to 4.2, consider going to 5.0
instead. You can defer reindexing until you are ready.

Assuming you can use it, the task-rebalancer should be faster than XQSync -
as long as you allocate enough task server threads. The project also
provides an example module to 'evacuate' a forest. You could modify it to
evacuate your old forests, which would populate your new one(s).

Along those lines, I wouldn't let a single forest grow without bounds, and
multiple devices are good for performance. Generally speaking, I try to make
sure the database has:

* 1 forest per 2 CPU cores
* no more than 200 GB each
* no more than 32M documents each
* no more than 2 forests per filesystem
* no more than 1 forest per spindle

Some of these rules depend on the situation, too. With positions enabled,
for example, I would try for something closer to 8M documents or 100 GB,
whichever comes first. With RAID-1 or RAID-10 I would count drive-pairs as
spindles. With RAID-5 or RAID-6 I would count RAID groups as spindles.
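
As a rough illustration of how these rules of thumb interact, here is a
minimal Python sketch (not part of MarkLogic or the task-rebalancer). The
function names are hypothetical, the "1 forest per 2 CPU cores" rule is
treated as an upper bound, which is an assumption, and the 8-core host in
the example is made up. The 460 GB / 8M document figures come from Tim's
message above.

```python
import math

def min_forests_for_content(total_gb, total_docs,
                            max_gb_per_forest=200,
                            max_docs_per_forest=32_000_000):
    """Fewest forests that keep each forest under the size and doc limits."""
    by_size = math.ceil(total_gb / max_gb_per_forest)
    by_docs = math.ceil(total_docs / max_docs_per_forest)
    return max(by_size, by_docs, 1)

def max_forests_for_hardware(cpu_cores, filesystems, spindles):
    """Most forests the hardware rules allow.

    Assumption: 1 forest per 2 cores is read as a ceiling. For RAID-5 or
    RAID-6, pass the number of RAID groups as `spindles`.
    """
    return min(cpu_cores // 2, 2 * filesystems, spindles)

# Tim's content: about 460 GB and 8M documents.
needed = min_forests_for_content(460, 8_000_000)
# Stricter limits suggested when positions are enabled (100 GB, 8M docs).
needed_with_positions = min_forests_for_content(
    460, 8_000_000, max_gb_per_forest=100, max_docs_per_forest=8_000_000)
# One filesystem on one RAID-5 group; the 8 cores are a made-up figure.
allowed = max_forests_for_hardware(cpu_cores=8, filesystems=1, spindles=1)

print(needed, needed_with_positions, allowed)
```

Under these assumptions the content calls for at least 3 forests (5 with
positions enabled), while a single RAID-5 group counts as one spindle and
so allows only 1. That gap is the trade-off to weigh when choosing the
new layout.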

-- Mike

On 10 Jul 2012, at 19:15 , Tim Meagher wrote:

> Hi Folks,
> 
> I have over 150 GB of content in one database that is currently spread 
> unevenly across 3 forests on 3 separate devices.  I need to migrate 
> this content to a new database which uses one device with more than 
> enough space for all of the content.  Since there is only one device, 
> I'm wondering if there is any advantage or disadvantage to using 
> multiple forests.  I think I should be able to simply copy the content 
> by creating 3 forests in the new database and copying the forests 
> over, but I'd like to know if this is not an optimal solution in which 
> case I will need to be a little more resourceful about copying the
> content over.  Perhaps xqsync?
> 
> Tim Meagher
> 
> P.S. Moving content from ML 4.1 to ML 4.2.  Sorry, not yet at 5!
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
> 
