Hi Mike,

When you say that CPF does quite a few journal writes, do those writes take
place in the associated triggers database? That is, should the triggers
database be carefully architected as well?

Tim

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Sunday, July 15, 2012 9:38 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Migrating content to another database

All else equal, RAID-5 is likely to be the biggest problem. RAID-5/6 are
pretty bad for write performance, which will impact journal writes and large
merges. CPF does quite a few journal writes, and merges just happen.
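To put a rough number on "pretty bad", here is a back-of-the-envelope sketch using the commonly cited penalties for small random writes: about 2 physical I/Os per logical write on RAID-1/10, 4 on RAID-5 and 6 on RAID-6. The drive count and per-drive IOPS are hypothetical. The model ignores controller write cache and full-stripe sequential writes, so real journal and merge behavior should be measured rather than taken from this estimate.

```python
# Commonly cited I/O penalties for small random writes (a rule of thumb,
# not a measurement): each logical write costs this many physical I/Os.
WRITE_PENALTY = {
    "RAID-0": 1,
    "RAID-1": 2,
    "RAID-10": 2,
    "RAID-5": 4,  # read data, read parity, write data, write parity
    "RAID-6": 6,  # as RAID-5, plus a second parity read and write
}


def effective_write_iops(level: str, disks: int, iops_per_disk: float) -> float:
    """Rough random-write IOPS for an array, ignoring controller cache."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


if __name__ == "__main__":
    # Hypothetical array: 4 drives at ~150 random IOPS each.
    for level in ("RAID-10", "RAID-5", "RAID-6"):
        print(f"{level:8s} {effective_write_iops(level, 4, 150):6.0f} write IOPS")
```

With four hypothetical 150-IOPS drives, this estimates 300 write IOPS for RAID-10, 150 for RAID-5 and 100 for RAID-6.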

Here's a test you might try: back up your old forests to the new device, and
then restore them to a set of newly-created forests on the same device.
Force a merge on all three and time how long it takes to finish. Comparing
that with the same merge on the old devices will give you a good idea of how
much your new device configuration would affect performance. All things may
not be equal, so it may turn out to be an improvement despite my knee-jerk
reaction.

-- Mike

On 15 Jul 2012, at 11:51 , Tim Meagher wrote:

> Hi Michael,
> 
> Thanks for following up on this.  My database currently has
> approximately 8M documents taking up over 460 GB of space, split across
> 3 forests on 3 separate 500 GB devices.  I retain documents at various
> phases of production, processed by more than one CPF domain (input,
> intermediate formats that take a while to produce, and final formats).
> The plan is to go to a single 1.5 TB local device in a RAID-5
> configuration.  I was considering just copying the forests over to the
> new device to simplify the content migration, but my numbers don't fit
> your recommendations for performance.  It sounds like I should have only
> one or two destination forests, but without re-architecting my dataflow
> I would far exceed the recommended maximum number of GB per forest.  At
> what point does performance degradation become noticeable?
> 
> Thank you!
> 
> Tim Meagher
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Sunday, July 15, 2012 2:15 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Migrating content to another database
> 
> I know I am late with this reply, but I wanted to plug
> https://github.com/mblakele/task-rebalancer for this job. The downside
> is that it requires 5.0, or some patches for 4.2. But the 5.0 upgrade
> is very easy: if you are already planning on a move to 4.2, consider
> going to 5.0 instead. You can defer reindexing until you are ready.
> 
> Assuming you can use it, the task-rebalancer should be faster than
> XQSync, as long as you allocate enough task server threads. The
> project also provides an example module to 'evacuate' a forest. You
> could modify it to evacuate your old forests, which would populate
> your new one(s).
> 
> Along those lines, I wouldn't let a single forest grow without bound,
> and multiple devices are good for performance. Generally speaking, I
> try to make sure the database has:
> 
> * 1 forest per 2 CPU cores
> * no more than 200 GB each
> * no more than 32M documents each
> * no more than 2 forests per filesystem
> * no more than 1 forest per spindle
> 
> Some of these rules depend on the situation, too. With positions
> enabled, for example, I would try for something closer to 8M documents
> or 100 GB, whichever comes first. With RAID-1 or RAID-10 I would count
> drive pairs as spindles. With RAID-5 or RAID-6 I would count RAID groups
> as spindles.
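The guidelines above amount to simple arithmetic, so here is a minimal Python sketch of them. The class and function names are hypothetical, the thresholds are the rules of thumb from this message rather than hard MarkLogic limits, and the 8-core CPU count is made up. With Tim's figures (about 460 GB and 8M documents on a single RAID-5 device), the size rule calls for at least three forests (five with positions enabled), while one filesystem and one RAID group counted as one spindle allow only one.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Per-forest rules of thumb from the list above; not product limits."""
    max_gb: float = 200
    max_docs: int = 32_000_000
    forests_per_filesystem: int = 2
    forests_per_spindle: int = 1  # count drive pairs or RAID groups as spindles


DEFAULT = Limits()
# Tighter guidance when positions are enabled (whichever limit is hit first).
POSITIONS_ENABLED = Limits(max_gb=100, max_docs=8_000_000)


def forest_plan(total_gb, total_docs, cores, filesystems, spindles, limits=DEFAULT):
    """Return (forests needed by size/docs, forests the layout allows, core target)."""
    needed = max(
        math.ceil(total_gb / limits.max_gb),
        math.ceil(total_docs / limits.max_docs),
    )
    allowed = min(
        filesystems * limits.forests_per_filesystem,
        spindles * limits.forests_per_spindle,
    )
    core_target = max(1, cores // 2)  # about 1 forest per 2 CPU cores
    return needed, allowed, core_target


if __name__ == "__main__":
    # ~460 GB and ~8M documents on one RAID-5 device: one filesystem and one
    # RAID group counted as one spindle. The 8-core figure is hypothetical.
    for name, limits in (("default", DEFAULT), ("positions", POSITIONS_ENABLED)):
        needed, allowed, target = forest_plan(460, 8_000_000, 8, 1, 1, limits)
        print(f"{name}: need >= {needed}, layout allows <= {allowed}, cores suggest ~{target}")
        if needed > allowed:
            print("  size/document guidelines conflict with the device layout")
```

Running the old three-device layout through the same check (filesystems=3, spindles=3) allows three forests, which is why the current setup fits the default guidelines and the single-device plan does not.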
> 
> -- Mike
> 
> On 10 Jul 2012, at 19:15 , Tim Meagher wrote:
> 
>> Hi Folks,
>> 
>> I have over 150 GB of content in one database that is currently 
>> spread unevenly across 3 forests on 3 separate devices.  I need to 
>> migrate this content to a new database which uses one device with 
>> more than enough space for all of the content.  Since there is only 
>> one device, I'm wondering if there is any advantage or disadvantage 
>> to using multiple forests.  I think I should be able to simply copy
>> the content by creating 3 forests in the new database and copying the
>> forests over, but I'd like to know if this is not an optimal solution,
>> in which case I will need to be a little more resourceful about
>> copying the content over.  Perhaps XQSync?
>> 
>> Tim Meagher
>> 
>> P.S. Moving content from ML 4.1 to ML 4.2.  Sorry, not yet at 5!
>> 
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
>> 