Not normally, no.

The CPF configuration is stored in the Triggers database, and describes how CPF 
should run. By design, this changes rarely. The CPF state of each document is 
stored in document properties, which are stored in the same database as the 
documents themselves. These change several times after each update, as the 
document follows the CPF pipeline.

-- Mike

On 15 Jul 2012, at 20:02 , Tim Meagher wrote:

> Hi Mike,
> 
> When you say that CPF does quite a few journal writes, does that take place
> in the associated triggers database, i.e., should the triggers database be
> carefully architected as well?  
> 
> Tim
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael
> Blakeley
> Sent: Sunday, July 15, 2012 9:38 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Migrating content to another database
> 
> All else equal, RAID-5 is likely to be the biggest problem. RAID-5/6 are
> pretty bad for write performance, which will impact journal writes and large
> merges. CPF does quite a few journal writes, and merges just happen.
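To put a rough number on the RAID-5/6 write cost mentioned above, here is a minimal Python sketch of the textbook small-random-write model, in which each logical write costs a fixed number of back-end I/Os. The function name, the disk count, and the per-disk IOPS figure are all hypothetical. Large sequential (full-stripe) writes can avoid much of this penalty, so treat the result as a pessimistic estimate rather than a measurement.

```python
# Textbook back-end I/Os per small random write, by RAID level.
# RAID-5 reads old data and parity, then writes both (4 I/Os);
# RAID-6 does the same with two parity blocks (6 I/Os).
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_write_iops(level, disks, iops_per_disk):
    """Rough random-write IOPS for an array: raw IOPS divided by the penalty."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


# Hypothetical array: 4 disks at 150 IOPS each (assumed, not from the thread).
for level in ("raid10", "raid5", "raid6"):
    print(level, effective_write_iops(level, disks=4, iops_per_disk=150))
```

Under these assumed numbers RAID-5 delivers half the random-write rate of RAID-10 on the same disks, which is the kind of gap the merge test described next would expose.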
> 
> Here's a test you might try: back up your old forests to the new device, and
> then restore them to a set of newly-created forests on the same device.
> Force a merge on all three, and see how long it takes to finish. You can
> compare with the old devices, which will give you a good idea of just how
> much your new device configuration would impact performance. All things may
> not be equal, so it may be an improvement despite my knee-jerk reaction.
> 
> -- Mike
> 
> On 15 Jul 2012, at 11:51 , Tim Meagher wrote:
> 
>> Hi Michael,
>> 
>> Thanks for following up on this.  My database currently has 
>> approximately 8M documents taking up over 460 GB of space split across 
>> 3 forests on 3 separate 500 GB devices.  I retain documents at various 
>> phases of production processed by more than one CPF domain (input, 
>> intermediate, which takes a while to produce, and final formats).  The 
>> plan is to go to a single 1.5 TB local device in a RAID 5 configuration.  
>> I was considering just copying the forests over to the new device to 
>> simplify the content migration, but my numbers don't work with your 
>> recommendations for performance.  Sounds like I should only have one or 
>> two destination forests, but without re-architecting my dataflow I way 
>> exceed the max number of GBs per database.  At what point does 
>> performance degradation become noticeable?
>> 
>> Thank you!
>> 
>> Tim Meagher
>> 
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Michael 
>> Blakeley
>> Sent: Sunday, July 15, 2012 2:15 PM
>> To: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] Migrating content to another 
>> database
>> 
>> I know I am late with this reply, but I wanted to plug 
>> https://github.com/mblakele/task-rebalancer for this job. The downside 
>> is that it requires 5.0, or some patches for 4.2. But the 5.0 upgrade 
>> is very easy: if you are already planning on a move to 4.2, consider 
>> going to 5.0 instead. You can defer reindexing until you are ready.
>> 
>> Assuming you can use it, the task-rebalancer should be faster than 
>> XQSync - as long as you allocate enough task server threads. The 
>> project also provides an example module to 'evacuate' a forest. You 
>> could modify it to evacuate your old forests, which would populate your
>> new one(s).
>> 
>> Along those lines, I wouldn't let a single forest grow without bounds, 
>> and multiple devices are good for performance. Generally speaking I 
>> try to make sure the database has:
>> 
>> * 1 forest per 2 CPU cores
>> * no more than 200 GB each
>> * no more than 32M documents each
>> * no more than 2 forests per filesystem
>> * no more than 1 forest per spindle
>> 
>> Some of these rules depend on the situation, too. With positions 
>> enabled, for example, I would try for something closer to 8M documents 
>> or 100 GB, whichever comes first. With RAID-1 or RAID-10 I would count 
>> drive-pairs as spindles. With RAID-5 or RAID-6 I would count RAID groups
>> as spindles.
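A rough sketch of how the rules of thumb above could be turned into a quick check. This is Python rather than XQuery, and the function names are made up for illustration. Treating "1 forest per 2 CPU cores" as a ceiling is an assumption, not something the message states. The 8-core figure in the example is hypothetical; the 460 GB and 8M document figures come from elsewhere in this thread.

```python
import math


def min_forests(total_gb, total_docs, positions_enabled=False):
    """Fewest forests that keep each one under the per-forest ceilings."""
    # Ceilings from the message: 200 GB / 32M documents per forest,
    # or roughly 100 GB / 8M documents when positions are enabled.
    if positions_enabled:
        max_gb, max_docs = 100, 8_000_000
    else:
        max_gb, max_docs = 200, 32_000_000
    by_size = math.ceil(total_gb / max_gb)
    by_docs = math.ceil(total_docs / max_docs)
    return max(1, by_size, by_docs)


def max_forests(cpu_cores, filesystems, spindles):
    """Most forests the hardware rules allow.

    Assumption: the cores rule is read as an upper bound.
    For RAID-5/6, pass the number of RAID groups as `spindles`.
    """
    return min(cpu_cores // 2, 2 * filesystems, spindles)


# Figures from this thread: about 460 GB and 8M documents,
# moving to one filesystem on a single RAID-5 group.
print(min_forests(460, 8_000_000))                          # 3
print(min_forests(460, 8_000_000, positions_enabled=True))  # 5
print(max_forests(cpu_cores=8, filesystems=1, spindles=1))  # 1 (8 cores assumed)
```

With these inputs the size rule asks for at least 3 forests, while a single RAID-5 group counts as one spindle and so allows only 1. The two sets of rules cannot both be satisfied on that hardware.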
>> 
>> -- Mike
>> 
>> On 10 Jul 2012, at 19:15 , Tim Meagher wrote:
>> 
>>> Hi Folks,
>>> 
>>> I have over 150 GB of content in one database that is currently 
>>> spread unevenly across 3 forests on 3 separate devices.  I need to 
>>> migrate this content to a new database which uses one device with 
>>> more than enough space for all of the content.  Since there is only 
>>> one device, I'm wondering if there is any advantage or disadvantage 
>>> to using multiple forests.  I think I should be able to simply copy 
>>> the content by creating 3 forests in the new database and copying the 
>>> forests over, but I'd like to know if this is not an optimal solution 
>>> in which case I will need to be a little more resourceful about 
>>> copying the content over.  Perhaps xqsync?
>>> 
>>> Tim Meagher
>>> 
>>> P.S. Moving content from ML 4.1 to ML 4.2.  Sorry, not yet at 5!
>>> 
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> http://developer.marklogic.com/mailman/listinfo/general
>>> 

