Hi Rahul,

When you say “Install CPF”, that can cover a lot of different configurations.

When you installed Content Processing for your database, did you enable all the 
pipelines by default? (This is the “enable conversion” option set to true.) If 
so, and you insert the docs into the CPF domain (which is most likely defined 
as “/”, and you appear to be inserting all docs prefixed with “/xml#”), then 
each of those pipelines will potentially run a condition check, possibly 
followed by an action, on every inserted doc. So there will be overhead.

A couple of other potential tests:


1)      Disable all the pipelines for the domain except “status change handling”

a.       As a rule, never disable “status change handling” unless you really 
know what you’re doing in CPF. For this test you can probably disable it too, 
but if you later go back and enable any of the pipelines, you’ll want to 
re-enable it as well.

2)      Insert the docs WITHOUT the “/” prefix in the URI, so that CPF should 
not be triggered at all when the docs are inserted.

a.       CPF requires the URI to start with at least a leading “/” to trigger 
any domain you define for CPF. So inserting as xml#.xml (instead of /xml#.xml) 
won’t put the doc into any CPF domain for processing. Overhead should be minimal.
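
For test 2, a minimal sketch is the loop from the original message with only 
the URI changed. It assumes the CPF domain scope is the directory “/”, as 
guessed above; if the domain is defined differently, the URI would need to 
fall outside that scope instead.

```xquery
xquery version "1.0-ml";

(: Same loop as in the original message; only the URI changes.
   Assumption: the CPF domain scope is the directory "/", so a URI
   without a leading "/" falls outside it and CPF is not triggered. :)
for $i in (1 to 20000)
let $uri := fn:concat("xml", $i, ".xml")
let $document := element {fn:concat("cpf_", $i)} {$i}
return
  xdmp:document-insert($uri, $document, xdmp:default-permissions(), "collections")
```

If this runs in roughly the same time as Scenario 1, that points at CPF 
processing rather than the insert itself.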

Maybe try those and see what performance you get.

But I’m not sure what you’re trying to accomplish here. You wouldn’t normally 
enable CPF unless you really need it for something. We also don’t generally 
recommend inserting 20,000 docs in a single transaction. I understand these 
are synthetic docs for testing, but for a real load you’d likely use Content 
Pump (mlcp), which has flags that let you set things such as the transaction 
size for inserts.
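
If you want to stay in QConsole for the test, one way to avoid a single 
20,000-doc transaction is to split the inserts into batches, each committed in 
its own transaction. The following is only a sketch under assumptions: the 
batch size of 500 is an arbitrary illustrative value, and it relies on 
xdmp:invoke-function running each batch as a separate update transaction. The 
request as a whole can still hit the time limit if CPF overhead is large.

```xquery
xquery version "1.0-ml";

(: Hypothetical values for illustration; tune the batch size for your setup. :)
declare variable $TOTAL := 20000;
declare variable $BATCH-SIZE := 500;

(: Batches are numbered from 0; the last one may be smaller than $BATCH-SIZE. :)
for $batch in (0 to ($TOTAL - 1) idiv $BATCH-SIZE)
let $start := $batch * $BATCH-SIZE + 1
let $end := fn:min(($start + $BATCH-SIZE - 1, $TOTAL))
return
  (: Each invocation runs in its own transaction, committed explicitly. :)
  xdmp:invoke-function(
    function() {
      (
        for $i in ($start to $end)
        return
          xdmp:document-insert(
            fn:concat("/xml", $i, ".xml"),
            element {fn:concat("cpf_", $i)} {$i},
            xdmp:default-permissions(),
            "collections"),
        xdmp:commit()
      )
    },
    <options xmlns="xdmp:eval">
      <transaction-mode>update</transaction-mode>
    </options>
  )
```

Smaller transactions hold fewer locks at a time, but this does not remove 
whatever work CPF does per document.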

Hope this is useful,
Pete

From: [email protected] 
[mailto:[email protected]] On Behalf Of David Ennis
Sent: Thursday, November 27, 2014 8:42 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Marklogic times out in insertion of 20000 
documents for Scenario2

Hi.

Regarding number 4 in scenario 2: It is my understanding that installing CPF, 
regardless of what pipelines are configured, causes overhead. If nothing else, 
there are about 6 triggers installed (a subset of which get run on the insert).

Kind Regards,
David Ennis
Content Engineer

HintTech <http://www.hinttech.com/>
Mastering the value of content
creative | technology | content

Delftechpark 37i
2628 XJ Delft
The Netherlands
T: +31 88 268 25 00
M: +31 63 091 72 80


On 27 November 2014 at 12:28, Rahul Gupta 
<[email protected]<mailto:[email protected]>> wrote:
Can you please let me know why MarkLogic times out in Scenario 2 below, 
whereas it completes Scenario 1 quickly?

Scenario1:

1)      Create a new database.

2)      Insert 20000 documents in this database through QConsole using the 
following code.
for $i in (1 to 20000)
let $uri := fn:concat("/xml", $i, ".xml")
let $document := element {fn:concat("cpf_", $i)} {$i}
return
xdmp:document-insert($uri, $document, xdmp:default-permissions(), "collections")

3)      It takes 4-9 seconds on ML 7.0-4.1 on a dual-core processor with only 
1 forest attached.

Scenario2:

1)      Install CPF on this database without defining any action on the 
initial state. Instead, define an action on some user-defined state.

2)      Run the same code again.

3)      All the inserted documents will go to the initial state without any 
action being invoked.

4)      My understanding is that installing CPF without any action on the 
initial state should give the same performance as Scenario 1, which is not the 
case.

5)      The query takes a very long time and even times out. Tested with 
12000 documents, it takes 19 minutes.


Thanks,
Rahul Gupta


_______________________________________________
General mailing list
[email protected]<mailto:[email protected]>
http://developer.marklogic.com/mailman/listinfo/general
