Hi Mark,

I’m not entirely sure what algorithm is used underneath, but this is obviously 
an issue. I’ll file a bug for this.

As a workaround you could use a transform and override the URI with something 
like map:put($content, "uri", concat("/mypath/", xdmp:random(), ".xml")). 
By default xdmp:random() generates a random 64-bit number, which should be 
large enough to go well beyond 1 million documents. If you are paranoid you 
could use xdmp:exists to check whether a document with that URI already exists.
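
To make that a bit more concrete, here is a rough, untested sketch of what 
such an mlcp transform module could look like. The namespace, the function 
name fresh-uri and the /mypath/ prefix are placeholders I made up, and I use 
fn:doc-available for the optional existence check:

```xquery
xquery version "1.0-ml";

(: Hypothetical namespace; pick your own and pass it to mlcp. :)
module namespace example = "http://example.com/mlcp/random-uri";

(: Build a random URI, retrying in the unlikely case that a document
   with that URI is already in the database. :)
declare function example:fresh-uri() as xs:string
{
  let $uri := fn:concat("/mypath/", xdmp:random(), ".xml")
  return
    if (fn:doc-available($uri))
    then example:fresh-uri()
    else $uri
};

(: mlcp calls this once per document. $content carries the "uri" and
   "value" keys; only the URI is replaced here. :)
declare function example:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  (
    map:put($content, "uri", example:fresh-uri()),
    $content
  )
};
```

You would install the module in the modules database of your app server and 
point mlcp at it with -transform_module and -transform_namespace. Note that 
the existence check only sees documents that are already committed, so it 
does not protect against two identical random numbers within the same batch, 
but with 64-bit numbers that is extremely unlikely.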

If you are not yet using transforms, this might add a bit of extra overhead.

Kind regards,
Geert

From: <[email protected]> on behalf of Mark Shanks <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Monday, September 28, 2015 at 8:58 PM
To: "[email protected]" <[email protected]>
Subject: [MarkLogic Dev General] Using -generate_uri with mlcp and lots of 
documents

Hi,

I'm using the -generate_uri switch with the MarkLogic Content Pump because the 
documents I have don't contain any unique IDs. However, I've found a big 
problem: if I use mlcp with more than a million documents, the generated URIs 
are no longer unique and documents are overwritten, leading to a maximum of 
1 million documents that you can ingest in this way.

The problem is easy to see. -generate_uri creates a URI like -0-308950, varying 
the last 6 digits, so there are at most a million combinations. -generate_uri 
doesn't seem to change the -0, or to be smart enough to increase the number of 
digits when the maximum is hit; it just starts to overwrite existing documents.

This seems to be a very flawed approach and an unworkable solution. Am I 
missing something? How does one generate over 1 million random unique URIs 
using mlcp?

Thanks.