Hi Mark, I’m not entirely sure what algorithm is used underneath, but this is obviously an issue. I’ll file a bug for this.
As a workaround, you could use a transform and override the URI with something like map:put($content, "uri", concat("/mypath/", xdmp:random(), ".xml")). By default, xdmp:random() generates a 64-bit random number, which should be big enough to go well beyond 1 million documents. If you are paranoid, you could use xdmp:exists to check whether a doc with that URI already exists. If you are not yet using transforms, note that introducing one might add a bit of extra overhead.

Kind regards,
Geert

From: <[email protected]> on behalf of Mark Shanks <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Monday, September 28, 2015 at 8:58 PM
To: "[email protected]" <[email protected]>
Subject: [MarkLogic Dev General] Using -generate_uri with mlcp and lots of documents

Hi,

I'm using the -generate_uri switch with the MarkLogic Content Pump, as the documents I have don't contain any unique IDs. However, I've found a big problem: if I use mlcp with more than a million documents, the generated URIs are no longer unique and documents are overwritten, which leads to a maximum of 1 million documents that you can ingest in this way.

The problem is easy to see. -generate_uri creates a URI like -0-308950, varying the last 6 digits, so there are a maximum of a million combinations. -generate_uri doesn't seem to change the -0, or be smart enough to increase the number of digits when the maximum is hit; it just starts to overwrite existing documents. This seems to be a very flawed approach and an unworkable solution.

Am I missing something? How does one generate over 1 million random unique URIs using mlcp?

Thanks.
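For reference, the transform workaround suggested in the reply might look roughly like the module below. This is only a minimal sketch: the module namespace, the example prefix, the fresh-uri helper, and the /mypath/ prefix are placeholders chosen for illustration, and the existence check uses xdmp:exists, which is assumed to be the function the reply has in mind. The check is also not race-free when several mlcp threads insert concurrently, so treat it as a safety net rather than a guarantee.

```xquery
xquery version "1.0-ml";

(: Hypothetical namespace; pick your own and pass it to mlcp. :)
module namespace example = "http://example.com/mlcp/random-uri";

(: Hypothetical helper: builds a URI from xdmp:random() and retries
   in the unlikely case that a document with that URI already exists. :)
declare function example:fresh-uri() as xs:string
{
  let $uri := fn:concat("/mypath/", xdmp:random(), ".xml")
  return
    if (xdmp:exists(fn:doc($uri)))
    then example:fresh-uri()
    else $uri
};

(: mlcp calls this function once per input document.
   $content carries the "uri" and "value" keys; only "uri" is changed here. :)
declare function example:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  let $_ := map:put($content, "uri", example:fresh-uri())
  return $content
};
```

Assuming the module is installed in the modules database of the target app server, it would be referenced on the mlcp command line with -transform_module (the module path) and -transform_namespace (the namespace above); the function name transform is the default that mlcp looks for.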
