Hi,

  I have a few million documents in a collection, and would like to
find duplicates among them.  There are several possible root element
names (not all documents have the same root element).

  By duplicates, I mean any documents for which the value of the
element /*/foobar is the same.  The element is declared as a string in
the schema.  So what I am really looking for is all the values of
/*/foobar that appear in more than one document in that collection.

  Because there are several million documents in the collection,
it is not practical to use the naive query, which would look something
like the following:

    for $f in fn:collection('coll')/*/foobar
    where fn:exists(fn:collection('coll')[/*/foobar eq $f][2])
    return
      <dup>{ $f }</dup>

  There is an element range index configured for the element foobar.
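
  Since the values are already in that index, one direction might be
to read them from the index with cts:element-values and keep those
whose cts:frequency is greater than one.  The following is only an
untested sketch under several assumptions: foobar is in no namespace,
the index uses the default collation, and documents are not split into
several fragments (the frequency counts fragments, not documents).  It
also matches foobar at any depth, not only /*/foobar:

    xquery version "1.0-ml";

    (: Assumption: the range index on foobar is of type string. :)
    for $v in cts:element-values(
                xs:QName('foobar'),           (: element in the index :)
                (),                           (: no starting value :)
                ('fragment-frequency'),       (: count fragments, not items :)
                cts:collection-query('coll')) (: restrict to the collection :)
    (: cts:frequency is the number of fragments that contain $v :)
    where cts:frequency($v) gt 1
    return
      <dup>{ $v }</dup>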

  Any idea how I can optimize the query (so it actually completes)?

  Regards,

-- 
Florent Georges
http://fgeorges.org/
http://h2oconsulting.be/
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
