[basex-talk] Efficient query for duplicates

Hondros, Constantine (ELS-AMS) Wed, 09 Apr 2014 04:40:31 -0700

I'm running out of memory (1.5 GB allocated) when querying for duplicate node 
values over a fairly flat XML database of approximately 450 MB.


Can anyone suggest a more memory-efficient approach to framing this query than 
iterating over distinct-values as I do below?  I'm hoping that there are some 
BaseX tips and tricks to help out here.

for $val in distinct-values(/dataset/item/pii)
let $cnt := count(/dataset/item/pii[. = $val])
return
  if ($cnt > 1) then
    <duplicate>{$val}</duplicate>
  else
    ()
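
For comparison, one possible reformulation is sketched here. It is an 
untested assumption, not a confirmed fix: it uses the XQuery 3.0 "group by" 
clause so that the pii values are scanned once rather than once per distinct 
value. It assumes a BaseX version with XQuery 3.0 support, and whether it 
stays within the same 1.5 GB has not been measured.

```xquery
(: Sketch only: group the pii elements by their string value in one pass. :)
for $pii in /dataset/item/pii
group by $val := string($pii)
(: After grouping, $pii holds every pii element that shares the key $val. :)
where count($pii) > 1
return <duplicate>{$val}</duplicate>
```

The grouping still has to hold one entry per distinct value, so memory use 
depends on how many distinct pii values the database contains.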


Thanks in advance,
Constantine


