Hello Constantine,

Your query looks very much like you want to group the <pii/> elements
by their value. XQuery 3.0 has a construct aimed at exactly that, and I
would guess it also needs less memory. It is called "group by", and
you can find more information at
https://docs.basex.org/wiki/XQuery_3.0#Group_By or in the spec.
It will look something like this:

for $x in /dataset/item/pii
let $val := $x/string()
group by $val
(: after grouping, $x holds all <pii/> elements that share $val :)
where count($x) > 1
return <duplicate>{$val}</duplicate>

By the way, it looks like you are using null the way NULL is used in
other languages. No such keyword exists in XQuery. You might want to
return the empty sequence () instead.
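
As a minimal sketch, this is your original query with only the else
branch changed, so that non-duplicates contribute nothing to the result.
It is otherwise unchanged, so I would not expect it to use any less
memory:

for $val in distinct-values(/dataset/item/pii)
let $cnt := count(/dataset/item/pii[. = $val])
return
  if ($cnt > 1) then
    <duplicate>{$val}</duplicate>
  else
    ()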

Cheers,
Dirk


On 09/04/14 13:37, Hondros, Constantine (ELS-AMS) wrote:
> I'm running out of memory (1.5 GB allocated) when querying for duplicate node 
> values over a fairly flat XML database of approximately 450 MB.
> 
> Can anyone suggest a more memory-efficient approach to framing this query 
> than iterating over distinct-values as I do below?  I'm hoping that there are 
> some Basex tips and tricks to help out here.
> 
> for $val in distinct-values(/dataset/item/pii)
> let $cnt := count(/dataset/item/pii[. = $val])
> return
>   if ($cnt > 1) then
>       <duplicate>{$val}</duplicate>
>   else
>     null
> 
> 
> Thanks in advance,
> Constantine
> 
> 
> ________________________________
> 
> Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The 
> Netherlands, Registration No. 33156677, Registered in The Netherlands.
> 

-- 
Dirk Kirsten, BaseX GmbH, http://basex.org
|-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
|-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
|   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22
