Thanks guys, using group by makes this query run beautifully.

Just out of interest, if I was using a pre-XQuery 3.0 processor, would there 
have been any other approach than iterating over distinct-values?

Thanks also for the pointer about using 'null'.  The Basex query editor seemed 
to approve it when I tried it once, and the habit stuck ;-)

Cheers,
Constantine

-----Original Message-----
From: Christian Grün [mailto:[email protected]]
Sent: 09 April 2014 14:51
To: Dirk Kirsten
Cc: Hondros, Constantine (ELS-AMS); [email protected]
Subject: Re: [basex-talk] Efficient query for duplicates

Thanks Dirk. A little addition: 'let' should probably be replaced with 'group':

  for $x in /dataset/item/pii
  group by $val := $x/string()
  where count($x) > 1
  return <duplicate>{ $val }</duplicate>

Christian


On Wed, Apr 9, 2014 at 1:59 PM, Dirk Kirsten <[email protected]> wrote:
> Hello Constantine,
>
> your query very much looks like you want to group by each <pii/>
> element. Using XQuery 3.0 there even is a construct aimed at that and
> I would guess it should also preserve memory. It is called group by
> and you can find some more information at
> https://docs.basex.org/wiki/XQuery_3.0#Group_By or in the spec.
> It will look something like this:
>
> for $x in /dataset/item/pii
> let $val := $x/string()
> where count($x) > 1
> return <duplicate>{$val}</duplicate>
>
> By the way, it looks you are using null like a NULL statement in other
> languages. Such a statement does not exist in XQuery. You might want
> to return an empty set () instead.
>
> Cheers,
> Dirk
>
>
> On 09/04/14 13:37, Hondros, Constantine (ELS-AMS) wrote:
>> I'm running out of memory (1.5 GB allocated) when querying for duplicate 
>> node values over a fairly flat XML database of approximately 450 MB.
>>
>> Can anyone suggest a more memory-efficient approach to framing this query 
>> than iterating over distinct-values as I do below?  I'm hoping that there 
>> are some Basex tips and tricks to help out here.
>>
>> for $val in distinct-values(/dataset/item/pii)
>> let $cnt := count(/dataset/item/pii[. = $val]) return
>>   if ($cnt > 1) then
>>       <duplicate>{$val}</duplicate>
>>   else
>>     null
>>
>>
>> Thanks in advance,
>> Constantine
>>
>>
>> ________________________________
>>
>> Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The 
>> Netherlands, Registration No. 33156677, Registered in The Netherlands.
>>
>
> --
> Dirk Kirsten, BaseX GmbH, http://basex.org
> |-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
> |-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
> |   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
> `-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22

________________________________

Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The 
Netherlands, Registration No. 33156677, Registered in The Netherlands.

Reply via email to