Adding a text() step to the predicate does dramatically reduce the
execution time.

This query:

declare namespace marc="http://www.loc.gov/MARC21/slim";;

for $m in collection(
  "latin_hold_20150730"
)/marc:collection/marc:record,
$r in collection(
  "latin_hold_20150730"
)/root/row[BIB_ID/text() = $m/marc:controlfield[@tag = '001']]
group by $key := $r/ITEM_ID

return
<test n="{
  $key
}">{
    $m/marc:datafield[@tag = '245']/marc:subfield[@code = 'a']/text()
}</test>

Executes in 257.13 ms. But it shouldn't be necessary to explicitly specify
the text() node, right (or would even be bad practice to do so[1])?

Thanks,
Tim

[1]
http://stackoverflow.com/questions/12970162/xquery-join-query#comment-17592189


--
Tim A. Thompson
Metadata Librarian (Spanish/Portuguese Specialty)
Princeton University Library


On Tue, Aug 4, 2015 at 8:25 AM, Tim Thompson <[email protected]> wrote:

> I was mistaken; I hadn't enabled indexes on this database. However, after
> creating attribute and text indexes, the query actually seems to take
> longer to execute (483622.95 ms on last run), although the query plan
> itself doesn't seem to have changed:
>
> Compiling:
> - pre-evaluating collection("latin_hold_20150730")
> - pre-evaluating collection("latin_hold_20150730")
> Query:
> declare namespace marc="http://www.loc.gov/MARC21/slim";; for $m in
> collection( "latin_hold_20150730" )/marc:collection/marc:record, $r in
> collection( "latin_hold_20150730" )/root/row[BIB_ID =
> $m/marc:controlfield[@tag = '001']] group by $key := $r/ITEM_ID return
> <test n="{ $key }">{ $m/marc:datafield[@tag = '245']/marc:subfield[@code =
> 'a']/string() }</test>
> Optimized Query:
> for $m_0 in (db:open-pre("latin_hold_20150730",0),
> ...)/marc:collection/marc:record for $r_1 in
> (db:open-pre("latin_hold_20150730",0), ...)/root/row[(BIB_ID =
> $m_0/marc:controlfield[(@tag = "001")])] let (: post-group :) $m_4 := $m_0
> group by $key_2 := $r_1/ITEM_ID return element test { (attribute n {
> ($key_2) }, $m_4/marc:datafield[(@tag = "245")]/marc:subfield[(@code =
> "a")]/string()) }
> Result:
> - Hit(s): 7587 Items
> - Updated: 0 Items
> - Printed: 505 KB
> - Read Locking: local [latin_hold_20150730]
> - Write Locking: none
> Timing:
> - Parsing: 1.04 ms
> - Compiling: 0.86 ms
> - Evaluating: 483604.85 ms
> - Printing: 16.2 ms
> - Total Time: 483622.95 ms
>
> Query plan:
> <QueryPlan compiled="true">
>   <GFLWOR>
>     <For>
>       <Var name="$m" id="0"/>
>       <IterPath>
>         <DBNodeSeq size="2">
>           <DBNode name="latin_hold_20150730" pre="0"/>
>           <DBNode name="latin_hold_20150730" pre="365692"/>
>         </DBNodeSeq>
>         <IterStep axis="child" test="marc:collection"/>
>         <IterStep axis="child" test="marc:record"/>
>       </IterPath>
>     </For>
>     <For>
>       <Var name="$r" id="1"/>
>       <IterPath>
>         <DBNodeSeq size="2">
>           <DBNode name="latin_hold_20150730" pre="0"/>
>           <DBNode name="latin_hold_20150730" pre="365692"/>
>         </DBNodeSeq>
>         <IterStep axis="child" test="root"/>
>         <IterStep axis="child" test="row">
>           <CmpG op="=">
>             <CachedPath>
>               <IterStep axis="child" test="BIB_ID"/>
>             </CachedPath>
>             <IterPath>
>               <VarRef>
>                 <Var name="$m" id="0"/>
>               </VarRef>
>               <IterStep axis="child" test="marc:controlfield">
>                 <CmpG op="=">
>                   <CachedPath>
>                     <IterStep axis="attribute" test="tag"/>
>                   </CachedPath>
>                   <Str value="001" type="xs:string"/>
>                 </CmpG>
>               </IterStep>
>             </IterPath>
>           </CmpG>
>         </IterStep>
>       </IterPath>
>     </For>
>     <GroupBy>
>       <Spec>
>         <Var name="$key" id="2"/>
>         <IterPath>
>           <VarRef>
>             <Var name="$r" id="1"/>
>           </VarRef>
>           <IterStep axis="child" test="ITEM_ID"/>
>         </IterPath>
>       </Spec>
>     </GroupBy>
>     <CElem>
>       <QNm value="test" type="xs:QName"/>
>       <CAttr>
>         <QNm value="n" type="xs:QName"/>
>         <VarRef>
>           <Var name="$key" id="2"/>
>         </VarRef>
>       </CAttr>
>       <MixedPath>
>         <VarRef>
>           <Var name="$m" id="4"/>
>         </VarRef>
>         <IterStep axis="child" test="marc:datafield">
>           <CmpG op="=">
>             <CachedPath>
>               <IterStep axis="attribute" test="tag"/>
>             </CachedPath>
>             <Str value="245" type="xs:string"/>
>           </CmpG>
>         </IterStep>
>         <IterStep axis="child" test="marc:subfield">
>           <CmpG op="=">
>             <CachedPath>
>               <IterStep axis="attribute" test="code"/>
>             </CachedPath>
>             <Str value="a" type="xs:string"/>
>           </CmpG>
>         </IterStep>
>         <FnString name="string([item])"/>
>       </MixedPath>
>     </CElem>
>   </GFLWOR>
> </QueryPlan>
>
>
> --
> Tim A. Thompson
> Metadata Librarian (Spanish/Portuguese Specialty)
> Princeton University Library
>
>
> On Tue, Aug 4, 2015 at 3:23 AM, Christian Grün <[email protected]>
> wrote:
>
>> Dear Tim,
>>
>> The query plan indicates that no index is applied. Your query may be
>> evaluated faster when rewriting "BIB_ID" to "BIB_ID/text()". I will
>> see if this can automatically be done by the query compiler.
>>
>> Best,
>> Christian
>>
>>
>> On Tue, Aug 4, 2015 at 4:42 AM, Tim Thompson <[email protected]> wrote:
>> > Hello,
>> >
>> > I have a database containing two resources/documents: they both
>> represent
>> > the same set of library catalog records (7728 "records" in each), but
>> they
>> > each contain different data that I want to join.
>> >
>> > The first resource looks like this:
>> >
>> >         <marc:collection
>> >         xmlns:marc="http://www.loc.gov/MARC21/slim";>
>> >         <marc:record>
>> >             <marc:leader>01225cam a2200373Mi 4500</marc:leader>
>> >             <marc:controlfield tag="001">5323084</marc:controlfield>
>> >             <marc:datafield
>> >                 ind1="1"
>> >                 ind2="4"
>> >                 tag="245">
>> >                 <marc:subfield code="a">Els teleclubs a les illes
>> Balears
>> > :</marc:subfield>
>> >             </marc:datafield>
>> >         </marc:record>
>> >         <marc:record>
>> >             <marc:leader>01225cam a2200373Mi 4500</marc:leader>
>> >             <marc:controlfield tag="001">5323084</marc:controlfield>
>> >             <marc:datafield
>> >                 ind1="1"
>> >                 ind2="4"
>> >                 tag="245">
>> >                 <marc:subfield code="a">Els teleclubs a les illes
>> Balears
>> > :</marc:subfield>
>> >             </marc:datafield>
>> >         </marc:record>
>> >         <marc:record>
>> >             <marc:leader>00818cam a2200241Mi 4500</marc:leader>
>> >             <marc:controlfield tag="001">6310976</marc:controlfield>
>> >             <marc:datafield
>> >                 ind1="0"
>> >                 ind2="0"
>> >                 tag="245">
>> >                 <marc:subfield code="a">Diccionari manual de sinònims i
>> > antònims de la llengua catalana /</marc:subfield>
>> >             </marc:datafield>
>> >         </marc:record>
>> >     </marc:collection>
>> >
>> > The second one looks like this:
>> >
>> >         <root>
>> >         <row>
>> >             <LANGUAGE>cat</LANGUAGE>
>> >             <ITEM_ID>5912416</ITEM_ID>
>> >             <BIB_ID>5323084</BIB_ID>
>> >             <VENDOR_CODE>MXBKSMX</VENDOR_CODE>
>> >         </row>
>> >         <row>
>> >             <LANGUAGE>cat</LANGUAGE>
>> >             <ITEM_ID>5912416</ITEM_ID>
>> >             <BIB_ID>5323084</BIB_ID>
>> >             <VENDOR_CODE>PUVILL</VENDOR_CODE>
>> >         </row>
>> >         <row>
>> >             <LANGUAGE>cat</LANGUAGE>
>> >             <ITEM_ID>5935043</ITEM_ID>
>> >             <BIB_ID>6310976</BIB_ID>
>> >             <VENDOR_CODE>PUVILL</VENDOR_CODE>
>> >         </row>
>> >     </root>
>> >
>> > I have a simple query that joins the two using the value of the
>> > marc:controlfield[@tag = '001'] from resource 1 and the BIB_ID from
>> resource
>> > 2.
>> >
>> > The query:
>> >
>> > declare namespace marc="http://www.loc.gov/MARC21/slim";;
>> >
>> > for $m in collection(
>> >   "latin_hold_20150730"
>> > )/marc:collection/marc:record,
>> > $r in collection(
>> >   "latin_hold_20150730"
>> > )/root/row[BIB_ID = $m/marc:controlfield[@tag = '001']]
>> > group by $key := $r/ITEM_ID
>> >
>> > return
>> > <test n="{
>> >   $key
>> > }">{
>> >     $m/marc:datafield[@tag = '245']/marc:subfield[@code = 'a']/string()
>> > }</test>
>> >
>> > When I run this using Saxon (substituting fn:doc for fn:collection), it
>> only
>> > takes a second to execute. In BaseX 8.2.3 (using the GUI), however, it
>> takes
>> > a very long time to execute (around 6 minutes!). The BaseX database has
>> both
>> > attribute and text indexes enabled.
>> >
>> > Any idea what is causing it to take so long?
>> >
>> > Here is the BaseX Query Info:
>> >
>> > Total Time: 383756.3 ms
>> >
>> > Compiling:
>> > - pre-evaluating collection("latin_hold_20150730")
>> > - pre-evaluating collection("latin_hold_20150730")
>> > Query:
>> > declare namespace marc="http://www.loc.gov/MARC21/slim";; for $m in
>> > collection( "latin_hold_20150730" )/marc:collection/marc:record, $r in
>> > collection( "latin_hold_20150730" )/root/row[BIB_ID =
>> > $m/marc:controlfield[@tag = '001']] group by $key := $r/ITEM_ID return
>> <test
>> > n="{ $key }">{ $m/marc:datafield[@tag = '245']/marc:subfield[@code =
>> > 'a']/string() }</test>
>> > Optimized Query:
>> > for $m_0 in (db:open-pre("latin_hold_20150730",0),
>> > ...)/marc:collection/marc:record for $r_1 in
>> > (db:open-pre("latin_hold_20150730",0), ...)/root/row[(BIB_ID =
>> > $m_0/marc:controlfield[(@tag = "001")])] let (: post-group :) $m_4 :=
>> $m_0
>> > group by $key_2 := $r_1/ITEM_ID return element test { (attribute n {
>> > ($key_2) }, $m_4/marc:datafield[(@tag = "245")]/marc:subfield[(@code =
>> > "a")]/string()) }
>> > Result:
>> > - Hit(s): 7587 Items
>> > - Updated: 0 Items
>> > - Printed: 505 KB
>> > - Read Locking: local [latin_hold_20150730]
>> > - Write Locking: none
>> > Timing:
>> > - Parsing: 0.43 ms
>> > - Compiling: 1.96 ms
>> > - Evaluating: 383737.5 ms
>> > - Printing: 16.41 ms
>> > - Total Time: 383756.3 ms
>> > Query plan:
>> > <QueryPlan compiled="true">
>> >   <GFLWOR>
>> >     <For>
>> >       <Var name="$m" id="0"/>
>> >       <IterPath>
>> >         <DBNodeSeq size="2">
>> >           <DBNode name="latin_hold_20150730" pre="0"/>
>> >           <DBNode name="latin_hold_20150730" pre="365692"/>
>> >         </DBNodeSeq>
>> >         <IterStep axis="child" test="marc:collection"/>
>> >         <IterStep axis="child" test="marc:record"/>
>> >       </IterPath>
>> >     </For>
>> >     <For>
>> >       <Var name="$r" id="1"/>
>> >       <IterPath>
>> >         <DBNodeSeq size="2">
>> >           <DBNode name="latin_hold_20150730" pre="0"/>
>> >           <DBNode name="latin_hold_20150730" pre="365692"/>
>> >         </DBNodeSeq>
>> >         <IterStep axis="child" test="root"/>
>> >         <IterStep axis="child" test="row">
>> >           <CmpG op="=">
>> >             <CachedPath>
>> >               <IterStep axis="child" test="BIB_ID"/>
>> >             </CachedPath>
>> >             <IterPath>
>> >               <VarRef>
>> >                 <Var name="$m" id="0"/>
>> >               </VarRef>
>> >               <IterStep axis="child" test="marc:controlfield">
>> >                 <CmpG op="=">
>> >                   <CachedPath>
>> >                     <IterStep axis="attribute" test="tag"/>
>> >                   </CachedPath>
>> >                   <Str value="001" type="xs:string"/>
>> >                 </CmpG>
>> >               </IterStep>
>> >             </IterPath>
>> >           </CmpG>
>> >         </IterStep>
>> >       </IterPath>
>> >     </For>
>> >     <GroupBy>
>> >       <Spec>
>> >         <Var name="$key" id="2"/>
>> >         <IterPath>
>> >           <VarRef>
>> >             <Var name="$r" id="1"/>
>> >           </VarRef>
>> >           <IterStep axis="child" test="ITEM_ID"/>
>> >         </IterPath>
>> >       </Spec>
>> >     </GroupBy>
>> >     <CElem>
>> >       <QNm value="test" type="xs:QName"/>
>> >       <CAttr>
>> >         <QNm value="n" type="xs:QName"/>
>> >         <VarRef>
>> >           <Var name="$key" id="2"/>
>> >         </VarRef>
>> >       </CAttr>
>> >       <MixedPath>
>> >         <VarRef>
>> >           <Var name="$m" id="4"/>
>> >         </VarRef>
>> >         <IterStep axis="child" test="marc:datafield">
>> >           <CmpG op="=">
>> >             <CachedPath>
>> >               <IterStep axis="attribute" test="tag"/>
>> >             </CachedPath>
>> >             <Str value="245" type="xs:string"/>
>> >           </CmpG>
>> >         </IterStep>
>> >         <IterStep axis="child" test="marc:subfield">
>> >           <CmpG op="=">
>> >             <CachedPath>
>> >               <IterStep axis="attribute" test="code"/>
>> >             </CachedPath>
>> >             <Str value="a" type="xs:string"/>
>> >           </CmpG>
>> >         </IterStep>
>> >         <FnString name="string([item])"/>
>> >       </MixedPath>
>> >     </CElem>
>> >   </GFLWOR>
>> > </QueryPlan>
>> >
>> >
>> > Thanks in advance!
>> >
>> > Tim
>> >
>> >
>> > --
>> > Tim A. Thompson
>> > Metadata Librarian (Spanish/Portuguese Specialty)
>> > Princeton University Library
>> >
>>
>
>

Reply via email to