Adding a text() step to the predicate does dramatically reduce the execution time.
This query: declare namespace marc="http://www.loc.gov/MARC21/slim"; for $m in collection( "latin_hold_20150730" )/marc:collection/marc:record, $r in collection( "latin_hold_20150730" )/root/row[BIB_ID/text() = $m/marc:controlfield[@tag = '001']] group by $key := $r/ITEM_ID return <test n="{ $key }">{ $m/marc:datafield[@tag = '245']/marc:subfield[@code = 'a']/text() }</test> Executes in 257.13 ms. But it shouldn't be necessary to explicitly specify the text() node, right (or would even be bad practice to do so[1])? Thanks, Tim [1] http://stackoverflow.com/questions/12970162/xquery-join-query#comment-17592189 -- Tim A. Thompson Metadata Librarian (Spanish/Portuguese Specialty) Princeton University Library On Tue, Aug 4, 2015 at 8:25 AM, Tim Thompson <[email protected]> wrote: > I was mistaken; I hadn't enabled indexes on this database. However, after > creating attribute and text indexes, the query actually seems to take > longer to execute (483622.95 ms on last run), although the query plan > itself doesn't seem to have changed: > > Compiling: > - pre-evaluating collection("latin_hold_20150730") > - pre-evaluating collection("latin_hold_20150730") > Query: > declare namespace marc="http://www.loc.gov/MARC21/slim"; for $m in > collection( "latin_hold_20150730" )/marc:collection/marc:record, $r in > collection( "latin_hold_20150730" )/root/row[BIB_ID = > $m/marc:controlfield[@tag = '001']] group by $key := $r/ITEM_ID return > <test n="{ $key }">{ $m/marc:datafield[@tag = '245']/marc:subfield[@code = > 'a']/string() }</test> > Optimized Query: > for $m_0 in (db:open-pre("latin_hold_20150730",0), > ...)/marc:collection/marc:record for $r_1 in > (db:open-pre("latin_hold_20150730",0), ...)/root/row[(BIB_ID = > $m_0/marc:controlfield[(@tag = "001")])] let (: post-group :) $m_4 := $m_0 > group by $key_2 := $r_1/ITEM_ID return element test { (attribute n { > ($key_2) }, $m_4/marc:datafield[(@tag = "245")]/marc:subfield[(@code = > "a")]/string()) } > Result: > - Hit(s): 7587 Items > - Updated: 0 Items > - Printed: 505 KB > - Read Locking: local [latin_hold_20150730] > - Write Locking: none > Timing: > - Parsing: 1.04 ms > - Compiling: 0.86 ms > - Evaluating: 483604.85 ms > - Printing: 16.2 ms > - Total Time: 483622.95 ms > > Query plan: > <QueryPlan compiled="true"> > <GFLWOR> > <For> > <Var name="$m" id="0"/> > <IterPath> > <DBNodeSeq size="2"> > <DBNode name="latin_hold_20150730" pre="0"/> > <DBNode name="latin_hold_20150730" pre="365692"/> > </DBNodeSeq> > <IterStep axis="child" test="marc:collection"/> > <IterStep axis="child" test="marc:record"/> > </IterPath> > </For> > <For> > <Var name="$r" id="1"/> > <IterPath> > <DBNodeSeq size="2"> > <DBNode name="latin_hold_20150730" pre="0"/> > <DBNode name="latin_hold_20150730" pre="365692"/> > </DBNodeSeq> > <IterStep axis="child" test="root"/> > <IterStep axis="child" test="row"> > <CmpG op="="> > <CachedPath> > <IterStep axis="child" test="BIB_ID"/> > </CachedPath> > <IterPath> > <VarRef> > <Var name="$m" id="0"/> > </VarRef> > <IterStep axis="child" test="marc:controlfield"> > <CmpG op="="> > <CachedPath> > <IterStep axis="attribute" test="tag"/> > </CachedPath> > <Str value="001" type="xs:string"/> > </CmpG> > </IterStep> > </IterPath> > </CmpG> > </IterStep> > </IterPath> > </For> > <GroupBy> > <Spec> > <Var name="$key" id="2"/> > <IterPath> > <VarRef> > <Var name="$r" id="1"/> > </VarRef> > <IterStep axis="child" test="ITEM_ID"/> > </IterPath> > </Spec> > </GroupBy> > <CElem> > <QNm value="test" type="xs:QName"/> > <CAttr> > <QNm value="n" type="xs:QName"/> > <VarRef> > <Var name="$key" id="2"/> > </VarRef> > </CAttr> > <MixedPath> > <VarRef> > <Var name="$m" id="4"/> > </VarRef> > <IterStep axis="child" test="marc:datafield"> > <CmpG op="="> > <CachedPath> > <IterStep axis="attribute" test="tag"/> > </CachedPath> > <Str value="245" type="xs:string"/> > </CmpG> > </IterStep> > <IterStep axis="child" test="marc:subfield"> > <CmpG op="="> > <CachedPath> > <IterStep axis="attribute" test="code"/> > </CachedPath> > <Str value="a" type="xs:string"/> > </CmpG> > </IterStep> > <FnString name="string([item])"/> > </MixedPath> > </CElem> > </GFLWOR> > </QueryPlan> > > > -- > Tim A. Thompson > Metadata Librarian (Spanish/Portuguese Specialty) > Princeton University Library > > > On Tue, Aug 4, 2015 at 3:23 AM, Christian Grün <[email protected]> > wrote: > >> Dear Tim, >> >> The query plan indicates that no index is applied. Your query may be >> evaluated faster when rewriting "BIB_ID" to "BIB_ID/text()". I will >> see if this can automatically be done by the query compiler. >> >> Best, >> Christian >> >> >> On Tue, Aug 4, 2015 at 4:42 AM, Tim Thompson <[email protected]> wrote: >> > Hello, >> > >> > I have a database containing two resources/documents: they both >> represent >> > the same set of library catalog records (7728 "records" in each), but >> they >> > each contain different data that I want to join. >> > >> > The first resource looks like this: >> > >> > <marc:collection >> > xmlns:marc="http://www.loc.gov/MARC21/slim"> >> > <marc:record> >> > <marc:leader>01225cam a2200373Mi 4500</marc:leader> >> > <marc:controlfield tag="001">5323084</marc:controlfield> >> > <marc:datafield >> > ind1="1" >> > ind2="4" >> > tag="245"> >> > <marc:subfield code="a">Els teleclubs a les illes >> Balears >> > :</marc:subfield> >> > </marc:datafield> >> > </marc:record> >> > <marc:record> >> > <marc:leader>01225cam a2200373Mi 4500</marc:leader> >> > <marc:controlfield tag="001">5323084</marc:controlfield> >> > <marc:datafield >> > ind1="1" >> > ind2="4" >> > tag="245"> >> > <marc:subfield code="a">Els teleclubs a les illes >> Balears >> > :</marc:subfield> >> > </marc:datafield> >> > </marc:record> >> > <marc:record> >> > <marc:leader>00818cam a2200241Mi 4500</marc:leader> >> > <marc:controlfield tag="001">6310976</marc:controlfield> >> > <marc:datafield >> > ind1="0" >> > ind2="0" >> > tag="245"> >> > <marc:subfield code="a">Diccionari manual de sinònims i >> > antònims de la llengua catalana /</marc:subfield> >> > </marc:datafield> >> > </marc:record> >> > </marc:collection> >> > >> > The second one looks like this: >> > >> > <root> >> > <row> >> > <LANGUAGE>cat</LANGUAGE> >> > <ITEM_ID>5912416</ITEM_ID> >> > <BIB_ID>5323084</BIB_ID> >> > <VENDOR_CODE>MXBKSMX</VENDOR_CODE> >> > </row> >> > <row> >> > <LANGUAGE>cat</LANGUAGE> >> > <ITEM_ID>5912416</ITEM_ID> >> > <BIB_ID>5323084</BIB_ID> >> > <VENDOR_CODE>PUVILL</VENDOR_CODE> >> > </row> >> > <row> >> > <LANGUAGE>cat</LANGUAGE> >> > <ITEM_ID>5935043</ITEM_ID> >> > <BIB_ID>6310976</BIB_ID> >> > <VENDOR_CODE>PUVILL</VENDOR_CODE> >> > </row> >> > </root> >> > >> > I have a simple query that joins the two using the value of the >> > marc:controlfield[@tag = '001'] from resource 1 and the BIB_ID from >> resource >> > 2. >> > >> > The query: >> > >> > declare namespace marc="http://www.loc.gov/MARC21/slim"; >> > >> > for $m in collection( >> > "latin_hold_20150730" >> > )/marc:collection/marc:record, >> > $r in collection( >> > "latin_hold_20150730" >> > )/root/row[BIB_ID = $m/marc:controlfield[@tag = '001']] >> > group by $key := $r/ITEM_ID >> > >> > return >> > <test n="{ >> > $key >> > }">{ >> > $m/marc:datafield[@tag = '245']/marc:subfield[@code = 'a']/string() >> > }</test> >> > >> > When I run this using Saxon (substituting fn:doc for fn:collection), it >> only >> > takes a second to execute. In BaseX 8.2.3 (using the GUI), however, it >> takes >> > a very long time to execute (around 6 minutes!). The BaseX database has >> both >> > attribute and text indexes enabled. >> > >> > Any idea what is causing it to take so long? >> > >> > Here is the BaseX Query Info: >> > >> > Total Time: 383756.3 ms >> > >> > Compiling: >> > - pre-evaluating collection("latin_hold_20150730") >> > - pre-evaluating collection("latin_hold_20150730") >> > Query: >> > declare namespace marc="http://www.loc.gov/MARC21/slim"; for $m in >> > collection( "latin_hold_20150730" )/marc:collection/marc:record, $r in >> > collection( "latin_hold_20150730" )/root/row[BIB_ID = >> > $m/marc:controlfield[@tag = '001']] group by $key := $r/ITEM_ID return >> <test >> > n="{ $key }">{ $m/marc:datafield[@tag = '245']/marc:subfield[@code = >> > 'a']/string() }</test> >> > Optimized Query: >> > for $m_0 in (db:open-pre("latin_hold_20150730",0), >> > ...)/marc:collection/marc:record for $r_1 in >> > (db:open-pre("latin_hold_20150730",0), ...)/root/row[(BIB_ID = >> > $m_0/marc:controlfield[(@tag = "001")])] let (: post-group :) $m_4 := >> $m_0 >> > group by $key_2 := $r_1/ITEM_ID return element test { (attribute n { >> > ($key_2) }, $m_4/marc:datafield[(@tag = "245")]/marc:subfield[(@code = >> > "a")]/string()) } >> > Result: >> > - Hit(s): 7587 Items >> > - Updated: 0 Items >> > - Printed: 505 KB >> > - Read Locking: local [latin_hold_20150730] >> > - Write Locking: none >> > Timing: >> > - Parsing: 0.43 ms >> > - Compiling: 1.96 ms >> > - Evaluating: 383737.5 ms >> > - Printing: 16.41 ms >> > - Total Time: 383756.3 ms >> > Query plan: >> > <QueryPlan compiled="true"> >> > <GFLWOR> >> > <For> >> > <Var name="$m" id="0"/> >> > <IterPath> >> > <DBNodeSeq size="2"> >> > <DBNode name="latin_hold_20150730" pre="0"/> >> > <DBNode name="latin_hold_20150730" pre="365692"/> >> > </DBNodeSeq> >> > <IterStep axis="child" test="marc:collection"/> >> > <IterStep axis="child" test="marc:record"/> >> > </IterPath> >> > </For> >> > <For> >> > <Var name="$r" id="1"/> >> > <IterPath> >> > <DBNodeSeq size="2"> >> > <DBNode name="latin_hold_20150730" pre="0"/> >> > <DBNode name="latin_hold_20150730" pre="365692"/> >> > </DBNodeSeq> >> > <IterStep axis="child" test="root"/> >> > <IterStep axis="child" test="row"> >> > <CmpG op="="> >> > <CachedPath> >> > <IterStep axis="child" test="BIB_ID"/> >> > </CachedPath> >> > <IterPath> >> > <VarRef> >> > <Var name="$m" id="0"/> >> > </VarRef> >> > <IterStep axis="child" test="marc:controlfield"> >> > <CmpG op="="> >> > <CachedPath> >> > <IterStep axis="attribute" test="tag"/> >> > </CachedPath> >> > <Str value="001" type="xs:string"/> >> > </CmpG> >> > </IterStep> >> > </IterPath> >> > </CmpG> >> > </IterStep> >> > </IterPath> >> > </For> >> > <GroupBy> >> > <Spec> >> > <Var name="$key" id="2"/> >> > <IterPath> >> > <VarRef> >> > <Var name="$r" id="1"/> >> > </VarRef> >> > <IterStep axis="child" test="ITEM_ID"/> >> > </IterPath> >> > </Spec> >> > </GroupBy> >> > <CElem> >> > <QNm value="test" type="xs:QName"/> >> > <CAttr> >> > <QNm value="n" type="xs:QName"/> >> > <VarRef> >> > <Var name="$key" id="2"/> >> > </VarRef> >> > </CAttr> >> > <MixedPath> >> > <VarRef> >> > <Var name="$m" id="4"/> >> > </VarRef> >> > <IterStep axis="child" test="marc:datafield"> >> > <CmpG op="="> >> > <CachedPath> >> > <IterStep axis="attribute" test="tag"/> >> > </CachedPath> >> > <Str value="245" type="xs:string"/> >> > </CmpG> >> > </IterStep> >> > <IterStep axis="child" test="marc:subfield"> >> > <CmpG op="="> >> > <CachedPath> >> > <IterStep axis="attribute" test="code"/> >> > </CachedPath> >> > <Str value="a" type="xs:string"/> >> > </CmpG> >> > </IterStep> >> > <FnString name="string([item])"/> >> > </MixedPath> >> > </CElem> >> > </GFLWOR> >> > </QueryPlan> >> > >> > >> > Thanks in advance! >> > >> > Tim >> > >> > >> > -- >> > Tim A. Thompson >> > Metadata Librarian (Spanish/Portuguese Specialty) >> > Princeton University Library >> > >> > >

