I found my optimization: just use map:merge() with the "duplicates: combine" 
option rather than trying to select the individual incidents to fill a given 
doc’s map entry:

          let $incidentsByDoc as map(*) :=
            map:merge(
              for $incident in $incidents
              let $docName := $incident/report:systemID ! string(.) ! relpath:getName(.)
              return
                map{
                  $docName : $incident
                }
              , map{'duplicates' : 'combine'}
            )

This runs in about 800 ms, versus about 60 seconds for the original approach.
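
For the archive, here is a minimal self-contained sketch of the same technique. The sample incidents and file names are made up for illustration, and tokenize() stands in for relpath:getName(), which is my own helper and not shown in this thread. The point is that each incident is visited once and map:merge() concatenates the values that share a key, instead of re-filtering all incidents for every document name.

```xquery
declare namespace report = "http://www.oxygenxml.com/ns/report";

(: Hypothetical sample data standing in for the loaded validation reports :)
let $incidents as element(report:incident)* := (
  <incident xmlns="http://www.oxygenxml.com/ns/report">
    <severity>error</severity>
    <systemID>/work/doc/source/a/topic-one.dita</systemID>
  </incident>,
  <incident xmlns="http://www.oxygenxml.com/ns/report">
    <severity>warning</severity>
    <systemID>/work/doc/source/a/topic-one.dita</systemID>
  </incident>,
  <incident xmlns="http://www.oxygenxml.com/ns/report">
    <severity>error</severity>
    <systemID>/work/doc/source/b/topic-two.dita</systemID>
  </incident>
)

(: Single pass: every incident produces a one-entry map, and
   'duplicates': 'combine' concatenates the values of equal keys. :)
let $incidentsByDoc as map(*) :=
  map:merge(
    for $incident in $incidents
    (: Assumption: the file name is the last '/'-separated path segment :)
    let $docName := tokenize($incident/report:systemID, '/')[last()]
    return map { $docName : $incident },
    map { 'duplicates' : 'combine' }
  )

(: Look up all incidents recorded against one document :)
return $incidentsByDoc('topic-one.dita')
```

Without the 'combine' option, map:merge() keeps only one value per key by default, so the option is what makes each entry hold the full sequence of incidents.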

Clearly I need to remind myself about this technique on a regular basis.

Cheers,

E.

_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com<https://www.servicenow.com>
LinkedIn<https://www.linkedin.com/company/servicenow> | 
Twitter<https://twitter.com/servicenow> | 
YouTube<https://www.youtube.com/user/servicenowinc> | 
Facebook<https://www.facebook.com/servicenow>

From: Eliot Kimber <eliot.kim...@servicenow.com>
Date: Wednesday, June 7, 2023 at 4:36 PM
To: basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
Subject: Possible to optimize XPath lookup by string match on element content?

In my application I load numerous Oxygen validation reports, consisting 
of many <incident> elements, where each incident is associated with a specific 
input document, captured in a <systemID> element:

<incident xmlns="http://www.oxygenxml.com/ns/report">
  <engine>oXygen</engine>
  <severity>error</severity>
  <description>Cannot find definition for key "image.install-store-app-menu-path". Key Scopes:[[bundle-itsm], [reuse], [reuse]]. Keys are gathered from: bundle-itsm-it-service-management.ditamap.</description>
  <systemID>/Users/eliot.kimber/git-basex/utah/doc/source/reuse/activation/install-store-apps-steps.dita</systemID>
  <profile>suite-prod</profile>
  <type>Key reference</type>
  <location>
    <start>
      <line>33</line>
      <column>26</column>
    </start>
    <end>
      <line>33</line>
      <column>61</column>
    </end>
    <length>35</length>
  </location>
</incident>

I need to correlate these incidents to the docs as stored in the database, where 
the match is just on the filename, not any part of the path (although I could 
match on the part of the path starting with “doc/source”).

In my testing, with about 27K of these incident elements, it takes about 60 
seconds to build a map where the keys are the filenames and the values 
are the sequences of <incident> elements that match a given filename, i.e.:

          let $incidentsByDoc as map(*) :=
             map:merge(
               let $docNames as xs:string* := $incidents/report:systemID ! string(.) ! relpath:getName(.) => distinct-values()
               return
               for $docName in $docNames
               return
               map{
                 $docName : $incidents[report:systemID/text() contains text { '/' || $docName}]
               }
             )

I think the only optimization is to save the resulting map as an XML file or 
(with BaseX 10) just save the map for later use.

But I’m curious if there’s some other XPath-level optimization that would make 
this lookup faster?

I’m already using the text index via “contains text”, although I suspect that 
it’s not really offering an advantage over a simple ends-with(.) check.
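
For comparison, the ends-with() form of the same predicate would look roughly like the sketch below. The sample data and the $docName value are made up for illustration. Like the "contains text" version, it filters the whole $incidents sequence once per document name, so I would not expect it to change the overall cost much.

```xquery
declare namespace report = "http://www.oxygenxml.com/ns/report";

(: Hypothetical sample data standing in for the loaded validation reports :)
let $incidents as element(report:incident)* := (
  <incident xmlns="http://www.oxygenxml.com/ns/report">
    <systemID>/work/doc/source/a/topic-one.dita</systemID>
  </incident>,
  <incident xmlns="http://www.oxygenxml.com/ns/report">
    <systemID>/work/doc/source/b/topic-two.dita</systemID>
  </incident>
)

(: Hypothetical document name to look up :)
let $docName := 'topic-one.dita'

(: Plain string comparison on the end of the path; the leading '/'
   prevents a match on a longer name such as 'my-topic-one.dita' :)
return $incidents[ends-with(report:systemID, '/' || $docName)]
```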

Thanks,

Eliot
