Thanks for the comprehensive test case. I’ll probably look at it on Monday or Tuesday.

On Sat, May 9, 2020 at 8:18 AM BIRKNER Michael <michael.birk...@akwien.at> wrote:

> Hello again,
>
> I managed to test again today. Unfortunately, I still observe the same
> performance problem in 9.3.2, also with the query that Christian supplied.
> I also tried the 9.3.3 snapshot: same performance loss as in 9.3.2.
> Everything still works fine in 9.2.4.
>
> To reproduce the problem, I assembled a package with all original XML
> files, the XQueries I execute, and a description of the steps I follow
> (see the README file in the package). As the XML data are licensed under
> CC0, there should be no problem in sharing them with the community. You
> can download the whole package here (.zip file, ~150 MB):
>
> https://drive.google.com/open?id=1o09YZAqj5Y6ys3oE2tX8JRJ3GKoQ2xUr
>
> I hope that helps in tracking down the problem.
>
> Best regards,
> Michael
>
> Mag. Michael Birkner
> AK Wien - Bibliothek
> 1040, Prinz Eugen Straße 20-22
> T: +43 1 501 65 12455
> F: +43 1 501 65 142455
> M: +43 664 88957669
>
> michael.birk...@akwien.at
> wien.arbeiterkammer.at
>
> Visit us also on:
> facebook <http://www.facebook.com/arbeiterkammer/> | twitter
> <https://twitter.com/Arbeiterkammer> | youtube
> <https://www.youtube.com/user/AKoesterreich>
> --------------------------------------------------
>
> *The AK has been standing up for justice for 100 years. Then. Now.
> Forever.*
>
> arbeiterkammer.at/100 <https://arbeiterkammer.at/100>
> <https://w.ak.at/zukunftsprogramm>
>
> ------------------------------
> *From:* Christian Grün <christian.gr...@gmail.com>
> *Sent:* Friday, May 8, 2020, 14:24
> *To:* BIRKNER Michael
> *Cc:* basex-talk@mailman.uni-konstanz.de
> *Subject:* Re: [basex-talk] Performance loss between version 9.2.4 and
> 9.3.2 when executing specific xQuery
>
> And I’m always delighted to be confronted with a library use case.
> BaseX grew up with library data; at that time, mostly XML variants of
> MAB2.
>
> I made another attempt to reproduce your setting by creating two
> databases with MARCXML data (rather small: 10,000 documents and 10
> documents, respectively). This is the query I tried:
>
>   let $recsFromDb1 := db:open('db1')//*:record
>   let $recsFromDb2 := db:open('db2')//*:record
>   let $idsFromRecsInDb1 := distinct-values(
>     $recsFromDb1/*:controlfield[@tag = '001']
>   )
>   for $id in $idsFromRecsInDb1
>   let $recFromDb2WithSameId := $recsFromDb2
>     [*:controlfield[@tag = '001'] = $id]
>   return $recFromDb2WithSameId
>
> Both query plans and execution times are pretty much the same. Can you
> tell me what I need to change in my query to simulate the slowdown?
>
> As a preview, I already have an idea how you can boost the query
> evaluation (provided your databases have up-to-date index structures)…
>
> On Fri, May 8, 2020 at 1:31 PM BIRKNER Michael <michael.birk...@akwien.at>
> wrote:
>
>> Hi Christian,
>>
>> thank you for your answers. As you can guess, the queries I sent in my
>> original email are just simplified examples.
>>
>> The real XML structure is as follows (it's library data in the MARCXML
>> format; here is an example:
>> https://www.loc.gov/standards/marcxml/Sandburg/sandburg.xml):
>>
>> *db1:* each of the 7489 documents has this structure:
>>
>>   <collection>
>>     <record>
>>       <controlfield tag="001">ID-Number</controlfield>
>>       ... [more tags named "controlfield" or "datafield"]
>>     </record>
>>     ... [more records]
>>   </collection>
>>
>> So in db1 I have 7489 documents, each with a
>> "<collection><record>...</record></collection>" structure, which gives
>> 7489 "collection" nodes.
>>
>> *db2:* It's the same structure as above, but there is only 1
>> "collection" and all "records" are within that "collection".
>>
>> Some background information:
>>
>> In db1 I save updated versions of records that also (partly) exist in
>> db2. I download them from an OAI-PMH interface, which gives me only 50
>> records at a time, so I have to page through the results; in the end I
>> get 7489 XML files that I import into db1. So there are multiple records
>> with the same ID. Normally there are only 2 (the original and the
>> updated one), but there can be 3 or more, because the downloaded updates
>> can themselves contain multiple records with the same ID (an updated
>> one, an update of the updated one, and so on ... I know ... complicated).
>>
>> One of the records with the same ID is the newest one. My goal is to
>> find the newest one and delete the others (based on a timestamp that is
>> found in another <controlfield> of the record). So all of this is about
>> updating records in an existing database from downloaded update files
>> that I get via OAI.
>>
>> I hope this information helps. And thank you for pointing out the new
>> version 9.3.3. I will try that one.
>>
>> Best regards,
>> Michael
>>
>> ------------------------------
>> *From:* Christian Grün <christian.gr...@gmail.com>
>> *Sent:* Friday, May 8, 2020, 12:37
>> *To:* BIRKNER Michael
>> *Cc:* basex-talk@mailman.uni-konstanz.de
>> *Subject:* Re: [basex-talk] Performance loss between version 9.2.4 and
>> 9.3.2 when executing specific xQuery
>>
>> I tried to reproduce your use case by creating some sample data (with a
>> few million entries), but both the query plan and the performance were
>> similar in 9.2.4 and the current 9.3.3 beta version.
>>
>> And I am still trying to understand your example query. Is it correct
>> that the attributes of your exampletag elements have static IDs, and
>> that the text value of the exampletag element contains an ID as well? If
>> you can provide me with some example documents from your database, that
>> might help us track down the problem.
>>
>> And feel free to check out the latest stable snapshot [1]. BaseX 9.3.3
>> is close, and lots of new optimizations and rewritings have been added
>> since 9.3.2, so maybe the problem you encountered is already fixed.
>>
>> [1] http://files.basex.org/releases/latest/
>>
>> On Fri, May 8, 2020 at 10:19 AM BIRKNER Michael
>> <michael.birk...@akwien.at> wrote:
>>
>>> Hi,
>>>
>>> I am observing a performance loss between BaseX versions 9.2.4 (which
>>> I was using so far) and 9.3.2 (to which I updated recently) when
>>> executing an XQuery like this:
>>>
>>> ---
>>> (: Open 2 databases and get all <record>s :)
>>> let $recsFromDb1 := db:open('db1')/record
>>> let $recsFromDb2 := db:open('db2')/record
>>>
>>> (: Get distinct IDs of all records in db1 :)
>>> let $idsFromRecsInDb1 :=
>>>   distinct-values($recsFromDb1/exampletag[@exampleattr='id'])
>>>
>>> (: Iterate over the distinct IDs of db1 and return the records from
>>>    db2 with the same ID :)
>>> for $id in $idsFromRecsInDb1
>>> let $recFromDb2WithSameId := $recsFromDb2[
>>>   exampletag[@exampleattr='id']=$id]
>>> return $recFromDb2WithSameId
>>> ---
>>>
>>> In BaseX version 9.2.4 the query executes very fast (2 - 3 seconds).
>>> In 9.3.2 I didn't wait until the end ...
>>> I aborted after several minutes because I suspected that something
>>> must be wrong.
>>>
>>> Both BaseX instances have the same amount of memory allocated
>>> (4096 MB). The databases (db1 and db2) were created from scratch in the
>>> respective BaseX version and contain attribute and text indexes. They
>>> were optimized before executing the query above. All options and
>>> preferences are the same in both BaseX instances. I am using the GUI on
>>> Ubuntu 18.04.
>>>
>>> Here are some more details about the two databases:
>>>
>>> db1:
>>> - Size: 2255 MB
>>> - Nodes: 97598775
>>> - Documents: 7489
>>> - Uptodate: true
>>>
>>> db2:
>>> - Size: 883 MB
>>> - Nodes: 46317512
>>> - Documents: 1
>>> - Uptodate: true
>>>
>>> Does anyone have an idea why there is such a difference in performance
>>> between the two BaseX versions?
>>>
>>> Thanks for any answers and hints!
>>>
>>> Best regards,
>>> Michael
>>>
>>> Please note that we can now be reached at a new phone number. Please
>>> save your contact for AK Wien right away under *501 65 1*, followed by
>>> the usual extension.
>>> This email is intended solely for use by the addressee(s) named in it
>>> and may contain confidential or legally protected information, the use
>>> of which without the sender's permission may be unlawful.
>>> If you have received this email in error, please inform us and delete
>>> the message.
>>> UID: ATU 16209706 | https://wien.arbeiterkammer.at/datenschutz
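
The goal described in the thread (keep only the newest record among those sharing an ID) can be sketched in plain XQuery with `group by`. This is a minimal sketch under stated assumptions, not the query or the fix discussed on the list: the inline `$records` sequence is hypothetical sample data standing in for records gathered from db1 and db2, and controlfield 005 (the MARC 21 date and time of latest transaction) is assumed to hold the timestamp, since the thread only says that the timestamp is found in another controlfield. The `<group>` result element is likewise made up for illustration.

```xquery
(: Hypothetical inline sample; stands in for records from db1 and db2. :)
let $records := (
  <record>
    <controlfield tag="001">A1</controlfield>
    <controlfield tag="005">20200101120000.0</controlfield>
  </record>,
  <record>
    <controlfield tag="001">A1</controlfield>
    <controlfield tag="005">20200508101900.0</controlfield>
  </record>,
  <record>
    <controlfield tag="001">B2</controlfield>
    <controlfield tag="005">20190315080000.0</controlfield>
  </record>
)
(: Group all records by their ID (controlfield 001). :)
for $rec in $records
group by $id := string($rec/*:controlfield[@tag = '001'])
(: Assumption: controlfield 005 holds a fixed-width timestamp,
   so plain string ordering matches chronological ordering. :)
let $newest := (
  for $r in $rec
  order by string($r/*:controlfield[@tag = '005']) descending
  return $r
)[1]
return
  <group id="{$id}"
         newest="{$newest/*:controlfield[@tag = '005']}"
         outdated="{count($rec) - 1}"/>
```

Against the real databases, `$records` would be something like `(db:open('db1')//*:record, db:open('db2')//*:record)`, and the records in each group other than `$newest` would be the candidates for deletion. Grouping visits every record once, instead of filtering all db2 records again for each ID.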