Hi Graydon, The query for the patient weight you shared works correctly and completes significantly faster (in 15 seconds).
Thanks! Ron On March 10, 2021 at 3:50:17 PM, Graydon Saunders ([email protected]) wrote: I am sure someone more knowledgable about BaseX specifics will answer, too, but in general, you have to try not to step on the optimizer's feet. 1. don't call db:open() twice; assign it to a variable, let $sourceData as document-node()+ := db:open('FAERS') and then use the variable. 2. don't do math in the binding expression of a for clause! Also, don't do math twice. Stick things in variables. 3. grouping is more efficient than distinct-values So maybe for $weight in $sourceData/ichicsr/safetyreport/patient/patientweight let $numeric as xs:integer := (($weight div 10.0) => round()) * 10 group by $numeric return concat($numeric, count($weight)) Not tested; I just typed that in, and I might not have understood what you were trying to do. But take advantage of the tuple stream processing whenever you can. -- Graydon On Wed, Mar 10, 2021 at 3:37 PM Ron Katriel <[email protected]> wrote: > Hi, > > I would appreciate your advice on optimizing a query against a large BaseX > (9.2.4) database. It is loaded with data from the FDA’s Adverse Event > Reporting System (FAERS). Currently this is just the 2020 dataset which > comprises 12 documents stored as 308,870,597 nodes (6,565 MB). > > The queries below effectively - though not necessarily efficiently - > implement a histogram. The first, which is applied to patient gender (sex), > returns the results (3 items) in 52 seconds: > > 2 893694 > 1 583999 > 0 198 > > The second does this for patient weight - rounded to the closest 10 lbs > increment. It takes 580 seconds to place the data into 67 bins. Initially I > tried running it on the rounded weights but aborted the run as it was a > taking an inordinate amount of time (there are 217 distinct weights in the > dataset). > > Is there a way to improve the performance of this type of query? > > Thanks, > Ron > > > (: 3 items - 52 sec :) > let $safetyreport := db:open('FAERS')/ichicsr/safetyreport > for $value in distinct-values($safetyreport/patient/patientsex) > return concat($value, " ", > count(index-of($safetyreport/patient/patientsex, $value))) > > (: 67 items - 580 sec :) > let $safetyreport := db:open('FAERS')/ichicsr/safetyreport > for $value in distinct-values($safetyreport/patient/patientweight ! (. div > 10.0) ! round(.) ! (. * 10)) > return concat($value, " ", > count(index-of($safetyreport/patient/patientweight ! (. div 10.0) ! > round(.) ! (. * 10), $value))) > > > > ------------------------------ > The information in this email and any attachments are intended solely for > the recipient(s) to whom it is addressed, and may be confidential and/or > privileged. Any unauthorized distribution or copying of this transmittal or > its attachments is prohibited. If you are not a named recipient or have > received this email in error: (i) you should not read, disclose, or copy > it, (ii) please notify the sender of your receipt by reply email and delete > this email and all attachments. > -- The information in this email and any attachments are intended solely for the recipient(s) to whom it is addressed, and may be confidential and/or privileged. Any unauthorized distribution or copying of this transmittal or its attachments is prohibited. If you are not a named recipient or have received this email in error: (i) you should not read, disclose, or copy it, (ii) please notify the sender of your receipt by reply email and delete this email and all attachments.

