Because of the nature of our documents, we sometimes see extremely long response times when executing NEAR operations against a document (sometimes taking minutes or longer, even though the operation is restricted to a single document).
Our analysis of the code suggests (we think) that it does the following:

1. Looks up each of the terms in the word.dbx file.
2. Intersects the occurrence lists. (So far so good!)
3. Takes each gid found in the occurrence list and:
   a. Finds its parent, and that node's parent, right up to the root of the document (in dom.dbx).
   b. Traverses the tree depth-first until it finds the node text of interest.
   c. Does the expected scan to find out whether the term distance requirement is satisfied.

We did some timings on our document (Rusticus). It started off taking < 1 second per occurrence and grew to 25 seconds per occurrence. Changing the dom.dbx buffers gave a significant improvement, but it was still relatively slow (343 occurrences).

QUESTION: It seems to us that the occurrences are ordered by gid (and we don't do any updating). Is there a simple way to reuse the tree-level positioning information from the prior occurrence when handling the current occurrence, so that we don't have to start again from the document root?

Thanks,
Joe Paulsen
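
To make the question concrete, here is a rough sketch of the kind of reuse we have in mind, written against a toy model rather than the real code. Everything in it is an assumption for illustration: `PARENT_OF` is a hypothetical in-memory parent map standing in for the parent lookups in dom.dbx, and `path_from_root` and `resolve_all` are made-up names, not functions in the database. The idea is only that, when occurrences arrive in gid order, the upward walk for the current gid can stop as soon as it reaches a node already on the previous occurrence's root path.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy tree: gid -> parent gid (the root has no parent).
# This stands in for the parent lookups done against dom.dbx.
PARENT_OF: Dict[int, Optional[int]] = {
    1: None,
    2: 1, 3: 1,
    4: 2, 5: 2, 6: 3, 7: 3,
    8: 4, 9: 4,
}


def path_from_root(
    gid: int,
    parent_of: Dict[int, Optional[int]],
    prev_path: List[int],
) -> Tuple[List[int], int]:
    """Return (root-to-gid path, number of parent lookups performed).

    The upward walk stops at the first node that already lies on
    prev_path, and the shared prefix of prev_path is reused.
    """
    known = {g: i for i, g in enumerate(prev_path)}
    tail: List[int] = []
    lookups = 0
    node: Optional[int] = gid
    while node is not None and node not in known:
        tail.append(node)
        node = parent_of[node]  # one (expensive) parent lookup
        lookups += 1
    # Reuse the previous path up to and including the common ancestor.
    head = prev_path[: known[node] + 1] if node is not None else []
    return head + tail[::-1], lookups


def resolve_all(
    gids: List[int],
    parent_of: Dict[int, Optional[int]],
) -> Tuple[List[List[int]], int]:
    """Resolve root paths for gids in order, carrying the last path along."""
    paths: List[List[int]] = []
    total = 0
    prev: List[int] = []
    for gid in gids:
        path, lookups = path_from_root(gid, parent_of, prev)
        paths.append(path)
        total += lookups
        prev = path
    return paths, total


if __name__ == "__main__":
    paths, total = resolve_all([8, 9, 6], PARENT_OF)
    print(paths, total)
```

This only covers the walk up to the root; whether the depth-first traversal back down to the node text could be shortened in a similar way depends on how the nodes are actually laid out in dom.dbx, which we have not verified.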
