Thanks Jason, adding the word and element position indexes dropped this query down to about 500ms. Thanks!
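
For anyone who finds this thread in the archives: the two settings can also be switched on from XQuery rather than through the Admin Interface. What follows is only a minimal sketch, not the exact steps used here. The database name "RxNorm" is an assumption for illustration, and the query has to be run by a user with admin privileges.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

(: "RxNorm" is an assumed database name; substitute your own. :)
let $db     := xdmp:database("RxNorm")
let $config := admin:get-configuration()

(: "word positions" in the database configuration screen :)
let $config := admin:database-set-word-positions($config, $db, fn:true())

(: "element word positions" in the database configuration screen :)
let $config := admin:database-set-element-word-positions($config, $db, fn:true())

return admin:save-configuration($config)
```

Existing content only benefits once it has been reindexed with the new settings, so the speedup may not show up right away on a database of this size.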

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jason Hunter
Sent: Monday, December 21, 2009 12:43 AM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Help with slow query ....

On Dec 20, 2009, at 8:32 PM, Lee, David wrote:

> I've run into an interesting case of a very slow query.
> In the DB I have about 600,000 fragments (in about 6000 files).
> These are small fragments (about 300 bytes) with about 10 very short elements containing a short string or nothing.
> In MOST searches I get about 100ms result times but this one takes about 60 seconds

You can catch the query-trace of it to see how (and how well) the filtering is being applied.

> cts:search(
>   xdmp:directory("/RxNorm/rxnconso/")//RXNCONSO ,
>   cts:element-query( xs:QName("STR") ,
>     cts:word-query( "ENG", ("case-insensitive", "diacritic-sensitive",
>       "punctuation-insensitive", "whitespace-insensitive", "unstemmed","wildcarded") ) ) )[1 to 10]
>
> What I think is going on here is that the term "ENG" is in every single fragment (it's a language code), so it's finding 600,000 fragments
> but I'm constructing a search to limit the search to only "STR" elements, of which none contain "ENG".
> My guess as to what is happening is that ML is finding a "hit" in every fragment, but has to open up the fragment
> and search to discover that the hit was in the wrong element. The result is the empty sequence.
> but it takes a minute to get to that.

The docs on cts:element-query() explain which indexes can help it do its job:

"Enabling both the word position and element position indexes ("word position" and "element word position" in the database configuration screen of the Admin Interface) will speed up query performance for many queries that use cts:element-query. The position indexes enable MarkLogic Server to eliminate many false-positive results, which can reduce disk I/O and processing, thereby speeding the performance of many queries. The amount of benefit will vary depending on your data."

Sounds like that's the most likely candidate; you don't have these indexes so the query is seeing those false positives that appear in other elements.

-jh-

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
