If it isn't already on your list, you might try changing the searchable expression to //log or turn off filtering entirely. I'm still thinking about that parent fragment, and filtering can also cross fragment boundaries.
-- Mike On 11 Jul 2011, at 11:12 , Lee, David wrote: > Here's the results with snippet processing turned off and the return-plan > and debug on. > Its clearly in the searching not snippet processing. > > > I'm going to try a few other things as well but thought this might be useful > quick feedback. > search:search( "INFO" , > <options xmlns="http://marklogic.com/appservices/search"> > <additional-query>{cts:directory-query("/logs/","infinity")} > </additional-query> > <searchable-expression>/logfile/log</searchable-expression> > <term> > <term-option>unwildcarded</term-option> > </term> > <sort-order type="dateTime" direction="ascending"> > <element ns="" name="log"/> > <attribute ns="" name="time"/> > </sort-order> > <empty apply="no-results" /> > <return-plan>true</return-plan> > <debug>true</debug> > <transform-results apply="raw"> > </transform-results> > </options> > , 1 , 10 ) > > ------ Result > > > > <search:response total="0" start="1" page-length="10" xmlns="" > xmlns:search="http://marklogic.com/appservices/search"> > <search:plan> > <qry:query-plan xmlns:qry="http://marklogic.com/cts/query"> > <qry:info-trace>xdmp:value(xs:untypedAtomic("xdmp:plan(cts:search(/logfile/log, > cts:and-query((cts:word-query..."))</qry:info-trace> > <qry:info-trace>Analyzing path for search: > fn:collection()/logfile/log</qry:info-trace> > <qry:info-trace>Step 1 is searchable: fn:collection()</qry:info-trace> > <qry:info-trace>Step 2 is searchable: logfile</qry:info-trace> > <qry:info-trace>Step 3 is searchable: log</qry:info-trace> > <qry:info-trace>Path is fully searchable.</qry:info-trace> > <qry:info-trace>Gathering constraints.</qry:info-trace> > <qry:word-trace text="INFO"> > <qry:key>9355958571458624353</qry:key> > </qry:word-trace> > <qry:info-trace>Search query contributed 2 constraints: > cts:and-query((cts:word-query("INFO", ("unwildcarded","lang=en"), 1), > cts:directory-query("/logs/", "infinity")), ())</qry:info-trace> > <qry:partial-plan> > <qry:term-query weight="1"> > <qry:key>9355958571458624353</qry:key> > </qry:term-query> > </qry:partial-plan> > <qry:partial-plan> > <qry:term-query weight="0"> > <qry:key>7047720329996541787</qry:key> > </qry:term-query> > </qry:partial-plan> > <qry:info-trace>Executing search.</qry:info-trace> > <qry:final-plan> > <qry:and-query> > <qry:term-query weight="0"> > <qry:key>16738249755881875161</qry:key> > </qry:term-query> > <qry:or-two-queries> > <qry:term-query weight="0"> > <qry:key>4112558586381539294</qry:key> > </qry:term-query> > <qry:term-query weight="0"> > <qry:key>16457039686800503886</qry:key> > </qry:term-query> > </qry:or-two-queries> > <qry:term-query weight="1"> > <qry:key>9355958571458624353</qry:key> > </qry:term-query> > <qry:term-query weight="0"> > <qry:key>7047720329996541787</qry:key> > </qry:term-query> > </qry:and-query> > </qry:final-plan> > <qry:info-trace>Selected 308943 fragments to filter</qry:info-trace> > <qry:result estimate="308943"/> > </qry:query-plan> > </search:plan> > <search:qtext>INFO</search:qtext> > <search:report id="SEARCH-SCHEMAINVALID"> > <error:format-string > xmlns:error="http://marklogic.com/xdmp/error">XDMP-VALIDATEUNEXPECTED: > (err:XQDY0027) validate strict { $opt } -- Invalid node: Found search:empty > but expected > (search:additional-query|search:annotation|search:concurrency-level|search:constraint|search:debug|search:default-suggestion-source|search:forest|search:grammar|search:operator|search:page-length|search:quality-weight|search:return-constraints|search:return-facets|search:return-metrics|search:return-plan|search:return-qtext|search:return-query|search:return-results|search:return-similar|search:search-option|search:searchable-expression|search:sort-order|search:suggestion-source|search:term|search:transform-results)* > at /search:options/search:empty using schema > "search.xsd"</error:format-string> > <error:data xmlns:error="http://marklogic.com/xdmp/error"> > <error:datum>search:empty</error:datum> > <error:datum>(search:additional-query|search:annotation|search:concurrency-level|search:constraint|search:debug|search:default-suggestion-source|search:forest|search:grammar|search:operator|search:page-length|search:quality-weight|search:return-constraints|search:return-facets|search:return-metrics|search:return-plan|search:return-qtext|search:return-query|search:return-results|search:return-similar|search:search-option|search:searchable-expression|search:sort-order|search:suggestion-source|search:term|search:transform-results)*</error:datum> > <error:datum>/search:options/search:empty</error:datum> > <error:datum>"search.xsd"</error:datum> > </error:data> > </search:report> > <search:report id="SEARCH-FLWOR">(for $result in cts:search(/logfile/log, > cts:and-query((cts:word-query("INFO", ("unwildcarded","lang=en"), 1), > cts:directory-query("/logs/", "infinity")), ()), ("score-logtfidf"), 1) order > by (($result//log/@time)[1]) ascending return $result)[1 to > 10]</search:report> > <search:metrics> > <search:query-resolution-time>PT2M16.717897S</search:query-resolution-time> > <search:facet-resolution-time>PT0.000086S</search:facet-resolution-time> > <search:snippet-resolution-time>PT0S</search:snippet-resolution-time> > <search:total-time>PT2M16.875346S</search:total-time> > </search:metrics> > </search:response> > > > > > ---------------------------------------- > David A. Lee > Senior Principal Software Engineer > Epocrates, Inc. > [email protected] > 812-482-5224 > > > -----Original Message----- > From: [email protected] > [mailto:[email protected]] On Behalf Of Micah Dubinko > Sent: Monday, July 11, 2011 12:45 PM > To: General MarkLogic Developer Discussion > Subject: Re: [MarkLogic Dev General] Is "INFO" magic in search:search ? bad > performance > > Another thing you should do is include <return-plan>true</return-plan> in the > options node, which will give a full readout of the query plan in the > response. Also <debug>true</debug> will provide some more details, including > the actual XQuery that implements the query. I'm guessing the word "info" > appears very often in the data, and resolving the query might be more work > than either of us suspects at this point. > > -m > > > > On Jul 11, 2011, at 9:06 AM, Michael Blakeley wrote: > >> A call to xdmp:value() could be doing almost anything. But my guess is >> snippet generation and highlighting. Going further out on a limb, it may >> have something to do with XPath across fragment boundaries. If the snippet >> code tries to go up to the '/logfile' root element, it will be loading a >> fairly large fragment with many (millions of?) child links. That could get >> ugly, especially if the working set is too large for the CPU's on-die caches. >> >> You could test that theory by turning off snippet display, and by creating >> some logfile documents of various sizes. If I'm on the right track, you'll >> see that elapsed time is related to the number of fragments. I don't know if >> it will be O(n) or something worse, though. >> >> I would consider making '/log' the root element, with no subfragments. If >> you have metadata in the logfile, you could represent that with a directory >> structure under /logs/, and perhaps have a metadata document with a known >> base URI inside each directory. Or you could repeat the metadata in each >> log-entry document. But I would try to get away from using sub-document >> fragments. >> >> -- Mike >> >> On 11 Jul 2011, at 07:55 , Lee, David wrote: >> >>> I'm playing with search:search >>> I have about 4GB of data (few million xml files) >>> Most searchs return with a second or 2 but something magic is happening if >>> I use the word "INFO" >>> >>> >>> With the following search it takes nearly 3 minutes and returns no results. >>> I tried other words that both return results and no results and few results >>> and can only replicate it with the magic word "INFO" >>> >>> search:search( "INFO" , >>> <options >>> xmlns="http://marklogic.com/appservices/search"> >>> >>> <additional-query>{cts:directory-query("/logs/","infinity")} >>> </additional-query> >>> >>> <searchable-expression>/logfile/log</searchable-expression> >>> </options> >>> , 1 , 10 ) >>> >>> >>> Profiling shows the majority of time is spent in xdmp:value >>> >>> MarkLogic/appservices/utils/higher-order.xqy: 52 >>> xdmp:value($expr/hof:lambda/@expr) >>> 1 >>> 100 >>> 124556878 >>> 100 >>> 124556880 >>> >>> >>> thats 124 seconds. Everything else is noise (mostly under 100 us). >>> >>> Any ideas ? >>> >>> >>> ---------------------------------------- >>> David A. Lee >>> Senior Principal Software Engineer >>> Epocrates, Inc. >>> [email protected] >>> 812-482-5224 >>> >>> _______________________________________________ >>> General mailing list >>> [email protected] >>> http://developer.marklogic.com/mailman/listinfo/general >> >> _______________________________________________ >> General mailing list >> [email protected] >> http://developer.marklogic.com/mailman/listinfo/general > > _______________________________________________ > General mailing list > [email protected] > http://developer.marklogic.com/mailman/listinfo/general > _______________________________________________ > General mailing list > [email protected] > http://developer.marklogic.com/mailman/listinfo/general > _______________________________________________ General mailing list [email protected] http://developer.marklogic.com/mailman/listinfo/general
