Here's the results with snippet processing turned off  and the return-plan and 
debug on.
Its clearly in the searching not snippet processing.


I'm going to try a few other things as well but thought this might be useful 
quick feedback.
search:search( "INFO" , 
                <options xmlns="http://marklogic.com/appservices/search";>
                  <additional-query>{cts:directory-query("/logs/","infinity")}
                  </additional-query>
                  <searchable-expression>/logfile/log</searchable-expression>
                   <term>
                        <term-option>unwildcarded</term-option>
                 </term>
                 <sort-order type="dateTime" direction="ascending">
                      <element ns="" name="log"/>
                    <attribute ns="" name="time"/>
                 </sort-order>
             <empty apply="no-results" />
               <return-plan>true</return-plan> 
               <debug>true</debug>
                <transform-results apply="raw">
                </transform-results>
                </options>
                , 1 , 10 )

------ Result 



<search:response total="0" start="1" page-length="10" xmlns="" 
xmlns:search="http://marklogic.com/appservices/search";>
<search:plan>
<qry:query-plan xmlns:qry="http://marklogic.com/cts/query";>
<qry:info-trace>xdmp:value(xs:untypedAtomic("xdmp:plan(cts:search(/logfile/log, 
cts:and-query((cts:word-query..."))</qry:info-trace>
<qry:info-trace>Analyzing path for search: 
fn:collection()/logfile/log</qry:info-trace>
<qry:info-trace>Step 1 is searchable: fn:collection()</qry:info-trace>
<qry:info-trace>Step 2 is searchable: logfile</qry:info-trace>
<qry:info-trace>Step 3 is searchable: log</qry:info-trace>
<qry:info-trace>Path is fully searchable.</qry:info-trace>
<qry:info-trace>Gathering constraints.</qry:info-trace>
<qry:word-trace text="INFO">
<qry:key>9355958571458624353</qry:key>
</qry:word-trace>
<qry:info-trace>Search query contributed 2 constraints: 
cts:and-query((cts:word-query("INFO", ("unwildcarded","lang=en"), 1), 
cts:directory-query("/logs/", "infinity")), ())</qry:info-trace>
<qry:partial-plan>
<qry:term-query weight="1">
<qry:key>9355958571458624353</qry:key>
</qry:term-query>
</qry:partial-plan>
<qry:partial-plan>
<qry:term-query weight="0">
<qry:key>7047720329996541787</qry:key>
</qry:term-query>
</qry:partial-plan>
<qry:info-trace>Executing search.</qry:info-trace>
<qry:final-plan>
<qry:and-query>
<qry:term-query weight="0">
<qry:key>16738249755881875161</qry:key>
</qry:term-query>
<qry:or-two-queries>
<qry:term-query weight="0">
<qry:key>4112558586381539294</qry:key>
</qry:term-query>
<qry:term-query weight="0">
<qry:key>16457039686800503886</qry:key>
</qry:term-query>
</qry:or-two-queries>
<qry:term-query weight="1">
<qry:key>9355958571458624353</qry:key>
</qry:term-query>
<qry:term-query weight="0">
<qry:key>7047720329996541787</qry:key>
</qry:term-query>
</qry:and-query>
</qry:final-plan>
<qry:info-trace>Selected 308943 fragments to filter</qry:info-trace>
<qry:result estimate="308943"/>
</qry:query-plan>
</search:plan>
<search:qtext>INFO</search:qtext>
<search:report id="SEARCH-SCHEMAINVALID">
<error:format-string 
xmlns:error="http://marklogic.com/xdmp/error";>XDMP-VALIDATEUNEXPECTED: 
(err:XQDY0027) validate strict { $opt } -- Invalid node: Found search:empty but 
expected 
(search:additional-query|search:annotation|search:concurrency-level|search:constraint|search:debug|search:default-suggestion-source|search:forest|search:grammar|search:operator|search:page-length|search:quality-weight|search:return-constraints|search:return-facets|search:return-metrics|search:return-plan|search:return-qtext|search:return-query|search:return-results|search:return-similar|search:search-option|search:searchable-expression|search:sort-order|search:suggestion-source|search:term|search:transform-results)*
 at /search:options/search:empty using schema "search.xsd"</error:format-string>
<error:data xmlns:error="http://marklogic.com/xdmp/error";>
<error:datum>search:empty</error:datum>
<error:datum>(search:additional-query|search:annotation|search:concurrency-level|search:constraint|search:debug|search:default-suggestion-source|search:forest|search:grammar|search:operator|search:page-length|search:quality-weight|search:return-constraints|search:return-facets|search:return-metrics|search:return-plan|search:return-qtext|search:return-query|search:return-results|search:return-similar|search:search-option|search:searchable-expression|search:sort-order|search:suggestion-source|search:term|search:transform-results)*</error:datum>
<error:datum>/search:options/search:empty</error:datum>
<error:datum>"search.xsd"</error:datum>
</error:data>
</search:report>
<search:report id="SEARCH-FLWOR">(for $result in cts:search(/logfile/log, 
cts:and-query((cts:word-query("INFO", ("unwildcarded","lang=en"), 1), 
cts:directory-query("/logs/", "infinity")), ()), ("score-logtfidf"), 1) order 
by (($result//log/@time)[1]) ascending return $result)[1 to 10]</search:report>
<search:metrics>
<search:query-resolution-time>PT2M16.717897S</search:query-resolution-time>
<search:facet-resolution-time>PT0.000086S</search:facet-resolution-time>
<search:snippet-resolution-time>PT0S</search:snippet-resolution-time>
<search:total-time>PT2M16.875346S</search:total-time>
</search:metrics>
</search:response>




----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]
812-482-5224


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Micah Dubinko
Sent: Monday, July 11, 2011 12:45 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is "INFO" magic in search:search ? bad 
performance

Another thing you should do is include <return-plan>true</return-plan> in the 
options node, which will give a full readout of the query plan in the response. 
Also <debug>true</debug> will provide some more details, including the actual 
XQuery that implements the query. I'm guessing the word "info" appears very 
often in the data, and resolving the query might be more work than either of us 
suspects at this point.

-m



On Jul 11, 2011, at 9:06 AM, Michael Blakeley wrote:

> A call to xdmp:value() could be doing almost anything. But my guess is 
> snippet generation and highlighting. Going further out on a limb, it may have 
> something to do with XPath across fragment boundaries. If the snippet code 
> tries to go up to the '/logfile' root element, it will be loading a fairly 
> large fragment with many (millions of?) child links. That could get ugly, 
> especially if the working set is too large for the CPU's on-die caches.
> 
> You could test that theory by turning off snippet display, and by creating 
> some logfile documents of various sizes. If I'm on the right track, you'll 
> see that elapsed time is related to the number of fragments. I don't know if 
> it will be O(n) or something worse, though.
> 
> I would consider making '/log' the root element, with no subfragments. If you 
> have metadata in the logfile, you could represent that with a directory 
> structure under /logs/, and perhaps have a metadata document with a known 
> base URI inside each directory. Or you could repeat the metadata in each 
> log-entry document. But I would try to get away from using sub-document 
> fragments.
> 
> -- Mike
> 
> On 11 Jul 2011, at 07:55 , Lee, David wrote:
> 
>> I'm playing with search:search
>> I have about 4GB of data (few million xml files)
>> Most searchs  return with a second or 2 but something magic is happening if 
>> I use the word "INFO"
>> 
>> 
>> With the following search it takes nearly 3 minutes and returns no results.
>> I tried other words that both return results and no results and few results 
>> and can only replicate it with the magic word "INFO"
>> 
>> search:search( "INFO" ,
>>                                <options 
>> xmlns="http://marklogic.com/appservices/search";>
>>                                  
>> <additional-query>{cts:directory-query("/logs/","infinity")}
>>                                  </additional-query>
>>                                  
>> <searchable-expression>/logfile/log</searchable-expression>
>>                                </options>
>>                                , 1 , 10 )
>> 
>> 
>> Profiling shows the majority of time is spent in xdmp:value
>> 
>> MarkLogic/appservices/utils/higher-order.xqy:   52
>> xdmp:value($expr/hof:lambda/@expr)
>> 1
>> 100
>> 124556878
>> 100
>> 124556880
>> 
>> 
>> thats 124 seconds.  Everything else is noise (mostly under 100 us).
>> 
>> Any ideas ?
>> 
>> 
>> ----------------------------------------
>> David A. Lee
>> Senior Principal Software Engineer
>> Epocrates, Inc.
>> [email protected]
>> 812-482-5224
>> 
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general

Reply via email to