I added
<search-option>unfiltered</search-option>
and now the result is almost instant (0.14 sec).
I need to ponder what that's affecting ...
In the meantime I'm also experimenting with no sub-document fragmentation (but
it will be a while before the DB is reloaded with 4 GB of new data!)
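
For reference, a minimal sketch of where that option sits in the options node, based on the query quoted further down in this thread. The <empty> element is left out because the quoted response reports it as invalid directly under <options>; the plan and debug options are dropped for brevity. Note that unfiltered results come from the indexes alone, so they can include false positives.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Same searchable expression and directory constraint as the quoted query,
   plus the unfiltered search option. <empty> is intentionally omitted: the
   quoted SEARCH-SCHEMAINVALID report rejects it as a child of <options>. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <search-option>unfiltered</search-option>
    <additional-query>{cts:directory-query("/logs/", "infinity")}</additional-query>
    <searchable-expression>/logfile/log</searchable-expression>
    <term>
      <term-option>unwildcarded</term-option>
    </term>
  </options>
return search:search("INFO", $options, 1, 10)
```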
----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]
812-482-5224
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Lee, David
Sent: Monday, July 11, 2011 2:58 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is "INFO" magic in search:search ? bad
performance
Changing to //log didn't affect things much (slightly faster), but I'm still
concerned about the error in the search:report.
It reports that 308943 fragments were found but then returns none of them.
I suspect the hint about fragmentation is correct. I'm trying some new code
that splits up the files into separate documents.
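
The splitting code itself isn't shown in this thread. As a rough in-database sketch of the same idea, assuming one already-loaded logfile document, each log entry could be re-inserted as its own document. Both URIs below are hypothetical, and for a document with millions of entries the work would need to be batched across transactions rather than run in one request.

```xquery
xquery version "1.0-ml";

(: Hypothetical URI of one existing, fragmented logfile document. :)
let $src := "/logs/example.xml"
for $log at $i in fn:doc($src)/logfile/log
return
  (: Hypothetical target URI scheme: one document per log entry,
     still under /logs/ so the directory query keeps matching. :)
  xdmp:document-insert(
    fn:concat("/logs/entries/example/", $i, ".xml"),
    $log)
```

With that layout the searchable expression would become /log instead of /logfile/log.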
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Monday, July 11, 2011 2:27 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is "INFO" magic in search:search ? bad
performance
If it isn't already on your list, you might try changing the searchable
expression to //log or turning off filtering entirely. I'm still thinking about
that parent fragment, and filtering can also cross fragment boundaries.
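
A quick way to try both ideas outside the Search API is to call cts:search directly. This is only a sketch: the query mirrors the one shown in the SEARCH-FLWOR report quoted below, with //log as the searchable expression.

```xquery
xquery version "1.0-ml";

let $query :=
  cts:and-query((
    cts:word-query("INFO", ("unwildcarded", "lang=en")),
    cts:directory-query("/logs/", "infinity")
  ))
return (
  (: Index-only estimate of matching fragments; no filtering involved. :)
  xdmp:estimate(cts:search(//log, $query)),
  (: First ten results, skipping the filter stage. :)
  cts:search(//log, $query, "unfiltered")[1 to 10]
)
```

If the unfiltered search returns quickly while the filtered one does not, the time is going into filtering the selected fragments rather than into index resolution.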
-- Mike
On 11 Jul 2011, at 11:12 , Lee, David wrote:
> Here are the results with snippet processing turned off and return-plan
> and debug turned on.
> The time is clearly going into the searching, not the snippet processing.
>
>
> I'm going to try a few other things as well but thought this might be useful
> quick feedback.
> search:search("INFO",
>   <options xmlns="http://marklogic.com/appservices/search">
>     <additional-query>{cts:directory-query("/logs/", "infinity")}</additional-query>
>     <searchable-expression>/logfile/log</searchable-expression>
>     <term>
>       <term-option>unwildcarded</term-option>
>     </term>
>     <sort-order type="dateTime" direction="ascending">
>       <element ns="" name="log"/>
>       <attribute ns="" name="time"/>
>     </sort-order>
>     <empty apply="no-results"/>
>     <return-plan>true</return-plan>
>     <debug>true</debug>
>     <transform-results apply="raw"/>
>   </options>,
>   1, 10)
>
> ------ Result
>
>
>
> <search:response total="0" start="1" page-length="10" xmlns=""
> xmlns:search="http://marklogic.com/appservices/search">
> <search:plan>
> <qry:query-plan xmlns:qry="http://marklogic.com/cts/query">
> <qry:info-trace>xdmp:value(xs:untypedAtomic("xdmp:plan(cts:search(/logfile/log,
> cts:and-query((cts:word-query..."))</qry:info-trace>
> <qry:info-trace>Analyzing path for search:
> fn:collection()/logfile/log</qry:info-trace>
> <qry:info-trace>Step 1 is searchable: fn:collection()</qry:info-trace>
> <qry:info-trace>Step 2 is searchable: logfile</qry:info-trace>
> <qry:info-trace>Step 3 is searchable: log</qry:info-trace>
> <qry:info-trace>Path is fully searchable.</qry:info-trace>
> <qry:info-trace>Gathering constraints.</qry:info-trace>
> <qry:word-trace text="INFO">
> <qry:key>9355958571458624353</qry:key>
> </qry:word-trace>
> <qry:info-trace>Search query contributed 2 constraints:
> cts:and-query((cts:word-query("INFO", ("unwildcarded","lang=en"), 1),
> cts:directory-query("/logs/", "infinity")), ())</qry:info-trace>
> <qry:partial-plan>
> <qry:term-query weight="1">
> <qry:key>9355958571458624353</qry:key>
> </qry:term-query>
> </qry:partial-plan>
> <qry:partial-plan>
> <qry:term-query weight="0">
> <qry:key>7047720329996541787</qry:key>
> </qry:term-query>
> </qry:partial-plan>
> <qry:info-trace>Executing search.</qry:info-trace>
> <qry:final-plan>
> <qry:and-query>
> <qry:term-query weight="0">
> <qry:key>16738249755881875161</qry:key>
> </qry:term-query>
> <qry:or-two-queries>
> <qry:term-query weight="0">
> <qry:key>4112558586381539294</qry:key>
> </qry:term-query>
> <qry:term-query weight="0">
> <qry:key>16457039686800503886</qry:key>
> </qry:term-query>
> </qry:or-two-queries>
> <qry:term-query weight="1">
> <qry:key>9355958571458624353</qry:key>
> </qry:term-query>
> <qry:term-query weight="0">
> <qry:key>7047720329996541787</qry:key>
> </qry:term-query>
> </qry:and-query>
> </qry:final-plan>
> <qry:info-trace>Selected 308943 fragments to filter</qry:info-trace>
> <qry:result estimate="308943"/>
> </qry:query-plan>
> </search:plan>
> <search:qtext>INFO</search:qtext>
> <search:report id="SEARCH-SCHEMAINVALID">
> <error:format-string
> xmlns:error="http://marklogic.com/xdmp/error">XDMP-VALIDATEUNEXPECTED:
> (err:XQDY0027) validate strict { $opt } -- Invalid node: Found search:empty
> but expected
> (search:additional-query|search:annotation|search:concurrency-level|search:constraint|search:debug|search:default-suggestion-source|search:forest|search:grammar|search:operator|search:page-length|search:quality-weight|search:return-constraints|search:return-facets|search:return-metrics|search:return-plan|search:return-qtext|search:return-query|search:return-results|search:return-similar|search:search-option|search:searchable-expression|search:sort-order|search:suggestion-source|search:term|search:transform-results)*
> at /search:options/search:empty using schema
> "search.xsd"</error:format-string>
> <error:data xmlns:error="http://marklogic.com/xdmp/error">
> <error:datum>search:empty</error:datum>
> <error:datum>(search:additional-query|search:annotation|search:concurrency-level|search:constraint|search:debug|search:default-suggestion-source|search:forest|search:grammar|search:operator|search:page-length|search:quality-weight|search:return-constraints|search:return-facets|search:return-metrics|search:return-plan|search:return-qtext|search:return-query|search:return-results|search:return-similar|search:search-option|search:searchable-expression|search:sort-order|search:suggestion-source|search:term|search:transform-results)*</error:datum>
> <error:datum>/search:options/search:empty</error:datum>
> <error:datum>"search.xsd"</error:datum>
> </error:data>
> </search:report>
> <search:report id="SEARCH-FLWOR">(for $result in cts:search(/logfile/log,
> cts:and-query((cts:word-query("INFO", ("unwildcarded","lang=en"), 1),
> cts:directory-query("/logs/", "infinity")), ()), ("score-logtfidf"), 1) order
> by (($result//log/@time)[1]) ascending return $result)[1 to
> 10]</search:report>
> <search:metrics>
> <search:query-resolution-time>PT2M16.717897S</search:query-resolution-time>
> <search:facet-resolution-time>PT0.000086S</search:facet-resolution-time>
> <search:snippet-resolution-time>PT0S</search:snippet-resolution-time>
> <search:total-time>PT2M16.875346S</search:total-time>
> </search:metrics>
> </search:response>
>
>
>
>
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Micah Dubinko
> Sent: Monday, July 11, 2011 12:45 PM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Is "INFO" magic in search:search ? bad
> performance
>
> Another thing you should do is include <return-plan>true</return-plan> in the
> options node, which will give a full readout of the query plan in the
> response. Also <debug>true</debug> will provide some more details, including
> the actual XQuery that implements the query. I'm guessing the word "info"
> appears very often in the data, and resolving the query might be more work
> than either of us suspects at this point.
>
> -m
>
>
>
> On Jul 11, 2011, at 9:06 AM, Michael Blakeley wrote:
>
>> A call to xdmp:value() could be doing almost anything. But my guess is
>> snippet generation and highlighting. Going further out on a limb, it may
>> have something to do with XPath across fragment boundaries. If the snippet
>> code tries to go up to the '/logfile' root element, it will be loading a
>> fairly large fragment with many (millions of?) child links. That could get
>> ugly, especially if the working set is too large for the CPU's on-die caches.
>>
>> You could test that theory by turning off snippet display, and by creating
>> some logfile documents of various sizes. If I'm on the right track, you'll
>> see that elapsed time is related to the number of fragments. I don't know if
>> it will be O(n) or something worse, though.
>>
>> I would consider making '/log' the root element, with no subfragments. If
>> you have metadata in the logfile, you could represent that with a directory
>> structure under /logs/, and perhaps have a metadata document with a known
>> base URI inside each directory. Or you could repeat the metadata in each
>> log-entry document. But I would try to get away from using sub-document
>> fragments.
>>
>> -- Mike
>>
>> On 11 Jul 2011, at 07:55 , Lee, David wrote:
>>
>>> I'm playing with search:search.
>>> I have about 4 GB of data (a few million XML files).
>>> Most searches return within a second or two, but something magic happens if
>>> I use the word "INFO".
>>>
>>>
>>> With the following search it takes nearly 3 minutes and returns no results.
>>> I tried other words that return many results, few results, and no results,
>>> and can only replicate it with the magic word "INFO".
>>>
>>> search:search("INFO",
>>>   <options xmlns="http://marklogic.com/appservices/search">
>>>     <additional-query>{cts:directory-query("/logs/", "infinity")}</additional-query>
>>>     <searchable-expression>/logfile/log</searchable-expression>
>>>   </options>,
>>>   1, 10)
>>>
>>>
>>> Profiling shows the majority of time is spent in xdmp:value:
>>>
>>> location:    MarkLogic/appservices/utils/higher-order.xqy: 52
>>> expression:  xdmp:value($expr/hof:lambda/@expr)
>>> count:       1
>>> shallow %:   100
>>> shallow us:  124556878
>>> deep %:      100
>>> deep us:     124556880
>>>
>>> That's 124 seconds. Everything else is noise (mostly under 100 us).
>>>
>>> Any ideas?
>>>
>>>
>>>
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> http://developer.marklogic.com/mailman/listinfo/general
>>