Can't I have it both ways? :)

I don't have an answer yet as I'm still in the prototype stage. I'm mainly 
trying to figure out what the tradeoffs are and how to resolve them.  I think 
I have a handle on it now.  I'm going to complete the re-fragmenting work 
(move every node to its own document), but so far that doesn't appear to be 
affecting things much.  It's *much* simpler (and faster to load, easier to 
manage, delete, etc.) to have 100 files than 1,000,000 files, so I'd rather 
not have to split these up, but I'm trying it as an experiment.

Rather, the filtering seems to be the key bottleneck: if I mismatch search 
options with index options and get a huge number of false positives, then the 
filtering stage takes forever.  Now that I know the root cause I can work with 
the tradeoffs.

Thanks everyone for your help! I'm sure to be back with (N+1) questions ...


----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]
812-482-5224


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Kelly Stirman
Sent: Monday, July 11, 2011 3:57 PM
To: [email protected]
Subject: Re: [MarkLogic Dev General] Is "INFO" magic in search:search ? bad 
performance

Hi David,

I think we had you disable all the search indexes to speed loading.

Now you're searching. :-)

The answer in this case is to enable a case-sensitive index. But rather than 
add indexes one at a time, do you know what kind of searches you need to run? 
That will help us to recommend the minimal index configuration so you can have 
fast loads and fast searches.

Kelly

Message: 3
Date: Mon, 11 Jul 2011 19:35:10 +0000
From: "Lee, David" <[email protected]>
Subject: Re: [MarkLogic Dev General] Is "INFO" magic in search:search ? bad 
performance
To: General MarkLogic Developer Discussion
        <[email protected]>
Message-ID:
        <31395bf86e0a454f832b8f8824ed6bda034...@exmb-pp03.corp.epocrates.com>
Content-Type: text/plain; charset="us-ascii"

I narrowed this down.
I changed the data to be single documents instead of a top-level with a 
fragment-root.
The effects were not significantly different.

Adding 'unfiltered' made the results instant.
Similarly adding 'case-insensitive' made the results instant.

So what I think is going on here is that "INFO" matches case-insensitively to a 
huge number of fragments but case-sensitively to none.  Thus the filtered 
search has to go through the entire candidate list, loading each fragment and 
doing the equivalent of a cts:contains(), only to find no results.

If "INFO" had more case-sensitive matches, the search could have terminated 
sooner, as I see with other searches for words that have both case-sensitive 
and case-insensitive matches.
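
To make the reasoning above concrete, here is a toy model in Python. It is 
not MarkLogic code and does not claim to reflect the server's internals: the 
names build_index and filtered_search are hypothetical, and it simply assumes 
an index that stores case-folded words only, followed by a filter step that 
loads each candidate fragment and checks it case-sensitively.

```python
from collections import defaultdict

def build_index(fragments):
    """Map each lower-cased word to the ids of fragments containing it."""
    index = defaultdict(set)
    for frag_id, text in enumerate(fragments):
        for word in text.split():
            index[word.lower()].add(frag_id)
    return index

def filtered_search(fragments, index, word, page_size=10):
    """Return (hits, fragments_loaded) for a case-sensitive word search."""
    # The index can only answer case-insensitively, so it yields candidates.
    candidates = sorted(index.get(word.lower(), ()))
    hits, loaded = [], 0
    for frag_id in candidates:
        loaded += 1  # filtering means loading the fragment itself
        if word in fragments[frag_id].split():
            hits.append(frag_id)
            if len(hits) == page_size:
                break  # first page is full, stop early
    return hits, loaded

fragments = ["level info message"] * 1000 + ["Error and error"] * 1000
index = build_index(fragments)
print(filtered_search(fragments, index, "INFO"))   # no hits, 1000 fragments loaded
print(filtered_search(fragments, index, "error"))  # 10 hits, 10 fragments loaded
```

In this sketch the "INFO" search loads every candidate and still returns 
nothing, while the "error" search stops as soon as the first page is full.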

Suggestions for improving this in a usable way?

A) Add a case-sensitive index?
B) Use case-insensitive matching (might be the better GUI in this case).
C) Use unfiltered searches, but then I can get a HUGE number of false positives 
... running cts:contains on the top 10 hits will likely find none, so this is 
not usable.
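
The three options can be compared in a small Python toy model. This is only a 
sketch under stated assumptions, not MarkLogic code: index_by and the sample 
data are hypothetical, and the "index" is just a dictionary from words to 
fragment ids.

```python
def index_by(fragments, key):
    """Build a word index; `key` decides how words are normalised."""
    index = {}
    for frag_id, text in enumerate(fragments):
        for word in text.split():
            index.setdefault(key(word), set()).add(frag_id)
    return index

fragments = ["level info message"] * 1000   # no fragment contains "INFO"
folded = index_by(fragments, str.lower)     # case-insensitive terms only
exact = index_by(fragments, lambda w: w)    # option A: case-sensitive terms

# A) The case-sensitive index has no entry for "INFO", so nothing is loaded.
candidates_a = exact.get("INFO", set())

# B) A case-insensitive search accepts every candidate from the folded index,
#    so the first page fills after ten fragments.
page_b = sorted(folded["info"])[:10]

# C) Unfiltered: the folded candidates come back unchecked. For a
#    case-sensitive intent every one of the top ten is a false positive.
top10_c = sorted(folded["info"])[:10]
false_positives_c = [i for i in top10_c if "INFO" not in fragments[i].split()]

print(len(candidates_a), len(page_b), len(false_positives_c))  # 0 10 10
```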

Other ideas?


----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]
812-482-5224
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general