Can't I have it both ways? :) I don't have an answer yet as I'm still in the prototype stage; I'm mainly trying to figure out what the tradeoffs are and how to resolve them. I think I have a handle on it now. I'm going to complete the re-fragmenting work (move every node to its own document), but so far that doesn't appear to be affecting things much. It's *much* simpler (and faster to load, easier to manage, delete, etc.) to have 100 files than 1,000,000 files ... so I'd rather not have to split these up, but I'm trying it as an experiment.
Rather, the filtering seems to be the key bottleneck ... if I mismatch search options with index options and get a huge number of false positives, then the filtering stage takes forever. Now that I know the root cause I can work with the tradeoffs.

Thanks everyone for your help! I'm sure to be back with (N+1) questions ...

----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]
812-482-5224

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Kelly Stirman
Sent: Monday, July 11, 2011 3:57 PM
To: [email protected]
Subject: Re: [MarkLogic Dev General] Is "INFO" magic in search:search ? bad performance

Hi David,

I think we had you disable all the search indexes to speed loading. Now you're searching. :-)

The answer in this case is to enable a case-sensitive index. But rather than add indexes one at a time, do you know what kind of searches you need to run? That will help us to recommend the minimal index configuration so you can have fast loads and fast searches.

Kelly

Message: 3
Date: Mon, 11 Jul 2011 19:35:10 +0000
From: "Lee, David" <[email protected]>
Subject: Re: [MarkLogic Dev General] Is "INFO" magic in search:search ? bad performance
To: General MarkLogic Developer Discussion <[email protected]>
Message-ID: <31395bf86e0a454f832b8f8824ed6bda034...@exmb-pp03.corp.epocrates.com>
Content-Type: text/plain; charset="us-ascii"

I narrowed this down. I changed the data to be single documents instead of a top-level document with a fragment root. The effects were not significantly different. Adding 'unfiltered' made the results instant. Similarly, adding 'case-insensitive' made the results instant.

So what I think is going on here is that "INFO" matches case-insensitively to a huge number of fragments but case-sensitively to none.
Thus the filtered search has to go through the entire candidate list, loading each fragment and doing the equivalent of a cts:contains() on it, only to find no results. If "INFO" matched more results case-sensitively, the search could have terminated sooner, as I see with other searches for words that have both case-sensitive and case-insensitive matches.

Suggestions to improve this usably?

A) Add a case-sensitive index?
B) Use case-insensitive matching (might be the better GUI in this case).
C) Use unfiltered searches, but then I can get a HUGE number of false positives ... running cts:contains on the top 10 hits will likely find none, so this is not usable.

Other ideas?

----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]
812-482-5224

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
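
The behaviour described in this thread (a case-insensitive index producing many candidates, followed by a case-sensitive filter that rejects all of them) can be illustrated with a small toy model. This is only a sketch in Python, not MarkLogic code: the `build_case_insensitive_index` and `search` functions, the fragment ids, and the sample data are all hypothetical, and the real server's index resolution is far more involved.

```python
from typing import Dict, List, Set, Tuple


def build_case_insensitive_index(fragments: Dict[str, str]) -> Dict[str, Set[str]]:
    """Toy index: map each lowercased word to the ids of fragments containing it."""
    index: Dict[str, Set[str]] = {}
    for frag_id, text in fragments.items():
        for word in text.split():
            index.setdefault(word.lower(), set()).add(frag_id)
    return index


def search(
    fragments: Dict[str, str],
    index: Dict[str, Set[str]],
    word: str,
    filtered: bool = True,
    limit: int = 10,
) -> Tuple[List[str], int]:
    """Case-sensitive word search; returns (hits, number of fragments loaded)."""
    # Index resolution: only a case-insensitive index exists, so every
    # fragment containing the word in any case becomes a candidate.
    candidates = sorted(index.get(word.lower(), set()))
    if not filtered:
        # Unfiltered: trust the index, load nothing, accept false positives.
        return candidates[:limit], 0
    hits: List[str] = []
    loaded = 0
    for frag_id in candidates:
        loaded += 1  # the filter has to load the fragment to check it
        if word in fragments[frag_id].split():  # exact, case-sensitive check
            hits.append(frag_id)
            if len(hits) == limit:
                break  # enough true matches: stop early
    return hits, loaded


# Hypothetical data: 1000 fragments containing lowercase "info" only.
fragments = {f"doc{i:04d}": "info level message" for i in range(1000)}
index = build_case_insensitive_index(fragments)

print(search(fragments, index, "INFO"))                  # ([], 1000)
print(search(fragments, index, "info")[1])               # 10
print(search(fragments, index, "INFO", filtered=False))  # 10 false positives, 0 loaded
```

In this model the filtered search for "INFO" loads all 1000 candidates and returns nothing, the filtered search for "info" stops after 10 loads, and the unfiltered search returns instantly but with hits that are all false positives. A case-sensitive index would correspond to keying the toy index on the exact word instead of `word.lower()`, which leaves the "INFO" search with no candidates to filter.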
