As always, a big thank you. Adding the global lexicon collation that matched the default application lexicon collation resolved the problem on my local machine. I haven't been able to test this on production yet, but there's no reason to think it wouldn't work there too.
I will also advise that the recommended index settings are enabled (some, but not all, were already).

Thanks again,

Dec

On 10 Dec 2012, at 18:13, Michael Blakeley <[email protected]> wrote:

> That sounds like the index configuration interacting with filtered search.
> The total is the unfiltered result of the index lookup, and includes some
> false positives. The final results are filtered. You could check this by
> adding 'unfiltered' to your search options.
>
> http://docs.marklogic.com/guide/search-dev/wildcard discusses an extreme
> situation, where there are no wildcard indexes at all:
>
>> If character indexes, lexicons, and trailing wildcard indexes are all
>> disabled in a database and wildcarding is explicitly enabled in the query
>> (with the "wildcarded" option to the leaf-level cts:query constructor), the
>> query will execute, but might require a lot of processing. Such queries will
>> be fast if they are very selective and only need to do the wildcard searches
>> over a relatively small amount of content, but can take a long time if they
>> actually need to filter out results from a large amount of content.
>
> With any wildcard configuration, some queries will be easy to resolve and
> some will be harder. The docs make a recommendation that works pretty well
> for most cases:
>
>> To enable any kind of wildcard query functionality with a good combination
>> of performance and database size, MarkLogic recommends the following index
>> settings:
>>
>> • word searches
>> • three character word searches
>> • word positions
>> • word lexicon in the codepoint collation
>> • three character word positions
>>
>> This combination will provide accurate and fast wildcard queries for a wide
>> variety of wildcard searches, including leading and trailing wildcarded
>> searches. If you add the trailing wildcard searches index, you will get
>> slightly more efficient trailing wildcard searches, but with increased
>> database size.
>
> -- Mike
>
> On 10 Dec 2012, at 09:15, Declan Newman <[email protected]> wrote:
>
>> We are seeing some odd behaviour with the counts coming back from
>> wildcard searches when using search:search.
>>
>> I have recreated this on a local ML instance using the following:
>>
>> /Document_1.xml - contains "Sons Ltd."
>> /Document_2.xml - contains "Sonss Ltd."
>> /Document_3.xml - contains "Sonas Ltd."
>>
>> search:search("son?") => total="1" … along with the correct <search-result/>
>> etc.
>> … which is obviously correct.
>>
>> However:
>> search:search("son??") => total="3" … along with the two matching
>> <search-result/> for "Sonss Ltd." and "Sonas Ltd."
>>
>> It has correctly identified and returned the search results (and snippets)
>> but the count is wrongly including the "Sons" as a match.
>>
>> This is happening on a much larger scale with the production data where a
>> similar search can result in a count discrepancy of many thousands.
>>
>> Could this be down to configuration, or is it likely to be a bug?
>>
>> Thanks,
>>
>> Dec
>>
>> ----------------------------------------------------------------------------
>> Declan Newman, Development Team Lead,
>> Semantico, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
>> <mailto:[email protected]>
>> <tel:+44-1273-358247>
>>
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
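
For anyone who wants to try the 'unfiltered' check Mike suggests on their own data, here is a minimal sketch. It assumes the standard Search API module import and the three test documents from the original message. The <comparison>, <filtered> and <unfiltered> wrapper elements are illustrative names only, not part of any API.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Run the same wildcard query twice: once with the default (filtered)
   behaviour and once with the 'unfiltered' search option. :)
let $q := "son??"
let $filtered := search:search($q)
let $unfiltered := search:search($q,
  <options xmlns="http://marklogic.com/appservices/search">
    <search-option>unfiltered</search-option>
  </options>)
return
  (: Hypothetical wrapper elements, used only to show the numbers side by side. :)
  <comparison>
    <filtered total="{$filtered/@total}"
              results="{count($filtered/search:result)}"/>
    <unfiltered total="{$unfiltered/@total}"
                results="{count($unfiltered/search:result)}"/>
  </comparison>
```

If the explanation holds, the unfiltered run should list as many results as its total reports, false positives included, while the default run keeps the index-derived total but filters the results it returns. This has not been run against the database in question, so treat the expected behaviour as an assumption rather than a confirmed result.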
