That sounds like the index configuration interacting with filtered search. The total is an estimate taken from the unfiltered index lookup, so it can include false positives. The results themselves are filtered, which removes those false positives. You could check this by adding 'unfiltered' to your search options: the number of results should then match the total.
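A minimal sketch of that check, assuming the standard Search API module location and the three test documents from the original message. The query string and variable names are only illustrative. With the 'unfiltered' search option the results skip the filtering pass, so the result count should line up with the reported total if index resolution is the cause.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Ask the Search API to skip the filtering pass, so results come straight
   from index resolution, false positives included. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <search-option>unfiltered</search-option>
  </options>
let $response := search:search("son??", $options)
return (
  (: The estimated total reported by the Search API. :)
  fn:data($response/@total),
  (: The number of results on this page; the default page length caps it,
     so compare against the total only on a small test set. :)
  fn:count($response/search:result)
)
```

If the two numbers agree here but differ in the default (filtered) run, the extra count comes from index false positives rather than from a bug.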
http://docs.marklogic.com/guide/search-dev/wildcard discusses an extreme situation, where there are no wildcard indexes at all:

> If character indexes, lexicons, and trailing wildcard indexes are all
> disabled in a database and wildcarding is explicitly enabled in the query
> (with the "wildcarded" option to the leaf-level cts:query constructor), the
> query will execute, but might require a lot of processing. Such queries will
> be fast if they are very selective and only need to do the wildcard searches
> over a relatively small amount of content, but can take a long time if they
> actually need to filter out results from a large amount of content.

With any wildcard configuration, some queries will be easy to resolve and some will be harder. The docs make a recommendation that works pretty well for most cases:

> To enable any kind of wildcard query functionality with a good combination of
> performance and database size, MarkLogic recommends the following index
> settings:
>
> • word searches
> • three character word searches
> • word positions
> • word lexicon in the codepoint collation
> • three character word positions
>
> This combination will provide accurate and fast wildcard queries for a wide
> variety of wildcard searches, including leading and trailing wildcarded
> searches. If you add the trailing wildcard searches index, you will get
> slightly more efficient trailing wildcard searches, but with increased
> database size.

-- Mike

On 10 Dec 2012, at 09:15 , Declan Newman <[email protected]> wrote:

> We are seeing some odd behaviour with the counts coming back when using
> wildcard searches when using search:search.
>
> I have recreated this on a local ML instance using the following:
>
> /Document_1.xml - contains "Sons Ltd."
> /Document_2.xml - contains "Sonss Ltd."
> /Document_3.xml - contains "Sonas Ltd."
>
> search:search("son?") => total="1" … along with the correct <search-result/>
> etc.
> … which is obviously correct.
>
> However:
> search:search("son??") => total="3" … along with the two matching
> <search-result/> for "Sonss Ltd." and "Sonas Ltd."
>
> It has correctly identified and returned the search results (and snippets)
> but the count is wrongly including the "Sons" as a match.
>
> This is happening on a much larger scale with the production data where a
> similar search can result in a count discrepancy of many thousands.
>
> Could this be down to configuration, or is likely to be a bug?
>
> Thanks,
>
> Dec
>
> ----------------------------------------------------------------------------
> Declan Newman, Development Team Lead,
> Semantico, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
> <mailto:[email protected]>
> <tel:+44-1273-358247>
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
