As always, a big thank you.

Adding the global lexicon collation that matched the default application 
lexicon collation resolved the problem on my local machine. I haven't been 
able to test this on production yet, but there's no reason to think that it 
wouldn't work there too.

I will also advise that the recommended index settings be enabled (some, but 
not all, were already).
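
For reference, a minimal sketch of enabling those settings through the Admin 
API rather than the Admin UI. The database name "Documents" is a placeholder, 
and this assumes the codepoint word lexicon is not already configured (adding 
a lexicon that already exists may raise an error):

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a hypothetical database name; substitute your own. :)
let $db := xdmp:database("Documents")
let $config := admin:get-configuration()
(: The index settings recommended in the wildcard documentation. :)
let $config := admin:database-set-word-searches($config, $db, fn:true())
let $config := admin:database-set-three-character-searches($config, $db, fn:true())
let $config := admin:database-set-word-positions($config, $db, fn:true())
let $config := admin:database-set-three-character-word-positions($config, $db, fn:true())
(: Word lexicon in the codepoint collation. :)
let $config := admin:database-add-word-lexicon($config, $db,
  admin:database-word-lexicon("http://marklogic.com/collation/codepoint"))
return admin:save-configuration($config)
```

Changing these settings triggers a reindex if reindexing is enabled for the 
database, so it is worth trying on a small database first.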

Thanks again,

Dec


On 10 Dec 2012, at 18:13, Michael Blakeley <[email protected]> wrote:

> That sounds like the index configuration interacting with filtered search. 
> The total is the unfiltered result of the index lookup, and includes some 
> false positives. The final results are filtered. You could check this by 
> adding 'unfiltered' to your search options.
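> 
> A minimal sketch of that check, assuming the default search options are 
> otherwise in use. It runs the same query filtered and unfiltered and 
> reports the total next to the number of results actually returned:
> 
> ```xquery
> xquery version "1.0-ml";
> import module namespace search = "http://marklogic.com/appservices/search"
>   at "/MarkLogic/appservices/search/search.xqy";
> 
> let $q := "son??"
> (: Options that skip the filtering pass, so results come straight from the indexes. :)
> let $unfiltered :=
>   <options xmlns="http://marklogic.com/appservices/search">
>     <search-option>unfiltered</search-option>
>   </options>
> let $filtered-response := search:search($q)
> let $unfiltered-response := search:search($q, $unfiltered)
> return (
>   fn:concat("filtered: total=", $filtered-response/@total,
>     ", results=", fn:count($filtered-response/search:result)),
>   fn:concat("unfiltered: total=", $unfiltered-response/@total,
>     ", results=", fn:count($unfiltered-response/search:result))
> )
> ```
> 
> If the unfiltered run returns as many results as the total, the extra 
> count is coming from index resolution rather than from search:search.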
> 
> http://docs.marklogic.com/guide/search-dev/wildcard discusses an extreme 
> situation, where there are no wildcard indexes at all:
> 
>> If character indexes, lexicons, and trailing wildcard indexes are all 
>> disabled in a database and wildcarding is explicitly enabled in the query 
>> (with the "wildcarded" option to the leaf-level cts:query constructor), the 
>> query will execute, but might require a lot of processing. Such queries will 
>> be fast if they are very selective and only need to do the wildcard searches 
>> over a relatively small amount of content, but can take a long time if they 
>> actually need to filter out results from a large amount of content.
> 
> 
> With any wildcard configuration, some queries will be easy to resolve and 
> some will be harder. The docs make a recommendation that works pretty well 
> for most cases:
> 
>> To enable any kind of wildcard query functionality with a good combination 
>> of performance and database size, MarkLogic recommends the following index 
>> settings:
>> 
>>      • word searches
>>      • three character word searches
>>      • word positions
>>      • word lexicon in the codepoint collation
>>      • three character word positions
>> 
>> This combination will provide accurate and fast wildcard queries for a wide 
>> variety of wildcard searches, including leading and trailing wildcarded 
>> searches. If you add the trailing wildcard searches index, you will get 
>> slightly more efficient trailing wildcard searches, but with increased 
>> database size.
> 
> 
> -- Mike
> 
> On 10 Dec 2012, at 09:15 , Declan Newman <[email protected]> wrote:
> 
>> We are seeing some odd behaviour with the counts coming back from wildcard 
>> searches when using search:search.
>> 
>> I have recreated this on a local ML instance using the following:
>> 
>> /Document_1.xml - contains "Sons Ltd."
>> /Document_2.xml - contains "Sonss Ltd."
>> /Document_3.xml - contains "Sonas Ltd."
>> 
>> 
>> search:search("son?") => total="1" … along with the correct <search-result/> 
>> etc.
>> … which is obviously correct.
>> 
>> However:
>> search:search("son??") => total="3" … along with the two matching 
>> <search-result/> for "Sonss Ltd." and "Sonas Ltd."
>> 
>> It has correctly identified and returned the search results (and snippets) 
>> but the count is wrongly including the "Sons" as a match.
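>> 
>> A sketch of the reproduction, for anyone who wants to try it. The post 
>> above does not show the document structure, so the <name> wrapper element 
>> here is purely hypothetical:
>> 
>> ```xquery
>> xquery version "1.0-ml";
>> (: Load the three test documents; <name> is a hypothetical wrapper element. :)
>> for $text at $i in ("Sons Ltd.", "Sonss Ltd.", "Sonas Ltd.")
>> return xdmp:document-insert(
>>   fn:concat("/Document_", $i, ".xml"), <name>{$text}</name>)
>> ;
>> xquery version "1.0-ml";
>> import module namespace search = "http://marklogic.com/appservices/search"
>>   at "/MarkLogic/appservices/search/search.xqy";
>> (: Compare the reported total with the number of results actually returned. :)
>> for $q in ("son?", "son??")
>> let $response := search:search($q)
>> return fn:concat($q, ": total=", $response/@total,
>>   ", results=", fn:count($response/search:result))
>> ```
>> 
>> The semicolon separates two transactions, so the search runs only after 
>> the inserts have been committed.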
>> 
>> This is happening on a much larger scale with the production data where a 
>> similar search can result in a count discrepancy of many thousands.
>> 
>> Could this be down to configuration, or is it likely to be a bug?
>> 
>> Thanks,
>> 
>> Dec
>> 
>> ----------------------------------------------------------------------------
>> Declan Newman, Development Team Lead,
>> Semantico, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
>> <mailto:[email protected]>
>> <tel:+44-1273-358247>
>> 
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
> 


