I'm trying to figure out what could possibly account for buggy results for 
wildcard searches in certain fringe cases (running MarkLogic 7.0-4.3).

I have two servers running on the same data set of 166K documents, with 
identical host, database and app server settings so far as I can determine (for 
anything related to word query at least). Ordinarily, wildcard searches on 
words return the exact same number of matches on both hosts. For example:

Pattern           H1      H2
democra*        1579    1579
demo*           4354    4354
dem*           16866   16866

But certain word stems produce buggy results on H2, matching every document in 
the database when they shouldn't. Actually I should say "word stem", singular, 
because the buggy results all involve words starting with "rel". For example:

Pattern           H1      H2
religions*       138     138
religion*       2448  166618
relig*          3810  166618
reli*          14608  166618
rel*           39888   39888
re*           150890  166618
relia*          1084  166618
relie*          8306  166618
relo*            156  166618
relm*              3       3
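
One sanity check that needs no access to either server: every document that 
matches a longer prefix such as relig* must also match a shorter prefix such as 
rel*, so the shorter pattern can never report fewer matches than the longer 
one. The Python sketch below applies that check to the counts in the table. The 
dictionary just restates the table, and the helper name 
containment_violations is made up for illustration; it is not part of any 
MarkLogic API.

```python
# Counts copied from the table above: pattern -> (H1 count, H2 count).
counts = {
    "religions*": (138, 138),
    "religion*": (2448, 166618),
    "relig*": (3810, 166618),
    "reli*": (14608, 166618),
    "rel*": (39888, 39888),
    "re*": (150890, 166618),
    "relia*": (1084, 166618),
    "relie*": (8306, 166618),
    "relo*": (156, 166618),
    "relm*": (3, 3),
}


def containment_violations(counts, host):
    """Return (shorter, longer) pattern pairs where the shorter prefix
    reports fewer matches than a longer prefix that it contains.

    host is the column index: 0 for H1, 1 for H2.
    """
    bad = []
    for short, short_counts in counts.items():
        for longer, longer_counts in counts.items():
            s = short.rstrip("*")
            l = longer.rstrip("*")
            # "longer" extends "short" if it starts with it and is not equal.
            if l != s and l.startswith(s):
                if short_counts[host] < longer_counts[host]:
                    bad.append((short, longer))
    return bad


h1_bad = containment_violations(counts, 0)
h2_bad = containment_violations(counts, 1)
print("H1 violations:", h1_bad)
print("H2 violations:", h2_bad)
```

On these numbers H1 is internally consistent, while on H2 the rel* count is 
smaller than the counts for six longer patterns that it contains (religion*, 
relig*, reli*, relia*, relie*, relo*). That points at the 166618 figures, not 
the rel* figure, as the wrong ones, though it says nothing about the cause.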

I have tried unsuccessfully to find other letter sequences that exhibit the bug 
in a wildcard search or that give different result counts for H2. So far it's 
only certain "rel-" examples.

My next step will be a forced reindex of the database on H2 to see if that 
helps, but before I do that I wonder if anyone has a clue what might account 
for this behavior.

Even odder, on two entirely different systems running an entirely different 
MarkLogic software instance, "rel-" searches are also showing discrepancies, 
though I haven't researched that one as thoroughly. Some deep-level indexing 
bug, possibly?

David

-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: dsew...@virginia.edu   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
