I'm trying to figure out what could possibly account for buggy results for wildcard searches in certain fringe cases (running MarkLogic 7.0-4.3).
I have two servers running on the same data set of 166K documents, with identical host, database and app server settings so far as I can determine (for anything related to word query at least). Ordinarily, wildcard searches on words return the exact same number of matches on both hosts. For example:

                  H1       H2
    democra*    1579     1579
    demo*       4354     4354
    dem*       16866    16866

But there are certain word stems that produce buggy results on H2, matching all documents when they shouldn't. Actually I should say "word stem" because the buggy results all involve words starting in "rel". For example:

                  H1       H2
    religions*   138      138
    religion*   2448   166618
    relig*      3810   166618
    reli*      14608   166618
    rel*       39888    39888
    re*       150890   166618
    relia*      1084   166618
    relie*      8306   166618
    relo*        156   166618
    relm*          3        3

I have tried unsuccessfully to find other letter sequences that exhibit the bug in a wildcard search or that give different result counts for H2. So far it's only certain "rel-" examples.

My next step will be a forced reindex of the database on H2 to see if that helps, but before I do that I wonder if anyone has a clue what might account for this behavior.

Even odder, on two entirely different systems running an entirely different MarkLogic software instance, "rel-" searches are also showing discrepancies, though I haven't researched that one as thoroughly. Some deep-level indexing bug, possibly?

David

--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: dsew...@virginia.edu   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general