Don't know if this helps or hurts, but my approach for unit tests was to build an index in a RAMDirectory for each test, index just enough documents for my tests (documents I could strictly control), and then just do searches, man...
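
A minimal sketch of the shape of such a test, in plain Python rather than Lucene's API. Everything named here (InMemoryIndex, search, build_test_index) is hypothetical; the toy index only stands in for a RAM-resident index so the example stays self-contained, and the real thing would index and search through Lucene instead.

```python
from collections import defaultdict


class InMemoryIndex:
    """Toy stand-in for an index held in RAM (hypothetical, not Lucene)."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of docs containing it
        self._docs = {}                    # doc id -> original text

    def add(self, doc_id, text):
        """Index one document by splitting its text on whitespace."""
        self._docs[doc_id] = text
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def lookup(self, term):
        """Return a copy of the set of doc ids containing the term."""
        return set(self._postings.get(term.lower(), set()))


def search(index, query):
    """Code under test: ids of docs containing every query term (AND)."""
    terms = query.split()
    if not terms:
        return []
    hits = index.lookup(terms[0])
    for term in terms[1:]:
        hits &= index.lookup(term)
    return sorted(hits)


def build_test_index():
    """Fresh index per test, holding a handful of documents we fully control."""
    index = InMemoryIndex()
    index.add("doc1", "the quick brown fox")
    index.add("doc2", "the lazy brown dog")
    index.add("doc3", "quick thinking saves the day")
    return index


def test_single_term():
    assert search(build_test_index(), "brown") == ["doc1", "doc2"]


def test_all_terms_must_match():
    assert search(build_test_index(), "quick brown") == ["doc1"]


def test_no_hits():
    assert search(build_test_index(), "cat") == []
```

Because search() receives the index as an argument, each test can hand it a real, tiny index built on the spot instead of a mock, and the indexing code gets exercised along the way.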

True, the weakness was that the data sets are very small, and this is more of a "black box" test than one might want. I suppose one could argue that since I wasn't actually looking at a disk file, I wasn't testing a major portion of the system. But it sure caught a bunch of my programming errors <G>....

I had to do some fancy dancing to build an infrastructure that let my tests index from a "semi-real" document characteristic of my problem space, which put the indexing code in the loop as well...

In essence, I forwent (what is the past tense of forgo anyway?) mock objects in favor of real objects, and contrived a test that was self-contained and ran fast enough that I didn't need mock objects. I had to make sure my search interface took a searcher object (a Directory would do), but that wasn't hard.

Warning: I'm not the most experienced unit tester in the world <G>, but it seems to me that the fewer mock objects in the system, the fewer disconnects there are between what you think you are testing and what you are really testing, so I favor real objects over mock objects when it's reasonably straightforward. Given that I know nothing about your problem space, though, this may be a totally useless approach <G>....

Best
Erick