If I use TrecContentSource to index a collection, it puts the doc name into the docname field, just as I like. Say I have a doc with <DOCNO>DOCID0001</DOCNO>. The problem is that it concatenates the iteration number to this document name:

name = name + "_" + iteration;

This produces a docname of DOCID0001_0, which won't work if I am trying to use the quality package to measure relevance, since it no longer matches the document ID in the relevance judgments.

Does anyone object to changing TrecContentSource to *not do this*? I would think the primary reason you would want to use it is to measure relevance. Alternatively, we could change DocNameExtractor in the quality package to ignore this _iteration suffix... doesn't matter to me.

-- Robert Muir rcm...@gmail.com
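For the DocNameExtractor alternative, the suffix stripping might look roughly like the sketch below. The class name IterationSuffix and its strip method are hypothetical and not part of Lucene's quality package. The sketch assumes the suffix is always an underscore followed by decimal digits, as in DOCID0001_0, and removes only the last such suffix.

```java
// Hypothetical helper, not part of Lucene's quality package.
public final class IterationSuffix {
    private IterationSuffix() {}

    /**
     * Removes a trailing "_<iteration>" suffix, like the one TrecContentSource
     * appends, so "DOCID0001_0" becomes "DOCID0001". Names whose text after
     * the last '_' is not all digits are returned unchanged.
     */
    public static String strip(String name) {
        int idx = name.lastIndexOf('_');
        if (idx <= 0 || idx == name.length() - 1) {
            return name; // no underscore, leading underscore, or nothing after it
        }
        for (int i = idx + 1; i < name.length(); i++) {
            if (!Character.isDigit(name.charAt(i))) {
                return name; // text after the last '_' is not an iteration number
            }
        }
        return name.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(strip("DOCID0001_0")); // prints DOCID0001
    }
}
```

Because only the final "_digits" segment is removed, a name such as FBIS3_1_12 would come back as FBIS3_1. This is only safe if every indexed name really carries the suffix; a collection whose original DOCNOs already end in "_digits" would be mangled, which is one argument for fixing TrecContentSource instead.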