Split the filename into "basefilename" and "version" and make each a keyword.
Sort your query by version descending, and only use the first "basefile" you encounter. On Wed, 17 Nov 2004 15:05:19 -0500, Luke Shannon <[EMAIL PROTECTED]> wrote: > Hey all; > > I have ran into an interesting case. > > Our system has notes. These need to be indexed. They are xml files called > default.xml and are easily parsed and indexed. No problem, have been doing it > all week. > > The problem is if someone edits the note, the system doesn't update the > default.xml. It creates a new file, default_1.xml (every edit creates a new > file with an incremented number, the sytem only displays the content from the > highest number). > > My problem is I index all the documents and end up with terms that were taken > out of note several version ago still showing up in the query. From my point > of view this makes sense because the files are still in the content. But to a > user it is confusing because they have no idea every change they make to a > note spans a new file and now the are seeing a term they removed from their > note 2 weeks ago showing up in a query. > > I have started modifying my incremental update to be look for multiple > version of the default.xml but it is more work than I thought and is going > make things complex. > > Maybe there is an easier way? If I just let it run and create the index, can > somebody suggest a way I could easily scan the index folder ensuring only the > default.xml with the highest number in its filename remains (only for folders > were there is more than one default.xml file)? Or is this wishful thinking? > > Thanks, > > Luke > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]