Hi,

This is a nice discussion :)

> Yes, I see that. One additional problem that I need to solve for my
> application is that I need to map from stemmed forms of the terms to at
> least one un-stemmed form. Ideally it would be all un-stemmed forms, but
> I can live with the first one. I realize that Lucene does not easily
> support this because of the separation of church and state (I mean the
> term filtering prior to indexing and querying), but I still need this
> functionality... So, the question is, is this going to be common enough
> to add a concept of a TermDictionary to Lucene and provide methods to
> access it on the IndexReader and IndexWriter? If not, I could implement
> this externally, but then I would not be able to use the IO framework
> and whole concept of directories. Also, since the Term numbers are going
> to be ephemeral just like doc numbers, externally I would have to refer
> to them by text, slowing down the translation process, etc.

I think that this is common enough to be added to Lucene. Having a mapping
between the stems and their unstemmed forms is very valuable. It could also
be used as an alternative way of handling inflections.

Maurits
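
P.S. To make the idea concrete, here is a minimal sketch of the kind of mapping I have in mind, kept outside Lucene in plain Java. Everything in it is an assumption for illustration: StemDictionary is a hypothetical class (not an existing Lucene API), and stem() is a toy suffix stripper standing in for whatever stemming filter the analyzer really uses. It only shows the data structure: each stem keeps its unstemmed forms in first-seen order, so both "the first one" and "all of them" are available.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, NOT part of Lucene: remembers which unstemmed
// forms produced each stem, in the order they were first seen.
class StemDictionary {
    private final Map<String, Set<String>> formsByStem = new LinkedHashMap<>();

    // Toy stemmer for illustration only; a real stemming filter would go here.
    static String stem(String term) {
        String t = term.toLowerCase();
        if (t.endsWith("ing") && t.length() > 5) return t.substring(0, t.length() - 3);
        if (t.endsWith("es") && t.length() > 4) return t.substring(0, t.length() - 2);
        if (t.endsWith("s") && t.length() > 3) return t.substring(0, t.length() - 1);
        return t;
    }

    // Record a term as it is analyzed; returns the stem that would be indexed.
    String add(String term) {
        String s = stem(term);
        formsByStem.computeIfAbsent(s, k -> new LinkedHashSet<>()).add(term);
        return s;
    }

    // All unstemmed forms seen for a stem (empty if the stem is unknown).
    Set<String> allForms(String stem) {
        return formsByStem.getOrDefault(stem, Collections.emptySet());
    }

    // The first unstemmed form seen for a stem, or null if the stem is unknown.
    String firstForm(String stem) {
        Set<String> forms = formsByStem.get(stem);
        return forms == null ? null : forms.iterator().next();
    }

    public static void main(String[] args) {
        StemDictionary dict = new StemDictionary();
        for (String t : new String[] {"indexing", "indexes", "index", "readers"}) {
            dict.add(t);
        }
        System.out.println(dict.allForms("index"));
        System.out.println(dict.firstForm("reader"));
    }
}
```

Note that this keys everything by term text, which is exactly the slow external lookup described in the quoted message; a TermDictionary inside Lucene could avoid that and live in the same Directory as the index.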