I guess you are right.
I've implemented this for an index with ten million really small
documents, all of which are stored in the index. The documents are never more than a thousand
words, so re-indexing is quick enough. However, it is probably not advisable to do
this with bigger documents or documents that need additional parsing.
/magnus
Peter Becker wrote:
Hi Magnus,
thanks for the offer, but unfortunately I can't/don't want to make the assumption that I can easily access the documents to re-index them. And I don't think this approach would be feasible unless you can keep the documents in memory somehow.
Storing the other/non-inverted/normal/whatever index would be expensive at indexing time, but querying should be a lot faster than having to re-index documents. In our situation that is preferable.
Peter
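To make the trade-off above concrete, here is a minimal sketch of keeping a forward (document-to-terms) map next to the inverted one. It is plain Python and does not use Lucene; the class `TwoWayIndex`, its methods, and the whitespace tokenizer are hypothetical names chosen for illustration only. Every `add` has to write both maps, which is the extra indexing cost, but a similarity lookup no longer needs the original document text.

```python
from collections import defaultdict


class TwoWayIndex:
    """Toy index holding both an inverted and a forward mapping (illustrative only)."""

    def __init__(self):
        self.inverted = defaultdict(set)  # term -> ids of documents containing it
        self.forward = {}                 # doc id -> set of terms in that document

    def add(self, doc_id, text):
        # Assumption: lowercasing and whitespace splitting stand in for real analysis.
        terms = set(text.lower().split())
        self.forward[doc_id] = terms      # the extra write paid at indexing time
        for term in terms:
            self.inverted[term].add(doc_id)

    def similar_to(self, doc_id):
        """Rank other documents by the number of terms shared with doc_id."""
        overlap = defaultdict(int)
        # The forward map supplies the terms, so the source document is not re-read.
        for term in self.forward[doc_id]:
            for other in self.inverted.get(term, ()):
                if other != doc_id:
                    overlap[other] += 1
        return sorted(overlap, key=lambda d: (-overlap[d], d))
```

The memory and indexing overhead grows with the number of distinct terms per document, so this sketch only shows the shape of the idea, not how it would scale.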
Magnus Johansson wrote:
Hi Peter
If the original document is available, you could extract keywords from the document
at query time. That is, when someone asks for documents similar to document a, you
re-analyze document a and, in combination with statistics from the Lucene index, extract
keywords from document a that can then be used as a query for finding similar documents.
I've got some sample code if anyone is interested.
/magnus
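For readers following the thread, the query-time keyword extraction described above might look roughly like the sketch below. It is not the sample code mentioned in the message, and it does not call Lucene: the document frequencies are passed in as a plain dictionary (`doc_freq`) that would, in practice, be read from the index. The function names, the regular-expression tokenizer, and the smoothed TF-IDF weighting are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def extract_keywords(text, doc_freq, num_docs, top_n=5):
    """Return up to top_n terms of `text`, ranked by a TF-IDF style score."""
    # Re-analyze the document: lowercase it and split it into word tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    term_freq = Counter(tokens)

    scores = {}
    for term, tf in term_freq.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            # A term absent from the index cannot match any other document.
            continue
        # Smoothed inverse document frequency: rare terms get a higher weight.
        idf = math.log(num_docs / (1 + df)) + 1.0
        scores[term] = tf * idf

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


def build_query(keywords):
    """Join the keywords into a simple OR query string."""
    return " OR ".join(keywords)
```

With a hypothetical index of 1000 documents, very common words such as "the" score low and drop out, so the resulting query is dominated by the terms that distinguish document a from the rest of the collection.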
