Hi Peter,

I guess you are right.

I've implemented this for an index of ten million really small
documents, all of which are stored in the index. The documents are never more than a thousand
words, so re-indexing is quick enough. However, it is probably not advisable to do
this with bigger documents or documents that need additional parsing.


/magnus


Peter Becker wrote:


Hi Magnus,

thanks for the offer, but unfortunately I can't/don't want to make the assumption that I can easily access the documents to re-index them. And I don't think this approach would be feasible unless you can keep the documents in memory somehow.

Storing the other (non-inverted/normal/whatever) index would make indexing more expensive, but querying should be a lot faster than having to re-index documents. In our situation that trade-off is preferable.

Peter


Magnus Johansson wrote:


Hi Peter

If the original document is available, you could extract keywords from the document
at query time. That is, when someone asks for documents similar to document a, you
re-analyze document a and, in combination with statistics from the Lucene index, extract
keywords from it that can then be used as a query for finding similar documents.
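
For readers following the thread, here is a minimal sketch of the idea described above. It is not the sample code mentioned in this message. It assumes a plain TF-IDF weighting, which the thread does not specify, and it models the index statistics as an in-memory map so the example has no dependencies. The class name SimilarDocKeywords, the tokenize helper and the docFreq map are all hypothetical; with a real index the document frequencies would come from the index itself (for example via Lucene's IndexReader.docFreq(Term)) and the tokens from the Analyzer used at index time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: picks the highest-weighted terms of one document,
// using document frequencies that would normally be read from the index.
final class SimilarDocKeywords {

    // Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns up to maxTerms terms of the document, best first.
    // docFreq maps a term to the number of indexed documents containing it
    // (assumed input); numDocs is the total number of indexed documents.
    static List<String> topKeywords(String text, Map<String, Integer> docFreq,
                                    int numDocs, int maxTerms) {
        // Term frequency within the re-analyzed document.
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : tokenize(text)) {
            tf.merge(tok, 1, Integer::sum);
        }

        // Weight = tf * log(numDocs / df). Terms unknown to the index are
        // skipped because they cannot match any other document.
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            if (df == 0) {
                continue;
            }
            double idf = Math.log((double) numDocs / df);
            score.put(e.getKey(), e.getValue() * idf);
        }

        // Sort by descending weight, ties broken alphabetically.
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort((a, b) -> {
            int c = Double.compare(score.get(b), score.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(maxTerms, terms.size())));
    }

    // Joins the keywords into a simple space-separated query string.
    static String toQuery(List<String> keywords) {
        return String.join(" ", keywords);
    }
}
```

The resulting string could then be handed to whatever query parser the application already uses. Terms that occur in nearly every document get a weight close to zero and fall to the bottom of the list, so a small maxTerms keeps the query short.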


I've got some sample code if anyone is interested.

/magnus




