thanks for the offer, but unfortunately I can't/don't want to make the assumption that I can easily access the documents to re-index them. And I don't think this approach would be feasible unless you can keep the documents in memory somehow.
Storing the other/non-inverted/normal/whatever index would be expensive for indexing, but querying should be a lot faster than having to re-index documents. That is in our situation preferable.
Peter
Magnus Johansson wrote:
Hi Peter
If the original document is available. You could extract keywords from the document
at query time. That is when someone asks for documents similar to document a. You
re-analyze document a and in combination with statistics from the Lucene index you extract
keywords from document a that can then be used as a query for findining similar documents.
I've got some sample code if anyone is interested.
/magnus
Peter Becker wrote:
Hi Terry,
we have been thinking about the same problem and in the end we decided that most likely the only good solution to this is to keep a non-inverted index, i.e. a map from the documents to the terms. Then you can query the most terms for the documents and query other documents matching parts of this (where you get the usual question of what is actually interesting: high frequency, low frequency or the mid range).
Indexing would probably be quite expensive since Lucene doesn't seem to support changes in the index, and the index for the terms would change all the time. We haven't implemented it yet, but it shouldn't be hard to code. I just wouldn't expect good performance when indexing large collections.
Peter
Terry Steichen wrote:
Is it possible without extensive additional coding to use Lucene to conduct a search based on a document rather than a query? (One use of this would be to refine a search by selecting one of the hits returned from the initial query and subsequently retrieving other documents "like" the selected one.)
Regards,
Terry
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
