Hi Doug,
thank you for the answer so far.
I actually wanted to add a large amount of text from an existing document to find a close related one. Can you suggest another good way of doing this. A direct match will not occur anyway. How can I make a most Vector Space Model (VSM) like query (each word a dimension value - find documents close to that)? You know as good as I that the standard VSM does not have any Boolean logic inside... how do I need to formuate the query to make it as much similar to a vector in order to find similar document in the vector space of the Lucene index?
You should try to reduce the dimensionality by reducing the number of unique features. In this case, you could for example use only keywords (or key phrases) instead of the full content of documents.
-- Best regards, Andrzej Bialecki
------------------------------------------------- Software Architect, System Integration Specialist CEN/ISSS EC Workshop, ECIMF project chair EU FP6 E-Commerce Expert/Evaluator ------------------------------------------------- FreeBSD developer (http://www.freebsd.org)
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]