On Saturday 17 May 2008 00:04:31, Chris Hostetter wrote:
: Is it possible to compute a theoretical maximum score for a given
: query if constraints are placed on 'tf' and 'lengthNorm'? If so,
: scores could be compared to a 'perfect score' (a feature request
: from our customers).
I think a
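To make the question concrete: the scoring formula documented for Lucene's classic Similarity is score(q,d) = coord(q,d) · queryNorm(q) · Σ_t tf(t in d) · idf(t)² · boost(t) · norm(t,d). Every factor is non-negative and non-decreasing in tf and norm, so capping tf and the norm caps the score. The sketch below assumes DefaultSimilarity's formulas (tf = sqrt(freq), idf = 1 + ln(numDocs/(docFreq+1)), queryNorm = 1/sqrt(Σ (idf·boost)²)), a top-level query boost of 1, and coord ≤ 1. The class and parameter names are hypothetical, and because real norms are stored lossily in a single byte, the bound is approximate.

```java
/**
 * Hypothetical helper: an upper bound on a classic Lucene-style score,
 * assuming DefaultSimilarity-like formulas and caps on freq and norm.
 */
public class MaxScoreBound {

    // Assumed DefaultSimilarity tf: square root of the raw term frequency.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Assumed DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /**
     * @param boosts   per-term query boosts
     * @param docFreqs per-term document frequencies (same order as boosts)
     * @param numDocs  number of documents in the index
     * @param maxFreq  assumed cap on a term's frequency within one document
     * @param maxNorm  assumed cap on norm(t,d) (lengthNorm times any field/doc boosts)
     */
    static double maxScore(double[] boosts, int[] docFreqs, int numDocs,
                           double maxFreq, double maxNorm) {
        double sumOfSquaredWeights = 0.0;
        double sum = 0.0;
        for (int i = 0; i < boosts.length; i++) {
            double idf = idf(docFreqs[i], numDocs);
            double weight = idf * boosts[i];
            sumOfSquaredWeights += weight * weight;
            // Each term's contribution is largest at the caps for freq and norm.
            sum += tf(maxFreq) * idf * idf * boosts[i] * maxNorm;
        }
        double queryNorm = 1.0 / Math.sqrt(sumOfSquaredWeights);
        // coord(q,d) = matchedTerms / queryTerms <= 1, reached when every term matches.
        return queryNorm * sum;
    }

    public static void main(String[] args) {
        double[] boosts = {1.0, 2.0};
        int[] docFreqs = {9, 99};
        System.out.println(maxScore(boosts, docFreqs, 1000, 4.0, 1.0));
    }
}
```

For a single term with boost 1 the bound reduces to tf(maxFreq) · idf · maxNorm, because queryNorm cancels one factor of idf.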
That's right, this is fine. Many unit tests rely on it.
RAMDirectory is similar to UNIX in that deleting a file that is open is
allowed, yet anything that already had the file open can continue to read
from it. The data is only really deleted on last close.
Also note that we don't write and then rename a segments.new file anymore.
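The "delete on last close" behaviour can be sketched for an in-memory store: deleting removes only the name, while any reader that already opened the file holds its own reference to the bytes, so the memory is reclaimed by the garbage collector once the last reader is dropped. InMemoryDir and its methods are hypothetical illustrations, not RAMDirectory's actual code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory directory illustrating delete-while-open semantics. */
public class InMemoryDir {
    private final Map<String, byte[]> files = new HashMap<>();

    void write(String name, byte[] data) {
        files.put(name, data.clone());
    }

    /** Removes the name only; readers opened earlier keep their reference to the data. */
    void delete(String name) {
        files.remove(name);
    }

    boolean exists(String name) {
        return files.containsKey(name);
    }

    Reader open(String name) {
        byte[] data = files.get(name);
        if (data == null) throw new IllegalArgumentException("no such file: " + name);
        return new Reader(data);
    }

    /** Holds the byte[] directly, so the bytes stay reachable until the reader is dropped. */
    static final class Reader {
        private final byte[] data;
        private int pos = 0;

        Reader(byte[] data) {
            this.data = data;
        }

        /** Returns the next byte (0..255), or -1 at end of file. */
        int read() {
            return pos < data.length ? data[pos++] & 0xFF : -1;
        }
    }

    public static void main(String[] args) {
        InMemoryDir dir = new InMemoryDir();
        dir.write("_0.cfs", new byte[] {1, 2, 3});
        Reader r = dir.open("_0.cfs");
        dir.delete("_0.cfs");
        System.out.println(dir.exists("_0.cfs")); // false
        System.out.println(r.read());             // 1
    }
}
```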
You're right. I want to cluster precisely the documents that are
already in the index. I don't know much about the Mahout project, but it seems
that it doesn't help much. What I want is simply to group together similar
documents according to the similarity (distance) between their term vectors.
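A minimal sketch of grouping documents by term-vector similarity, assuming each document's term vector is available as a term → frequency map (for example, read back from a field indexed with term vectors enabled). It uses cosine similarity and a simple greedy "leader" clustering with a hypothetical threshold; that algorithm is a choice made for illustration, and k-means or hierarchical clustering over the same similarity would be common alternatives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TermVectorClustering {

    /** Cosine similarity of two sparse term-frequency vectors. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            na += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += (double) e.getValue() * other;
        }
        for (int v : b.values()) nb += (double) v * v;
        if (na == 0 || nb == 0) return 0.0;
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /**
     * Greedy single-pass ("leader") clustering: each document joins the first
     * cluster whose first member is at least `threshold` similar, else it starts
     * a new cluster. Clusters are returned as lists of document indices.
     */
    static List<List<Integer>> cluster(List<Map<String, Integer>> docs, double threshold) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            List<Integer> home = null;
            for (List<Integer> c : clusters) {
                if (cosine(docs.get(c.get(0)), docs.get(i)) >= threshold) {
                    home = c;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(i);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> docs = List.of(
            Map.of("lucene", 3, "index", 2),
            Map.of("lucene", 2, "index", 1, "search", 1),
            Map.of("cooking", 4, "pasta", 2));
        System.out.println(cluster(docs, 0.5)); // [[0, 1], [2]]
    }
}
```

The leader approach is order-dependent and needs only one pass; it trades cluster quality for simplicity.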
Each call of next() can only return one token. If you want to return more
than one token for each token you find in the input, you need to
buffer them.
There's an abstract class in Solr that you can look at to see how this can
be done, and you can subclass it to get all the benefits...
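The buffering idea looks roughly like this. To keep the sketch standalone it uses a hypothetical SimpleTokenStream interface with a String next() method (returning null at the end) rather than Lucene's real TokenStream and Token classes, plus a hypothetical expansions map standing in for whatever extra tokens the filter produces.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class BufferingFilterDemo {

    /** Hypothetical one-token-per-call stream; next() returns null when exhausted. */
    interface SimpleTokenStream {
        String next();
    }

    /** Adapts a list of strings to a SimpleTokenStream. */
    static SimpleTokenStream fromList(List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /**
     * Emits each input token followed by its expansions. Since next() may return
     * only one token, the extra tokens wait in a buffer until later calls.
     */
    static final class ExpandingFilter implements SimpleTokenStream {
        private final SimpleTokenStream input;
        private final Map<String, List<String>> expansions;
        private final Deque<String> buffer = new ArrayDeque<>();

        ExpandingFilter(SimpleTokenStream input, Map<String, List<String>> expansions) {
            this.input = input;
            this.expansions = expansions;
        }

        @Override
        public String next() {
            // Drain buffered tokens before pulling from the input again.
            if (!buffer.isEmpty()) return buffer.poll();
            String token = input.next();
            if (token == null) return null;
            buffer.addAll(expansions.getOrDefault(token, List.of()));
            return token;
        }
    }

    /** Collects every token from a stream into a list. */
    static List<String> drain(SimpleTokenStream ts) {
        List<String> out = new ArrayList<>();
        for (String t = ts.next(); t != null; t = ts.next()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        SimpleTokenStream ts = new ExpandingFilter(
            fromList(List.of("fast", "car")),
            Map.of("car", List.of("auto", "automobile")));
        System.out.println(drain(ts)); // [fast, car, auto, automobile]
    }
}
```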
: Please help me figure out how I could achieve the above process: search for a
: word in the file and display the results as discussed.
Make each line in the file a separate Lucene Document; then each result
of a Query will be a single line.
-Hoss
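A toy, dependency-free illustration of the one-line-per-document approach: each line gets its own id, an inverted index maps terms to line ids, and every hit therefore identifies a single line. In Lucene itself one would instead add one Document per line, with the line text in an indexed field and the line number in a stored field. The class and method names here are hypothetical, and the tokenization (lowercasing, splitting on non-word characters) is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class LinePerDocument {

    private final List<String> lines = new ArrayList<>();               // line id -> stored line text
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // term -> ids of lines containing it

    /** Indexes every line of the text as its own "document". */
    void indexText(String text) {
        for (String line : text.split("\\R")) {
            int id = lines.size();
            lines.add(line);
            for (String term : line.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (term.isEmpty()) continue;
                postings.computeIfAbsent(term, k -> new LinkedHashSet<>()).add(id);
            }
        }
    }

    /** Returns "lineNumber: text" for each line containing the word (1-based line numbers). */
    List<String> search(String word) {
        List<String> hits = new ArrayList<>();
        for (int id : postings.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of())) {
            hits.add((id + 1) + ": " + lines.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        LinePerDocument idx = new LinePerDocument();
        idx.indexText("Lucene is a search library\nThis line is unrelated\nSearch the index with Lucene");
        for (String hit : idx.search("lucene")) System.out.println(hit);
    }
}
```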
As far as I know, Lucene only handles single-word synonyms at index
time. My life would be much simpler if it were possible to add synonyms
that span multiple tokens, such as lucene in action=lia. I
have a couple of workarounds that are OK, but it really isn't the same
thing when it
On Saturday 17 May 2008 20:28:40, Karl Wettin wrote:
As far as I know, Lucene only handles single-word synonyms at index
time. My life would be much simpler if it were possible to add
synonyms that span multiple tokens, such as lucene in
action=lia. I have a couple of workarounds that
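One way to picture index-time multi-word synonyms, sketched outside Lucene: scan the token sequence for a configured phrase and emit the single-token synonym at the phrase's first position, the way a zero position increment would stack it on the original token. The Posting class and the synonym map are hypothetical. A streaming TokenStream version would need to buffer up to the length of the longest phrase, and the reverse direction (lia expanding to three positions) is not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MultiWordSynonyms {

    /** A term and the token position it would be indexed at. */
    static final class Posting {
        final String term;
        final int position;

        Posting(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    /**
     * Keeps every original token and, wherever a multi-token phrase from
     * `synonyms` starts, adds its synonym at that same starting position.
     */
    static List<Posting> inject(List<String> tokens, Map<List<String>, String> synonyms) {
        List<Posting> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(new Posting(tokens.get(i), i));
            for (Map.Entry<List<String>, String> e : synonyms.entrySet()) {
                List<String> phrase = e.getKey();
                int end = i + phrase.size();
                if (end <= tokens.size() && tokens.subList(i, end).equals(phrase)) {
                    out.add(new Posting(e.getValue(), i));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("read", "lucene", "in", "action", "today");
        Map<List<String>, String> synonyms = Map.of(List.of("lucene", "in", "action"), "lia");
        System.out.println(inject(tokens, synonyms));
        // [read@0, lucene@1, lia@1, in@2, action@3, today@4]
    }
}
```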
Hi,
I have an application where I need to issue queries with a large number of
or-terms with individual boosts.
Currently I just construct a BooleanQuery with a large number (often 1000)
of constituent TermQueries. I'm wondering whether there is a better way to do
this.
I'm open to implementing my
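For reference, a large boosted disjunction can also be evaluated term-at-a-time: walk each query term's postings once, add boost × weight into a per-document score accumulator, then keep the top k with a small heap. This is a swapped-in technique shown for illustration, not how Lucene's BooleanQuery scores; the postings layout and the precomputed per-document weights (e.g. tf·idf·norm) are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class WeightedOrQuery {

    /**
     * Term-at-a-time scoring of an OR of boosted terms.
     * postings: term -> (docId -> precomputed per-document weight for that term).
     * Returns the ids of the k highest-scoring documents, best first.
     */
    static List<Integer> topK(Map<String, Float> boostedTerms,
                              Map<String, Map<Integer, Float>> postings, int k) {
        Map<Integer, Float> acc = new HashMap<>();
        for (Map.Entry<String, Float> q : boostedTerms.entrySet()) {
            Map<Integer, Float> plist = postings.get(q.getKey());
            if (plist == null) continue; // term does not occur in the index
            for (Map.Entry<Integer, Float> p : plist.entrySet()) {
                acc.merge(p.getKey(), q.getValue() * p.getValue(), Float::sum);
            }
        }
        // Min-heap capped at k entries keeps the k best documents seen so far.
        PriorityQueue<Map.Entry<Integer, Float>> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a.getValue(), b.getValue()));
        for (Map.Entry<Integer, Float> e : acc.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) heap.poll();
        }
        // Polling yields ascending scores; prepend to get best-first order.
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) result.add(0, heap.poll().getKey());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Float> query = Map.of("lucene", 2.0f, "search", 1.0f);
        Map<String, Map<Integer, Float>> postings = Map.of(
            "lucene", Map.of(1, 1.0f, 2, 0.5f),
            "search", Map.of(2, 2.0f, 3, 0.5f));
        System.out.println(topK(query, postings, 2)); // [2, 1]
    }
}
```

The accumulator costs memory proportional to the number of matching documents, which is the usual trade-off against document-at-a-time evaluation.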
On May 17, 2008, at 1:15 PM, Supheakmungkol SARIN wrote:
You're right. I want to cluster precisely the documents
that are already in the index. I don't know much about the Mahout
project, but it seems that it doesn't help much. What I want is
simply to group together similar