Re: theoretical maximum score

2008-05-17 Thread Paul Elschot
On Saturday 17 May 2008 00:04:31, Chris Hostetter wrote: : Is it possible to compute a theoretical maximum score for a given : query if constraints are placed on 'tf' and 'lengthNorm'? If so, : scores could be compared to a 'perfect score' (a feature request : from our customers) I think a

Re: simultaneous read and writes to the RAMDirectory

2008-05-17 Thread Michael McCandless
That's right, this is fine. Many unit tests rely on it. RAMDirectory is similar to UNIX in that deletion of an open file is allowed, yet anything that had the file open can continue to read from it. Delete on last close. Also note that we don't write and rename a segments.new file anymore

Re: Document clustering with Lucene

2008-05-17 Thread Supheakmungkol SARIN
You're right. I want to cluster precisely the documents that are already in the index. I don't know much about the Mahout project, but it seems that it doesn't help much. What I want is simply to group together similar documents according to the similarity distance of their term vectors.
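
The grouping described above can be sketched roughly as follows. This is only an illustration in plain Python, not Lucene or Mahout code: the term vectors are assumed to be already extracted as term-to-frequency maps, and the names `cosine`, `group_similar`, and the `threshold` value are hypothetical. The greedy single-pass grouping is an assumption chosen for brevity, not a method proposed in the thread.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similar(vectors: Dict[str, Dict[str, int]],
                  threshold: float = 0.5) -> List[List[str]]:
    """Greedy grouping: a document joins the first group whose first
    member is at least `threshold` similar; otherwise it starts a new group."""
    groups: List[List[str]] = []
    for doc_id, vec in vectors.items():
        for group in groups:
            representative = vectors[group[0]]
            if cosine(vec, representative) >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])
    return groups


# Hypothetical term vectors for three documents.
docs = {
    "a": Counter("lucene index search".split()),
    "b": Counter("lucene search index fast".split()),
    "c": Counter("cooking pasta recipe".split()),
}
groups = group_similar(docs)
```

The result depends on document order and on the threshold, so a real clustering job would likely need something more robust than this single pass.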

Re: Using more tokens in TokenFilter :(

2008-05-17 Thread Chris Hostetter
each call of next can only return one token ... if you want to return more than one token based on each token you find in input, then you need to buffer them. There's an abstract class in Solr that you can look at to see how this can be done, and you can subclass it to get all the benefits...
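
The buffering idea can be sketched as below. This is a language-neutral illustration in plain Python, not the Lucene `TokenFilter` API and not the Solr class mentioned above; `BufferedExpandingFilter`, `split_on_hyphen`, and `drain` are hypothetical names. Each call to `next()` still returns exactly one token, and any extra tokens produced from an input token wait in a queue until later calls.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Optional


class BufferedExpandingFilter:
    """Wraps a token source; next() returns one token per call."""

    def __init__(self, source: Iterable[str],
                 expand: Callable[[str], List[str]]) -> None:
        self._source: Iterator[str] = iter(source)
        self._expand = expand
        self._pending: Deque[str] = deque()

    def next(self) -> Optional[str]:
        # Refill the buffer only when it is empty; an input token may
        # expand to zero, one, or many output tokens.
        while not self._pending:
            try:
                token = next(self._source)
            except StopIteration:
                return None  # input exhausted and buffer empty
            self._pending.extend(self._expand(token))
        return self._pending.popleft()


def split_on_hyphen(token: str) -> List[str]:
    """Example expansion: keep the original token and add its parts."""
    parts = token.split("-")
    return [token] + parts if len(parts) > 1 else [token]


def drain(token_filter: BufferedExpandingFilter) -> List[str]:
    """Call next() until the filter reports the end of the stream."""
    out: List[str] = []
    while True:
        token = token_filter.next()
        if token is None:
            return out
        out.append(token)


tokens = drain(BufferedExpandingFilter(["wi-fi", "router"], split_on_hyphen))
```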

Re: Search and retrieve the line data from the File

2008-05-17 Thread Chris Hostetter
: Please help me over how could I achieve the above process to search for a : word in the file and display the results as discussed. make each line in the file a separate Lucene Document, then the results of each Query will be a line -Hoss
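
A rough sketch of the "one document per line" idea, written in plain Python rather than against the Lucene API. The small inverted index here stands in for a real index, and `index_lines` and `search` are hypothetical helper names. The point is only that when each line is indexed as its own unit, every hit is directly a line.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


def index_lines(text: str) -> Tuple[List[str], DefaultDict[str, Set[int]]]:
    """Treat every line as its own document and map words to line indexes."""
    lines = text.splitlines()
    postings: DefaultDict[str, Set[int]] = defaultdict(set)
    for line_no, line in enumerate(lines):
        for word in re.findall(r"\w+", line.lower()):
            postings[word].add(line_no)
    return lines, postings


def search(lines: List[str], postings: DefaultDict[str, Set[int]],
           word: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, line text) for every line with the word."""
    hits = sorted(postings.get(word.lower(), ()))
    return [(n + 1, lines[n]) for n in hits]


sample = "alpha beta\ngamma\nBeta delta"
lines, postings = index_lines(sample)
results = search(lines, postings, "beta")
```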

multi word synonyms

2008-05-17 Thread Karl Wettin
As far as I know Lucene only handles single word synonyms at index time. My life would be much simpler if it was possible to add synonyms that spanned over multiple tokens, such as lucene in action=lia. I have a couple of workarounds that are OK but it really isn't the same thing when it

Re: multi word synonyms

2008-05-17 Thread Paul Elschot
On Saturday 17 May 2008 20:28:40, Karl Wettin wrote: As far as I know Lucene only handles single word synonyms at index time. My life would be much simpler if it was possible to add synonyms that spanned over multiple tokens, such as lucene in action=lia. I have a couple of workarounds that

MultiTerm Or Query with per-term boost. Does it exist?

2008-05-17 Thread John Jensen
Hi, I have an application where I need to issue queries with a large number of or-terms with individual boosts. Currently I just construct a BooleanQuery with a large number (often 1000) of constituent TermQueries. I'm wondering if there is a better way to do this? I'm open to implementing my

Re: Document clustering with Lucene

2008-05-17 Thread Grant Ingersoll
On May 17, 2008, at 1:15 PM, Supheakmungkol SARIN wrote: You're right. I want to cluster precisely the documents that are already in the index. I don't know much about the Mahout project, but it seems that it doesn't help much. What I want is simply to group together similar