Re: getTimestamp method in IndexCommit

2008-09-03 Thread Michael McCandless
Noble Paul നോബിള്‍ नोब्ळ् wrote: On Tue, Sep 2, 2008 at 1:56 PM, Michael McCandless [EMAIL PROTECTED] wrote: Are you thinking this would just fall back to Directory.fileModified on the segments_N file for that commit? You could actually do that without any API change, because
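A minimal sketch of the fallback Mike describes: take a commit's timestamp to be the modification time of its segments_N file. To keep it self-contained, the directory listing is modeled as a plain Map from file name to modification time (an assumption; real code would ask the Lucene Directory, e.g. via Directory.fileModified). The class and method names are hypothetical. It also assumes, as in Lucene's SegmentInfos, that N is the commit generation written in base 36 (Character.MAX_RADIX), and it picks the newest commit just so there is something concrete to check.

```java
import java.util.Map;
import java.util.OptionalLong;

final class CommitTimestamp {

    // Parses the generation from a name like "segments_a" (base 36, so "a" == 10).
    // Returns -1 for files that are not segments_N files (e.g. "segments.gen", "_0.cfs").
    static long generationOf(String fileName) {
        String prefix = "segments_";
        if (!fileName.startsWith(prefix)) {
            return -1;
        }
        try {
            return Long.parseLong(fileName.substring(prefix.length()), Character.MAX_RADIX);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Hypothetical fallback: the timestamp of the newest commit is the
    // modification time of the segments_N file with the highest generation.
    static OptionalLong latestCommitTimestamp(Map<String, Long> fileModified) {
        long bestGen = -1;
        long bestTime = 0;
        for (Map.Entry<String, Long> e : fileModified.entrySet()) {
            long gen = generationOf(e.getKey());
            if (gen > bestGen) {
                bestGen = gen;
                bestTime = e.getValue();
            }
        }
        return bestGen < 0 ? OptionalLong.empty() : OptionalLong.of(bestTime);
    }

    public static void main(String[] args) {
        Map<String, Long> listing = Map.of(
                "segments_9", 1000L,
                "segments_a", 2000L,   // generation 10, newer than 9
                "segments.gen", 3000L, // not a commit point, ignored
                "_0.cfs", 500L);
        System.out.println(latestCommitTimestamp(listing).getAsLong()); // prints 2000
    }
}
```

Because the generation is parsed rather than compared as a string, "segments_a" correctly sorts after "segments_9".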

concise definition of Lucene score?

2008-09-03 Thread Jon Loken
Hi all, I have attempted to find a concise definition of how the Lucene score is calculated, something that can be understood by most people. The information I found is accurate, but not particularly concise. http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apac
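For reference, the "practical scoring function" in the Lucene 2.x Similarity javadoc, with the DefaultSimilarity choices for each factor, can be summarized roughly as below. The norm is stored lossily in a single byte, and which factors actually contribute depends on the query type (TermQuery, PhraseQuery, BooleanQuery), so treat this as an approximate summary rather than an exact account of every query.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \mathrm{tf}(t \in d)\cdot \mathrm{idf}(t)^{2}\cdot \mathrm{boost}(t)\cdot \mathrm{norm}(t,d)

\mathrm{tf}(t \in d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
\mathrm{idf}(t) = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}

\mathrm{coord}(q,d) = \frac{\text{query terms found in } d}{\text{query terms in } q}, \qquad
\mathrm{norm}(t,d) = \mathrm{boost}(d)\cdot \mathrm{boost}(f)\cdot \frac{1}{\sqrt{\text{number of terms in field } f}}

\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\mathrm{boost}(q)^{2}\sum_{t \in q}\bigl(\mathrm{idf}(t)\cdot \mathrm{boost}(t)\bigr)^{2}}}
```

Here f is the field that term t is searched in. In words: rarer terms (high idf), more occurrences in the document (tf), shorter fields (norm) and matching more of the query's terms (coord) all raise the score, while queryNorm only rescales scores so they are comparable across queries.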

Re: Pre-filtering for expensive query

2008-09-03 Thread Matt Ronge
On Aug 30, 2008, at 3:01 PM, Paul Elschot wrote: Op Saturday 30 August 2008 18:19:09 schreef Matt Ronge: On Aug 30, 2008, at 4:43 AM, Karl Wettin wrote: Can you tell us a bit more about what your custom query does? Perhaps you can build the candidate filter and reuse it over and over again?

Re: Pre-filtering for expensive query

2008-09-03 Thread Grant Ingersoll
On Aug 30, 2008, at 3:14 PM, Andrzej Bialecki wrote: Matt Ronge wrote: Hi all, I am working on implementing a new Query, Weight and Scorer that is expensive to run. I'd like to limit the number of documents I run this query on by first building a candidate set of documents with a
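The general pattern discussed in this thread (build a candidate set with a cheap test, cache it, and run the expensive scoring only on candidates) can be sketched in plain Java. A BitSet stands in for Lucene's Filter/DocIdSet and a function stands in for the custom Scorer; the class and method names are hypothetical, not Lucene API. Within Lucene, the caching half of this idea is roughly what CachingWrapperFilter provides.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;
import java.util.function.IntToDoubleFunction;

final class PrefilteredScoring {

    // Cache of candidate sets, keyed by a description of the cheap filter.
    private final Map<String, BitSet> filterCache = new HashMap<>();
    private final int maxDoc;

    PrefilteredScoring(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // Builds the candidate set once with the cheap predicate; later calls
    // with the same key reuse the cached BitSet without re-running it.
    BitSet candidates(String key, IntPredicate cheapMatch) {
        return filterCache.computeIfAbsent(key, k -> {
            BitSet bits = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (cheapMatch.test(doc)) {
                    bits.set(doc);
                }
            }
            return bits;
        });
    }

    // Runs the expensive scorer only on documents in the candidate set.
    static Map<Integer, Double> score(BitSet candidates, IntToDoubleFunction expensiveScore) {
        Map<Integer, Double> scores = new HashMap<>();
        for (int doc = candidates.nextSetBit(0); doc >= 0; doc = candidates.nextSetBit(doc + 1)) {
            scores.put(doc, expensiveScore.applyAsDouble(doc));
        }
        return scores;
    }
}
```

The payoff is that the expensive function is called once per candidate instead of once per document in the index.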

Re: Lucene Memory Leak

2008-09-03 Thread Andy33
I took your advice and created Singletons for the Directory, Analyzer, and IndexSearcher classes. I also undid the closing of the Directory and IndexSearcher. This seemed to fix my memory leak problem. However, I don't like the fact that I am leaving the IndexSearcher open for the entire life of

Re: Pre-filtering for expensive query

2008-09-03 Thread Paul Elschot
Op Wednesday 03 September 2008 18:06:57 schreef Matt Ronge: On Aug 30, 2008, at 3:01 PM, Paul Elschot wrote: Op Saturday 30 August 2008 18:19:09 schreef Matt Ronge: On Aug 30, 2008, at 4:43 AM, Karl Wettin wrote: Can you tell us a bit more about what your custom query does? Perhaps you can

Re: Lucene Memory Leak

2008-09-03 Thread Simon Willnauer
If you are looking for reasonable performance, you should not close your IndexSearcher unless it is necessary. It is actually best practice to leave an IndexSearcher instance open and even share it between the threads/requests of your web application. The searcher will not pollute your memory. Just keep
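A minimal sketch of "open once, share between threads/requests": a lazily initialized, thread-safe holder. The type parameter T stands in for IndexSearcher (the same idea applies to the Directory and Analyzer singletons mentioned in this thread), and SharedSearcher is a hypothetical name. In real use the factory would open the searcher on your index; reopening it when the index changes is not covered here.

```java
import java.util.function.Supplier;

final class SharedSearcher<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    SharedSearcher(Supplier<T> factory) {
        this.factory = factory;
    }

    // Double-checked locking: the factory runs at most once, and every
    // request thread afterwards receives the same shared instance.
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

The volatile field is what makes the unsynchronized first read safe; without it, another thread could observe a partially constructed instance.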

Re: concise definition of Lucene score?

2008-09-03 Thread Chris Hostetter
: I have attempted to find a concise definition of how the Lucene score is calculated, something that can be understood by most people. The answer tends to vary based on exactly what type of query you are talking about ... TermQuery? PhraseQuery? BooleanQuery containing a mix? I'm going

Re: search for empty field?

2008-09-03 Thread Erick Erickson
This has been discussed multiple times, so looking at the searchable archive will give you more detailed info. But as I remember, the consensus suggestion was to index some impossible value for those documents that lack a field. For instance, say your field was sometimes. A document that had
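The "index an impossible value" suggestion can be sketched as an index-time step that fills every absent or empty expected field with a sentinel token, so that "has no value" becomes an ordinary term query on that token. Documents are modeled as a Map here, and both the sentinel string and the class name are hypothetical. The sentinel must survive your analyzer unchanged, so it is safest in an untokenized field.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MissingFieldSentinel {
    // Hypothetical sentinel: any token that can never occur in real data.
    static final String MISSING = "__missing__";

    // Returns a copy of the document in which every expected field that is
    // absent or empty holds the sentinel, so a search for field:__missing__
    // finds exactly the documents that lacked a value.
    static Map<String, String> fillMissing(Map<String, String> doc, List<String> expectedFields) {
        Map<String, String> out = new HashMap<>(doc);
        for (String field : expectedFields) {
            String value = out.get(field);
            if (value == null || value.isEmpty()) {
                out.put(field, MISSING);
            }
        }
        return out;
    }
}
```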

Re: search for empty field?

2008-09-03 Thread Erick Erickson
Oh... I wonder if TermDocs/TermEnum would work for you instead. Would it work to just create a document validator at index time that threw an exception if all required fields weren't present? Or is that outside your control? Best Erick On Wed, Sep 3, 2008 at 3:11 PM, Chris Lu [EMAIL
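Erick's alternative, a validator that rejects incomplete documents before they are indexed, might look like this under the assumption that documents are represented as a Map of field name to value. RequiredFieldValidator is a hypothetical name, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RequiredFieldValidator {
    private final List<String> required;

    RequiredFieldValidator(List<String> required) {
        this.required = List.copyOf(required);
    }

    // Throws before the document reaches the index if any required field
    // is absent or blank, so no indexed document can lack those fields.
    void validate(Map<String, String> doc) {
        List<String> missing = new ArrayList<>();
        for (String field : required) {
            String value = doc.get(field);
            if (value == null || value.isBlank()) {
                missing.add(field);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("missing required fields: " + missing);
        }
    }
}
```

Reporting all missing fields at once, rather than failing on the first, makes bad input easier to fix upstream.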

Realtime Search for Social Networks Collaboration

2008-09-03 Thread Jason Rutherglen
Hello all, I don't mean this to sound like a solicitation. I've been working on realtime search and created some Lucene patches etc. I am wondering if there are social networks (or anyone else) out there who would be interested in collaborating with Apache on realtime search to get it to the

Re: search for empty field?

2008-09-03 Thread Chris Lu
I was kind of waiting for a more efficient solution based on TermDocs/TermEnum, but since the term is not there at all, I feel the only thing we can do is some deduction. I can copy the bitmap of all the deleted docs, and go through all the TermDocs/TermEnum, and set the bit if there is a
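Chris's deduction can be sketched with java.util.BitSet: start from a copy of the deleted-docs bitmap, set a bit for every document that has at least one term in the field, and the documents left unset are the live ones with an empty field. Walking TermEnum/TermDocs is modeled here as iterating a hypothetical Map from each term of the field to the doc IDs containing it; it is not the Lucene API itself.

```java
import java.util.BitSet;
import java.util.Map;

final class EmptyFieldDeduction {
    // postings: every term of one field mapped to the docs containing it
    // (a stand-in for walking TermEnum/TermDocs). Returns the docs that are
    // neither deleted nor contain any term in that field.
    static BitSet docsWithEmptyField(int maxDoc, BitSet deleted, Map<String, int[]> postings) {
        BitSet covered = (BitSet) deleted.clone(); // start from the deleted docs
        for (int[] docs : postings.values()) {
            for (int doc : docs) {
                covered.set(doc);                  // doc has at least one term
            }
        }
        BitSet empty = new BitSet(maxDoc);
        empty.set(0, maxDoc);                      // all doc IDs 0..maxDoc-1
        empty.andNot(covered);                     // minus deleted and covered docs
        return empty;
    }
}
```

The cost is one pass over the field's postings, which is the same work a MatchAllDocsQuery with a prohibited "has any value" clause has to do, but the resulting BitSet can be cached and reused.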

Similarity percentage between two Strings

2008-09-03 Thread Thiago Moreira
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Similarity percentage between two Strings

2008-09-03 Thread N. Hira
I don't know how much of this is a Lucene problem, but -- as I'm sure you will inevitably hear from others on the list -- it depends on what your definition of similar is. By similar, do you mean: 1. Identical, except for variations in case (upper/lower) 2. Allow 1., but also allow
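If "similar" should come out as a percentage, one common definition (a swapped-in technique, not something proposed in this thread and not a Lucene API) is normalized Levenshtein edit distance: 100 × (1 − distance / length of the longer string). The sketch below also lower-cases both strings first, which corresponds to the first notion of "similar" in the list (identical except for case).

```java
import java.util.Locale;

final class StringSimilarity {

    // Classic Levenshtein distance (insert, delete, substitute each cost 1),
    // computed with two rolling rows instead of a full matrix.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 100]; case differences are ignored.
    static double similarityPercent(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int longest = Math.max(x.length(), y.length());
        if (longest == 0) {
            return 100.0;
        }
        return 100.0 * (1.0 - (double) levenshtein(x, y) / longest);
    }
}
```

For example, "abcd" and "abcF" differ by one substitution out of four characters, giving 75%.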

Re: Realtime Search for Social Networks Collaboration

2008-09-03 Thread Yonik Seeley
On Wed, Sep 3, 2008 at 3:20 PM, Jason Rutherglen [EMAIL PROTECTED] wrote: I am wondering if there are social networks (or anyone else) out there who would be interested in collaborating with Apache on realtime search to get it to the point it can be used in production. Good timing Jason, I

Re: Pre-filtering for expensive query

2008-09-03 Thread Paul Elschot
Op Saturday 30 August 2008 18:22:50 schreef Matt Ronge: On Aug 30, 2008, at 6:13 AM, Paul Elschot wrote: Op Saturday 30 August 2008 03:34:01 schreef Matt Ronge: Hi all, I am working on implementing a new Query, Weight and Scorer that is expensive to run. I'd like to limit the number of

Re: Pre-filtering for expensive query

2008-09-03 Thread Matt Ronge
On Sep 3, 2008, at 4:09 PM, Paul Elschot wrote: Op Saturday 30 August 2008 18:22:50 schreef Matt Ronge: On Aug 30, 2008, at 6:13 AM, Paul Elschot wrote: Op Saturday 30 August 2008 03:34:01 schreef Matt Ronge: Hi all, I am working on implementing a new Query, Weight and Scorer that is

Re: Realtime Search for Social Networks Collaboration

2008-09-03 Thread Jason Rutherglen
Hi Yonik, The SOLR 2 list looks good. The question is, who is going to do the work? I tried to simplify the scope of Ocean as much as possible to make it possible (and slowly at that over time) for me to eventually finish what is mentioned on the wiki. I think SOLR is very cool and was major

Re: Similarity percentage between two Strings

2008-09-03 Thread Thiago Moreira

Re: Similarity percentage between two Strings

2008-09-03 Thread N. Hira
More details may change my opinion (not quite sure how others feel yet), but with the way you've described it so far, it seems like all you need is a basic string matcher: For every message: - if blahmessage.subjectblah is found in the pool, then this message is similar to the message in
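The basic matcher outlined here ("for every message, if its subject is found in the pool, it is similar to the message there") can be sketched with a HashMap used as the pool. The normalization step (trim, lower-case, collapse whitespace) is an assumption about what should count as "the same subject", and SubjectPool is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class SubjectPool {
    // Maps a normalized subject to the id of the first message seen with it.
    private final Map<String, String> pool = new HashMap<>();

    // Hypothetical normalization: trim, case-fold, collapse runs of whitespace.
    private static String normalize(String subject) {
        return subject.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    // Returns the id of an earlier message with the same subject, if any;
    // otherwise records this message in the pool and returns empty.
    Optional<String> findOrAdd(String messageId, String subject) {
        String existing = pool.putIfAbsent(normalize(subject), messageId);
        return Optional.ofNullable(existing);
    }
}
```

Each lookup is a single hash probe, so this stays fast for large pools; it only becomes a Lucene problem once "similar" means something looser than equal subjects.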

Re: Lucene Memory Leak

2008-09-03 Thread 장용석
In fact, I think the important causes are the Directory and Analyzer classes. If you don't want to keep the IndexSearcher open for the entire life of a web application, you don't have to; I think it will not cause a memory leak problem. But the Directory and Analyzer classes can cause the problem if

Re: search for empty field?

2008-09-03 Thread Chris Hostetter
I don't think category:* does what you think it does. category:[* TO *] will find all docs that have any indexed tokens in the category field, so combining that as a prohibited clause with a mandatory MatchAllDocsQuery will give you all docs that don't have anything indexed in the

Re: getTimestamp method in IndexCommit

2008-09-03 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Sep 3, 2008 at 2:06 PM, Michael McCandless [EMAIL PROTECTED] wrote: Noble Paul നോബിള്‍ नोब्ळ् wrote: On Tue, Sep 2, 2008 at 1:56 PM, Michael McCandless [EMAIL PROTECTED] wrote: Are you thinking this would just fall back to Directory.fileModified on the segments_N file for that