Re: Is Fair Similarity working with lucene 2.2 ?

2008-01-22 Thread Fabrice Robini
Yes, I do not see an effect... Here is my unit test that tests it: public void testFairSimilarity() throws CorruptIndexException, IOException, ParseException { Directory theDirectory = new RAMDirectory(); Analyzer theAnalyzer = new FrenchAnalyzer();

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-22 Thread Toke Eskildsen
On Mon, 2008-01-21 at 11:40 -0800, Michael Busch wrote: What kind of queries are you using for your tests? (num query terms, boolean clauses, phrases, wildcards?) No numbers (at least not parsed as numbers), no ranges, some wildcards, some phrases. The only non-trivial part of the queries is

Re: delete a document from indexwriter

2008-01-22 Thread Cam Bazz
I am looking at the IndexWriter source code - and I could not find a method (private) to delete by doc id. Where is it hiding? Best, -C.B. On Jan 19, 2008 1:07 PM, Michael McCandless [EMAIL PROTECTED] wrote: Good question So far, this method has not been carried over to IndexWriter

Re: delete a document from indexwriter

2008-01-22 Thread Michael McCandless
Exactly, that is the method... Mike Cam Bazz wrote: Hello, Did you mean the synchronized private void addDeleteDocID(int docId) { bufferedDeleteDocIDs.add(new Integer(docId)); numBytesUsed += OBJECT_HEADER_BYTES + BYTES_PER_INT + OBJECT_POINTER_BYTES; } this however does not

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-22 Thread Michael Busch
Thanks for your detailed answer, Toke! Is your default operator AND or OR? Toke Eskildsen wrote: On Mon, 2008-01-21 at 11:40 -0800, Michael Busch wrote: What kind of queries are you using for your tests? (num query terms, boolean clauses, phrases, wildcards?) No numbers (at least not

Re: delete a document from indexwriter

2008-01-22 Thread Michael McCandless
Well, docIDs are used all over the place in the index. Sometimes they key into an index file linearly, as for the stored fields and term vectors index files, but other times they are encoded, e.g. in the posting lists. Mike Cam Bazz wrote: Yes, I have found. however it is not for regular

Re: delete a document from indexwriter

2008-01-22 Thread Michael McCandless
Cam Bazz wrote: Yes, I noticed http://www.archivum.info/[EMAIL PROTECTED]/2006-09/msg00065.html Somehow I gotta do my delete within the same writer. I could use another field that combines both src and dst field, and use this field without storing but still a waste of resources. I

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-22 Thread Toke Eskildsen
On Tue, 2008-01-22 at 02:22 -0800, Michael Busch wrote: Is your default operator AND or OR? AND - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-22 Thread Michael Busch
OK, then Yonik might be right about the multi-level skiplists code which is new in 2.2. I'd love to see the performance numbers of the same index built with 2.3, if possible? You could simply migrate it to 2.3 by using IndexWriter.addIndexes(). In my performance tests (LUCENE-866) I measured an

svnversion not found...help!!!

2008-01-22 Thread abhinav pandey
Hey, I'm encountering this error while compiling Lucene 2.2.0: Buildfile: build.xml javacc-uptodate-check: javacc-notice: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [mkdir] Created

Re: Is Fair Similarity working with lucene 2.2 ?

2008-01-22 Thread Srikant Jakilinki
Well, I can't seem to even get past the assertions of this code. The first assertion is failing in that I get 0 hits. I am using SimpleAnalyzer since I do not have a FrenchAnalyzer. Any thoughts? Srikant Fabrice Robini wrote: Yes, I do not see an effect... Here is my unit test that tests it:

Re: Is Fair Similarity working with lucene 2.2 ?

2008-01-22 Thread Fabrice Robini
Oooops sorry, bad cut/paste... Here is the right one :-) public void testFairSimilarity() throws CorruptIndexException, IOException, ParseException { Directory theDirectory = new RAMDirectory(); Analyzer theAnalyzer = new StandardAnalyzer(); IndexWriter

HitCollector

2008-01-22 Thread Cam Bazz
Hello, Could someone show me a concrete example of how to use HitCollector? I have documents which have a field category. When I run a query, I need to sort results by category as well as count how many hits are there for a given category. I understand: searcher.search(Query, new HitCollector()

Re: HitCollector

2008-01-22 Thread Erick Erickson
The bitset thing is just an example of a trivial operation in a HitCollector. You'll want to do something like use TermDocs/TermEnum to see what category your document is in and add it to some counts you use rather than just add something to a bitset. Or see the idea at the end of this mail. That

Re: Is Fair Similarity working with lucene 2.2 ?

2008-01-22 Thread Srikant Jakilinki
OK, got it to work. Thanks. By a quick scoring comparison, I got the same scores for both hits. Maybe there is a loss of precision somewhere. Or when scores are equal, Lucene is doing something unintended/overlooked and thus putting shorter documents higher as the experiment is a special

Question about Lucene 2.3. file formats?

2008-01-22 Thread Ivan Vasilev
Hi Lucene Guys, As I see in the Lucene web site in file formats page the version 2.3 will have some changes in file formats that are very important for us. First I will say what we do and then will ask my questions. We distribute the index on some machines. The implementation is made so

Re: Is Fair Similarity working with lucene 2.2 ?

2008-01-22 Thread Fabrice Robini
Hi Srikant, I really thank you for your reply, it's very interesting. I have to say I am confused by that now... I do not know what I can do to pass this unit test... I agree with you, it may be an issue of computing relevance. Fabrice Srikant Jakilinki-3 wrote: OK, got it to work.

unsuscribe

2008-01-22 Thread Gayo Diallo
unsuscribe java-user@lucene.apache.org

Re: Question about Lucene 2.3. file formats?

2008-01-22 Thread Michael McCandless
Ivan Vasilev wrote: Hi Lucene Guys, As I see in the Lucene web site in file formats page the version 2.3 will have some changes in file formats that are very important for us. First I will say what we do and then will ask my questions. We distribute the index on some machines. The

Lucene help?

2008-01-22 Thread Itamar Syn-Hershko
Hi all, Yesterday I sent an email to this group querying about some very important (to me...) features of Lucene. I'm giving it another chance before it goes unnoticed or forgotten. If it was too long please let me know and I will email a shorter list of questions. The original post can be

Re: Is Fair Similarity working with lucene 2.2 ?

2008-01-22 Thread Daniel Naber
On Tuesday, 22 January 2008, Fabrice Robini wrote: Oooops sorry, bad cut/paste... Here is the right one :-) The score is the same, so documents with a lower id (inserted earlier) will be returned first. So everything looks okay to me, or am I missing something? regards Daniel --
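
To illustrate the tie-breaking described above, here is a minimal sketch in plain Java. It is not Lucene's own code: the TieBreakSketch and Hit classes and the rank method are hypothetical names made up for this example. It orders hits by descending score and falls back to ascending document id when scores are equal, so the document inserted earlier comes first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Lucene.
class TieBreakSketch {

    // A search hit reduced to the two values that matter for ordering.
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Returns a new list sorted by score (highest first); equal scores
    // are ordered by document id (lowest first).
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.docId, b.docId);
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(1, 0.5f),   // same score as doc 0, inserted later
                new Hit(0, 0.5f),
                new Hit(2, 0.9f));
        for (Hit h : rank(hits)) {
            System.out.println("doc=" + h.docId + " score=" + h.score);
        }
    }
}
```

With two documents scoring the same, as in the unit test under discussion, the one with the lower id is listed first regardless of its length.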

RE: Lucene, HTML and Hebrew

2008-01-22 Thread Steven A Rowe
Hi Itamar, In another thread, you wrote: Yesterday I sent an email to this group querying about some very important (to me...) features of Lucene. I'm giving it another chance before it goes unnoticed or forgotten. If it was too long please let me know and I will email a shorter list of

Re: Lucene, HTML and Hebrew

2008-01-22 Thread Grant Ingersoll
On Jan 22, 2008, at 6:06 PM, Steven A Rowe wrote: 2) How would I set the boosts for the headers and footnotes? I'd rather have it stored within the index file than have to append it to each and every query I will execute, but I'm open to suggestions. I'm more interested in performance and

DateTools UTC/GMT mismatch

2008-01-22 Thread Antony Bowesman
Hi, I just noticed that although the Javadocs for Lucene 2.2 state that the dates for DateTools use UTC as a timezone, they are actually using GMT. Should the Javadocs be corrected, or should the code be changed to use UTC instead? Antony
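
For a sense of what the difference means in practice, here is a small sketch that uses only the JDK (no Lucene classes). It assumes a second-resolution pattern of yyyyMMddHHmmss; the class and method names are made up for illustration. In java.util.TimeZone both the "GMT" and "UTC" IDs have a zero offset and no daylight saving, so the two formatted strings should be identical, which would suggest this is a wording question rather than a behavioral one.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for comparing the "GMT" and "UTC" zone IDs.
class UtcVsGmtCheck {

    // Formats the given instant in the named time zone, using an
    // assumed second-resolution pattern.
    static String format(long millis, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(new Date(millis));
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        String gmt = format(now, "GMT");
        String utc = format(now, "UTC");
        System.out.println("GMT: " + gmt);
        System.out.println("UTC: " + utc);
        System.out.println("identical: " + gmt.equals(utc));
    }
}
```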

RE: Lucene, HTML and Hebrew

2008-01-22 Thread Steven A Rowe
On 01/22/2008 at 8:49 PM, Grant Ingersoll wrote: On Jan 22, 2008, at 6:06 PM, Steven A Rowe wrote: On 01/21/2008 at 2:59 PM, Itamar Syn-Hershko wrote: 2) How would I set the boosts for the headers and footnotes? I'd rather have it stored within the index file than have to append it to

Re: Compass

2008-01-22 Thread Lukas Vlcek
Hi, I am using Compass with Spring and JPA. It works pretty nicely. I don't store the index in a database; I use a traditional file-system-based Lucene index. Updates work very well but you have to be careful about proper mapping of your objects into the search engine (especially parent-child mappings).