Re: delete a document from indexwriter

2008-01-19 Thread Michael McCandless
Good question. So far, this method has not been carried over to IndexWriter because in general it's not really safe, since there's no way to get an accurate docID from IndexWriter itself. You can't really know when IndexWriter does merges that compact deletes and thus changes

Re: Optimize for large index size

2008-01-19 Thread Michael McCandless
vivek sar wrote: Thanks Michael for the feedback. Couple more questions, 1) Doesn't Lucene do some sort of optimization internally based on mergefactor, i.e., if the number of segments grows over the mergefactor number Lucene would automatically merge them into one segment - is this different

Creating an alias for a field name?

2008-01-19 Thread Jan Peter Stotz
Hi, I would like to provide multiple field-names that are all mapped to the same field in the background (e.g. a long field-name and a short field-name). Is there any mechanism for creating such field-aliases, maybe in the IndexWriter or a QueryParser? Jan

Re: Creating an alias for a field name?

2008-01-19 Thread Erick Erickson
Not that I know of. I presume that you want this to reduce typing or some such. Your app could simply massage the query that was typed, doing the appropriate substitutions before parsing the query. Erick On Jan 19, 2008 6:52 AM, Jan Peter Stotz [EMAIL PROTECTED] wrote: Hi, I would like to
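The substitution Erick describes could be sketched roughly as below. This is only an illustration under stated assumptions: FieldAliasRewriter, addAlias and rewrite are hypothetical names, not Lucene API, and the regular expression is a simplification that treats any word followed by ':' as a field prefix, so it would also rewrite such text inside quoted phrases.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: expands short field aliases in a raw
// query string before the string is handed to a query parser.
final class FieldAliasRewriter {
    // Assumption: a field prefix is a run of word characters followed by ':'
    // that is not itself preceded by a word character or another ':'.
    private static final Pattern FIELD_PREFIX = Pattern.compile("(?<![\\w:])(\\w+):");

    // alias -> real field name; an unsynchronized map is enough when each
    // rewriter instance is used by a single thread.
    private final Map<String, String> aliases = new HashMap<String, String>();

    void addAlias(String alias, String field) {
        aliases.put(alias, field);
    }

    // Returns the query with every known alias prefix replaced by its field name.
    // Prefixes that are not registered aliases are copied through unchanged.
    String rewrite(String query) {
        Matcher m = FIELD_PREFIX.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String target = aliases.containsKey(name) ? aliases.get(name) : name;
            m.appendReplacement(out, Matcher.quoteReplacement(target + ":"));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With aliases t -> title and a -> author registered, rewrite("t:lucene AND (a:smith OR title:index)") yields "title:lucene AND (author:smith OR title:index)", which can then be passed to the parser as usual.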

Re: Lucene Performance

2008-01-19 Thread Paul Elschot
On Friday 18 January 2008 17:52:27 Thibaut Britz wrote: Hi, ... Another thing I noticed is that we append a lot of queries, so we have a lot of duplicate phrases like (A and B or C) and ... and (A and B or C) (more nested than that). Is Lucene doing any internal query optimization (like

Re: Creating an alias for a field name?

2008-01-19 Thread Jan Peter Stotz
Hi Erick, thanks for your response. Not that I know of. I presume that you want this to reduce typing or some such. Your app could simply massage the query that was typed, doing the appropriate substitutions before parsing the query. Well, I found a much better solution which avoids double

Using RangeFilter

2008-01-19 Thread vivek sar
Hi, I have a requirement to filter out documents by date range. I'm using RangeFilter (in combination with FilteredQuery) to do this. I was under the impression the filtering is done on documents, thus I'm just storing the date values, but not indexing them. As every new document would have a new

Re: Using RangeFilter

2008-01-19 Thread Otis Gospodnetic
Hi, Do you really need to store those dates? Why not just index them and not store them if index size is a concern? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: vivek sar [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Saturday,

Re: Creating an alias for a field name?

2008-01-19 Thread Otis Gospodnetic
A small comment. You mentioned Hashtable. As you are probably already creating a new QueryParser instance for every search, you most likely don't need a (synchronized) Hashtable and can use the unsynchronized HashMap. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch -

Re: log4j error

2008-01-19 Thread Otis Gospodnetic
I can't think of a reason why this would happen. Sounds like a question for the Spring people. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: testn [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Friday, January 18, 2008 9:31:49 AM

Re: Using RangeFilter

2008-01-19 Thread vivek sar
I need to be able to sort on optime as well, thus I need to store it. Isn't there any way to filter without indexing? Not sure why I need to index some field I need to filter on. I thought we could get all the documents from an index and then filter out the documents from it for a field within a

Re: Using RangeFilter

2008-01-19 Thread Shai Erera
You can try to write your own HitCollector and on its collect(int doc, float score) method read the doc's date value and decide if it passes the filter or not. That is an approach I use for similar tasks. On Jan 20, 2008 6:51 AM, vivek sar [EMAIL PROTECTED] wrote: I need to be able to sort on
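The collector approach Shai describes might look roughly like the sketch below. To keep it compilable without Lucene on the classpath, DocCollector is a local stand-in that only mirrors the collect(int doc, float score) callback named above; it is not Lucene's own class. DateRangeCollector and the dateByDoc array (one date value per document id, assumed to be loaded up front) are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the collect(int doc, float score) callback mentioned above.
// This is NOT Lucene's class; it only mirrors the method shape for the sketch.
abstract class DocCollector {
    abstract void collect(int doc, float score);
}

// Hypothetical collector that keeps only hits whose date lies in [min, max].
final class DateRangeCollector extends DocCollector {
    private final long[] dateByDoc; // assumption: date value indexed by doc id
    private final long min;
    private final long max;
    private final List<Integer> accepted = new ArrayList<Integer>();

    DateRangeCollector(long[] dateByDoc, long min, long max) {
        this.dateByDoc = dateByDoc;
        this.min = min;
        this.max = max;
    }

    // Called once per matching document; the score is ignored here because
    // only the date decides whether the hit passes the filter.
    void collect(int doc, float score) {
        long date = dateByDoc[doc];
        if (date >= min && date <= max) {
            accepted.add(Integer.valueOf(doc));
        }
    }

    // Doc ids that passed the date check, in the order they were collected.
    List<Integer> accepted() {
        return accepted;
    }
}
```

Looking the date up per hit means the cost grows with the number of matching documents, so how the per-document values are obtained (here a preloaded array) matters more than the range check itself.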

Re: index update problems with Linux

2008-01-19 Thread Otis Gospodnetic
Kevin, I don't see writer.close() in your code snippet. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Kevin Dewi [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Friday, January 18, 2008 6:33:43 AM Subject: index update problems with

Re: Optimize for large index size

2008-01-19 Thread Otis Gospodnetic
In addition to what Mike already said: maxMergeDocs=9 -- do you really mean maxMergeDocs and not maxBufferedDocs? Larg(er) maxBufferedDocs will speed up indexing. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: vivek sar [EMAIL

Re: Inverted search / Search on profilenet

2008-01-19 Thread Otis Gospodnetic
That's what the MemoryIndex in Lucene's contrib/ does. I tested it with a very fast incoming document stream (live blog posts from around the planet) and it held up well in my limited testing. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From:

Re: Open source Arabic stemmer

2008-01-19 Thread Otis Gospodnetic
The name is AraMorph. I believe that's the only free Arabic (morphological) analyzer and it is indeed GPL. I've used it on a few occasions and it seems to work well. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Grant Ingersoll [EMAIL

Re: IndexWriter#addIndexes

2008-01-19 Thread Otis Gospodnetic
Genau! Indices are simply merged on disk, their content is not re-analyzed. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: [EMAIL PROTECTED] [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Wednesday, January 16, 2008 7:48:27 AM

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-19 Thread Otis Gospodnetic
This is great and valuable information, Toke(n)! Just the other day we recommended this multi-IndexSearcher to somebody concerned with low QPS rates their benchmarks revealed. They were hitting their index with a good number of threads and hitting synchronized blocks in Lucene. Multiple