Re: Search using MultiSearcher generates OOM on partitioned indices totaling 1GB

2009-04-02 Thread Lebiram
Hi Erick, I ran the search just as the JVM started, so I think the JVM was still busy with startup work and the search needed more memory than was available at that time. If I run the same search a while after the JVM has started, the query succeeds. I then pump in

Lock obtain timed out

2009-04-02 Thread Rehan Abdulaziz
Hey, Lucene is deployed on my Tomcat server, and when I send parallel calls from my client to add, delete, or update documents, some of the operations fail. The following exception is thrown: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:

SpellChecker AlreadyClosedException issue

2009-04-02 Thread John Cherouvim
Hello, my code looks like this: Directory dir = null; try { dir = FSDirectory.getDirectory("/path/to/dictionary"); SpellChecker spell = new SpellChecker(dir); // exception thrown here // ... dir.close(); } catch (IOException ex) { /* log error */ } finally { if (dir != null) { try

Re: Lock obtain timed out

2009-04-02 Thread Ian Lea
Hi, from the 2.4 javadocs for IndexWriter: setDefaultWriteLockTimeout(long writeLockTimeout): "Sets the default (for any instance of IndexWriter) maximum time to wait for a write lock (in milliseconds)." Lucene waits for the max specified time, retrying every 1000 millisecs by default, then
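
The retry behavior Ian describes (wait up to the timeout, re-checking the lock every 1000 ms by default, then give up) can be sketched as a simple poll loop. This is a stdlib-only model, not Lucene's Lock code: the lock is represented by a hypothetical tryObtain callback, and a timeout throws java.util.concurrent.TimeoutException rather than Lucene's LockObtainFailedException.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Stdlib-only model of poll-and-retry lock acquisition: try to obtain the lock,
// sleep pollIntervalMs between attempts, and give up once timeoutMs has elapsed.
// tryObtain is a hypothetical stand-in for the real lock; this is NOT Lucene's Lock class.
class PollingLock {

    // Returns the number of attempts needed; throws TimeoutException if never obtained.
    static int obtain(BooleanSupplier tryObtain, long timeoutMs, long pollIntervalMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        int attempts = 0;
        while (true) {
            attempts++;
            if (tryObtain.getAsBoolean()) {
                return attempts;
            }
            if (System.currentTimeMillis() >= deadline) {
                throw new TimeoutException("Lock obtain timed out after " + attempts + " attempts");
            }
            Thread.sleep(pollIntervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate another writer holding the lock for the first two attempts.
        int[] calls = {0};
        int attempts = obtain(() -> ++calls[0] > 2, 5_000, 10);
        System.out.println("obtained after " + attempts + " attempts");
    }
}
```

In Rehan's setup, if each request opens its own IndexWriter, the requests contend for the same write lock, and a longer timeout only means they wait longer before failing; sharing one IndexWriter across threads is the usual way to avoid that contention.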

Re: What is the right query syntax for matching some field's substring?

2009-04-02 Thread Seid Mohammed
Hi Bon, can you give me the link you read about substring matching? Thanks a lot. On 4/2/09, Bon <bon...@unipattern.com> wrote: Hi Matt, thanks for your answer. I'm new to Lucene, so I don't know what I should know about that. I found a reference discussing substring searching

Re: Search using MultiSearcher generates OOM on partitioned indices totaling 1GB

2009-04-02 Thread Erick Erickson
Ah, I get it now. Given that you bumped your max clause count up, it makes sense. I'm pretty sure that the wildcard expansion is the root of your memory problems. The folks on the list helped me out a lot in understanding what wildcards were about; see the thread titled "I just don't get wildcards at all" in
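
Erick's point about wildcard expansion can be modeled as follows: a trailing-wildcard term such as ab* is rewritten into one OR clause per indexed term that starts with ab, so the number of clauses (and the memory they use) grows with the vocabulary, not with the query. This stdlib-only sketch uses a hypothetical in-memory TreeSet as the term dictionary and throws IllegalStateException in place of Lucene's BooleanQuery.TooManyClauses; 1024 is Lucene's default clause limit. Only trailing wildcards are modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Stdlib-only model of rewriting a trailing-wildcard query ("ab*") into one
// OR clause per matching term. The term dictionary is a hypothetical TreeSet.
class PrefixExpansion {

    static final int DEFAULT_MAX_CLAUSES = 1024; // Lucene's default clause limit

    // Collect all terms starting with prefix; fail once the clause limit is exceeded.
    static List<String> expand(NavigableSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet walks terms in sorted order, starting at the prefix itself.
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            clauses.add(t);
            if (clauses.size() > maxClauses) {
                throw new IllegalStateException("too many clauses: more than " + maxClauses);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>();
        // Hypothetical vocabulary: "a" followed by two lowercase letters (676 terms).
        for (char c1 = 'a'; c1 <= 'z'; c1++) {
            for (char c2 = 'a'; c2 <= 'z'; c2++) {
                terms.add("a" + c1 + c2);
            }
        }
        System.out.println(expand(terms, "ab", DEFAULT_MAX_CLAUSES).size()); // 26
        System.out.println(expand(terms, "a", DEFAULT_MAX_CLAUSES).size());  // 676
    }
}
```

Raising the limit, as Lebiram did, makes the exception go away but keeps every clause, which is consistent with the memory pressure Erick describes.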

Using SpanNearQuery.getSpans() in a Search Result

2009-04-02 Thread David Seltzer
Hi all, I'm trying to figure out how to use SpanNearQuery.getSpans(IndexReader) when working with a result set from a query. Maybe I have a fundamental misunderstanding of what an IndexReader is - I'm under the impression that it's a mechanism for sequentially accessing the documents in

Re: Search using MultiSearcher generates OOM on partitioned indices totaling 1GB

2009-04-02 Thread Lebiram
Hi Erick, the query was basically test data, in anticipation of searches across all 4 indices, with 12 million docs, that should yield very small result sets. Obviously that query does not happen in real life, but it did break the system. If some user thought of just inputting random words then

Retrieving TokenStream from Tokenized Non-Stored Field

2009-04-02 Thread David Seltzer
Hi All, I have a document with a field called TextTranscript. It's created using the following command: myDoc.add(new Field("TextTranscript", sTranscriptBody, Field.Store.NO, Field.Index.TOKENIZED)); I'm then trying to retrieve the TokenStream by pulling the field. Field fTextTranscript =

Re: Search using MultiSearcher generates OOM on partitioned indices totaling 1GB

2009-04-02 Thread Erick Erickson
I didn't code it, so I'm speaking at least second-hand. It's a valid question whether allowing more clauses is useful to the user: a 1024-term OR clause isn't narrowing things much. Plus, I think, it was a number that says, in effect, you should know that this is getting to be an

Re: Help to determine why an optimized index is proportionally too big.

2009-04-02 Thread Michael McCandless
On Wed, Apr 1, 2009 at 5:20 PM, Dan OConnor <docon...@acquiremedia.com> wrote: All: We are using Java Lucene 2.3.2 to index a fairly large number of documents (roughly 400,000 per day). We have divided the time history into various depths. Our first stage covers 8 days and our next stage

Re: Search using MultiSearcher generates OOM on partitioned indices totaling 1GB

2009-04-02 Thread Mark Miller
You might try a constant score wildcard query (similar to a filter). I think you'd have to grab it from Solr's codebase until 2.9 comes out, though. No clause limit, and reportedly *much* faster on large indexes. -- Mark, http://www.lucidimagination.com. Lebiram wrote: Hi Erick, The query
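
The filter idea behind the constant score wildcard query Mark mentions can be sketched with a BitSet: every document matching any expanded term sets one bit, so memory is bounded by the number of documents rather than the number of terms, and there is no clause limit to hit. The postings map here is hypothetical and stands in for the index; this models the approach and is not the Solr or Lucene class.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Stdlib-only model of the filter approach: instead of one OR clause per
// expanded term, set one bit per matching document.
// postings is a hypothetical in-memory map (term -> doc ids), not Lucene's API.
class ConstantScorePrefix {

    static BitSet matchingDocs(NavigableMap<String, int[]> postings, String prefix, int maxDoc) {
        BitSet docs = new BitSet(maxDoc); // size bounded by maxDoc, not by term count
        for (Map.Entry<String, int[]> e : postings.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // terms are sorted, so the prefix range has ended
            }
            for (int doc : e.getValue()) {
                docs.set(doc);
            }
        }
        return docs; // every set bit would receive the same (constant) score
    }

    public static void main(String[] args) {
        NavigableMap<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {0, 3});
        postings.put("apply", new int[] {3, 5});
        postings.put("banana", new int[] {1});
        System.out.println(matchingDocs(postings, "app", 6)); // {0, 3, 5}
    }
}
```

The cost is that every match gets the same score, which is the relevance trade-off Lebiram raises later in the thread.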

Speed of fuzzy searches

2009-04-02 Thread Matt Schraeder
I've got a simple Lucene index and search built for testing purposes. So far everything seems great. Most searches take 0.02 seconds or less. Searches with 4-5 terms take 0.25 seconds or less. However, once I began playing with fuzzy searches everything seemed to really slow down. A fuzzy

Re: Speed of fuzzy searches

2009-04-02 Thread Erick Erickson
This seems really odd, especially with an index that size. The first question is usually "Do you open an IndexReader for each query?" If you do, be aware that opening a reader/searcher is expensive, and the first few queries through the system are slow as the caches are built up. The second

Re: Speed of fuzzy searches

2009-04-02 Thread Mark Miller
Matt Schraeder wrote: I've got a simple Lucene index and search built for testing purposes. So far everything seems great. Most searches take 0.02 seconds or less. Searches with 4-5 terms take 0.25 seconds or less. However, once I began playing with fuzzy searches everything seemed to really

Re: Search using MultiSearcher generates OOM on partitioned indices totaling 1GB

2009-04-02 Thread Lebiram
I have looked at constant score queries; however, relevance is of value to our users, so we left it as is. :( Erick's idea of stripping wildcard terms that have fewer than an acceptable number of characters is a good one, and I might try it once I get the time. Thanks, M
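
Erick's suggestion, as Lebiram relays it, amounts to filtering user input before it reaches the query parser. A minimal sketch, assuming whitespace-separated terms, * and ? as the only wildcard characters, and a hypothetical minLiteralChars threshold chosen by the application:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: drop wildcard terms with too few literal characters before parsing.
// Whitespace splitting is a simplification; real query syntax (quotes,
// fields, boolean operators) would need more careful handling.
class WildcardGuard {

    static boolean hasWildcard(String term) {
        return term.indexOf('*') >= 0 || term.indexOf('?') >= 0;
    }

    // Count the characters that are not wildcards.
    static int literalChars(String term) {
        int n = 0;
        for (char ch : term.toCharArray()) {
            if (ch != '*' && ch != '?') {
                n++;
            }
        }
        return n;
    }

    // minLiteralChars is a hypothetical, application-chosen threshold.
    static String stripShortWildcards(String userQuery, int minLiteralChars) {
        List<String> kept = new ArrayList<>();
        for (String term : userQuery.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            if (hasWildcard(term) && literalChars(term) < minLiteralChars) {
                continue; // too broad: would expand to a huge number of terms
            }
            kept.add(term);
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        System.out.println(stripShortWildcards("a* report lucen* ?b", 3)); // report lucen*
    }
}
```

Counting all literal characters is one reading of "an acceptable number of characters". Counting only the literal prefix before the first wildcard is stricter: a term like ?ucene passes the check above even though a leading wildcard forces a scan of every term.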

Re: Speed of fuzzy searches

2009-04-02 Thread mark harwood
Try setting the minimum prefix length for fuzzy queries (I think there is a setting on QueryParser, or you may need to subclass). A prefix length of zero does edit-distance comparisons against all unique terms, e.g. from "aardvark" to [...]. A prefix length of one would cut this search space down to just
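
Mark Harwood's explanation can be made concrete with a small model: with a prefix length of zero, every unique term is compared against the query by edit distance, while a prefix length of n only compares terms that share the query's first n characters. This sketch uses a hypothetical term list and a plain maximum edit distance; Lucene's FuzzyQuery is tuned by a minimum similarity instead, so treat it as an illustration of the pruning, not of Lucene's scoring.

```java
import java.util.List;

// Stdlib-only model of fuzzy-query prefix pruning: only terms sharing the
// first prefixLength characters with the query are compared by edit distance.
// A plain maximum edit distance is used for simplicity; Lucene's FuzzyQuery
// is configured with a minimum similarity instead.
class FuzzyPrefixModel {

    // Dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns {edit-distance computations performed, terms within maxEdits}.
    static int[] search(List<String> terms, String query, int prefixLength, int maxEdits) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        int comparisons = 0;
        int matches = 0;
        for (String term : terms) {
            if (!term.startsWith(prefix)) {
                continue; // pruned: no distance computed
            }
            comparisons++;
            if (levenshtein(query, term) <= maxEdits) {
                matches++;
            }
        }
        return new int[] {comparisons, matches};
    }

    public static void main(String[] args) {
        List<String> terms = List.of("aardvark", "lucene", "lucent", "lucid", "search", "zebra");
        int[] noPrefix = search(terms, "lucine", 0, 1);
        int[] withPrefix = search(terms, "lucine", 2, 1);
        System.out.println(noPrefix[0] + " comparisons, " + noPrefix[1] + " match");
        System.out.println(withPrefix[0] + " comparisons, " + withPrefix[1] + " match");
    }
}
```

For the query lucine with maxEdits = 1, a prefix length of 0 computes six distances while a prefix length of 2 computes three, and both find lucene. The trade-off is that a prefix length of n can never match a misspelling within the first n characters.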

Re: Using SpanNearQuery.getSpans() in a Search Result

2009-04-02 Thread Paul Elschot
On Thursday 02 April 2009 15:36:44 David Seltzer wrote: Hi all, I'm trying to figure out how to use SpanNearQuery.getSpans(IndexReader) when working with a result set from a query. Maybe I have a fundamental misunderstanding of what an IndexReader is - I'm under the impression

LuSQL download link error?

2009-04-02 Thread Shashi Kant
Hi all, I have been trying to get the latest version of LuSQL from the NRC.ca website, but I get 404s on the download links. I have written to the webmaster, but does anyone have the jar handy? Could I download it from somewhere else, or could you email it to me? Thanks, Shashi

Re: LuSQL download link error?

2009-04-02 Thread Glen Newton
Dear Shashi, It should work now. A temporary failure: our apologies. thanks, Glen 2009/4/2 Shashi Kant sk...@sloan.mit.edu: Hi all, I have been trying to get the latest version of LuSQL from the NRC.ca website but get 404s on the download links. I have written to the webmaster, but anyone

Re: IndexWriter.deleteDocuments(Query query)

2009-04-02 Thread John Wang
Hi Michael: Thanks for looking into this. Approach 2 has a dependency on how fast the delete set performs a check on a given id; approach 1 doesn't. After replacing my delete set with a simple bitset, approach 2 gets a 25-30% improvement. I understand if the delete set is small,
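
John's observation, that approach 2 is sensitive to how quickly the delete set answers "is this id deleted?", can be illustrated by contrasting two representations. This is a sketch only: his actual code isn't shown in the thread, ids are assumed to be small non-negative ints, and the 25-30% figure is his measurement, not something this example reproduces.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Sketch contrasting two hypothetical delete-set representations: a boxed
// hash set versus a BitSet indexed by id. Both answer "is this id deleted?".
// Ids are assumed to be small non-negative ints (e.g. dense document ids).
class DeleteSets {

    interface DeleteSet {
        boolean isDeleted(int id);
    }

    static DeleteSet hashBacked(int[] deletedIds) {
        Set<Integer> set = new HashSet<>();
        for (int id : deletedIds) {
            set.add(id);
        }
        return set::contains; // boxes and hashes the id on every lookup
    }

    static DeleteSet bitBacked(int[] deletedIds) {
        BitSet bits = new BitSet();
        for (int id : deletedIds) {
            bits.set(id);
        }
        return bits::get; // a single bit test per lookup
    }

    // Count surviving (non-deleted) ids in [0, maxId): one check per document.
    static int countLive(DeleteSet deletes, int maxId) {
        int live = 0;
        for (int id = 0; id < maxId; id++) {
            if (!deletes.isDeleted(id)) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        int[] deleted = {1, 4, 7};
        System.out.println(countLive(hashBacked(deleted), 10)); // 7
        System.out.println(countLive(bitBacked(deleted), 10));  // 7
    }
}
```

The BitSet avoids boxing and hashing on each check, at the cost of memory proportional to the largest id rather than to the number of deletions.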

Re: Speed of fuzzy searches

2009-04-02 Thread Matt Schraeder
erickerick...@gmail.com, 4/2/2009 10:24:42 AM: This seems really odd, especially with an index that size. The first question is usually "Do you open an IndexReader for each query?" I'm using the Zend_Search_Lucene implementation, so I'm really not sure how it handles the IndexReader. At the top

Re: IndexWriter.deleteDocuments(Query query)

2009-04-02 Thread Michael McCandless
On Thu, Apr 2, 2009 at 2:26 PM, John Wang <john.w...@gmail.com> wrote: Hi Michael: Thanks for looking into this. Approach 2 has a dependency on how fast the delete set performs a check on a given id; approach one doesn't. After replacing my delete set with a simple bitset, approach 2

Re: Retrieving TokenStream from Tokenized Non-Stored Field

2009-04-02 Thread Michael McCandless
Actually you have to mark the field as Field.Store.YES in order to see that field when you retrieve the doc at search time. You'll then be able to retrieve the string value. Mike On Thu, Apr 2, 2009 at 10:45 AM, David Seltzer dselt...@tveyes.com wrote: Hi All, I have a document with a field