Lucene search question

2007-11-13 Thread Cláudio Fernandes
Hello all, I don't know if this is a somewhat naive question, but here we go: Does Lucene support indexing by sections? For example, having a text document with three sections divided by XML tags, indexed in a way that lets us search by work and section. Does Lucene itself support this kind of indexing or

RE: Lucene search question

2007-11-13 Thread Chhabra, Kapil
If it's only about the search, you could have section as just another field in your index. You could simply search on work as well as section. Otherwise, if you are looking at aggregating category hits, then look at http://mail-archives.apache.org/mod_mbox/lucene-java-user/200605.mbox/[EMAIL

Re: Lucene search question

2007-11-13 Thread Grant Ingersoll
Yes, your application can do this using Lucene. Lucene is a low-level, search-enabling library; it is up to your application to give meaning to what you put in it. One way of doing what you want is to give each section its own Field for any given document. Cheers, Grant On Nov 13, 2007, at
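
To illustrate the "one field per section" idea from the replies above, here is a minimal sketch in plain Java. It is a toy in-memory model, not Lucene's API: the class name SectionIndex and its methods are hypothetical, and the tokenization (lowercasing and splitting on non-word characters) is an assumption made only for the example. The point is that once each section is indexed under its own field name, a search can be restricted to a word within a given section.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of "one field per section". Hypothetical class, NOT Lucene's API.
final class SectionIndex {
    // Maps "field:term" to the ids of documents containing that term in that field.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index one section of a document under its own field name.
    void addSection(int docId, String field, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            postings.computeIfAbsent(field + ":" + token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Return the ids of documents whose given section contains the word.
    Set<Integer> search(String field, String word) {
        return postings.getOrDefault(field + ":" + word.toLowerCase(Locale.ROOT), new TreeSet<>());
    }
}
```

In a real Lucene application the same shape would presumably come from adding one Field per section to each Document and querying on the field name; the sketch only shows why that makes section-scoped search straightforward.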

OutOfMemoryError on small search in large, simple index

2007-11-13 Thread Lars Clausen
We've run into a blocking problem with our use of Lucene: we get OutOfMemoryError when performing a one-term search in our index. The search, if completed, should give only a few thousand hits, but from inspecting a heap dump it appears that many more documents in the index get stored in Lucene

Re: OutOfMemoryError on small search in large, simple index

2007-11-13 Thread Daniel Naber
On Tuesday, 13 November 2007, Lars Clausen wrote: Can it be right that memory usage depends on size of the index rather than size of the result? Yes, see IndexWriter.setTermIndexInterval(). How much RAM are you giving to the JVM now? Regards Daniel -- http://www.danielnaber.de
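
As a rough illustration of why memory can scale with index size rather than result size: the term index interval controls what fraction of the index's terms a reader keeps in RAM (every interval-th term; the default interval is 128). The sketch below only does that arithmetic. The class and method names are hypothetical, the term counts are made-up inputs, and it says nothing about bytes per term, which depends on the actual terms.

```java
// Back-of-the-envelope estimate. Hypothetical helper, NOT part of Lucene.
final class TermIndexEstimate {
    // Number of terms held in RAM if every interval-th term of the index is kept.
    static long termsInMemory(long totalTerms, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        return (totalTerms + interval - 1) / interval; // ceiling division
    }
}
```

For example, under these assumptions an index with one billion unique terms would keep about 7.8 million terms in memory at an interval of 128, and about one million at an interval of 1024, regardless of how many hits any single query returns.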

Re: substring indexing to avoid 'TooManyClauses' exception

2007-11-13 Thread Erick Erickson
Hardy: I'm certainly not an expert on ranking and scoring, but I've got to assume that this approach influences scoring. Another issue is how you indexed multiple values. If you took a hint from the SynonymAnalyzer example in Lucene In Action, and indexed all the substrings with an increment of

Re: Lucene search question

2007-11-13 Thread Cláudio Fernandes
Hi, On Tue, 2007-11-13 at 07:32 -0500, Grant Ingersoll wrote: Yes, your application can do this using Lucene. Lucene is a low-level, search-enabling library; it is up to your application to give meaning to what you put in it. One way of doing what you want is to give each section its own

Re: Lucene search question

2007-11-13 Thread Erick Erickson
If you only have a maximum of a few sections, then indexing as different fields should work fine. If you have a big upper limit, you might need to do something like index all the data in one field with a special marker (e.g. $$$) between sections, then use TermDocs/TermEnum on the result set to

Re: OutOfMemoryError on small search in large, simple index

2007-11-13 Thread Chris Hostetter
: Can it be right that memory usage depends on size of the index rather : than size of the result? : : Yes, see IndexWriter.setTermIndexInterval(). How much RAM are you giving to : the JVM now? and in general: yes. Lucene is using memory so that *lots* of searches can be fast ... if you

Re: Lucene search question

2007-11-13 Thread Steven D. Majewski
On Nov 13, 2007, at 7:21 AM, Cláudio Fernandes wrote: Hello all, I don't know if this is a somewhat naive question, but here we go: Does Lucene support indexing by sections? For example, having a text document with three sections divided by XML tags, indexed in a way that lets us search by work and

Re: Lucene search question

2007-11-13 Thread Grant Ingersoll
On Nov 13, 2007, at 11:59 AM, Steven D. Majewski wrote: Lucene is great at finding documents, but not quite as good at finding things IN documents. The index contains pointers to the terms, but they are pointers to a token in the parsed token stream, so to find a character index into a

Re: restoring a corrupt index?

2007-11-13 Thread vivek sar
We have seen similar exceptions (with Lucene 2.2) when we were making the following mistakes: 1) Not closing the old searchers and re-creating a new one for every new search (fixed by closing the searcher every time; if you want, you could use only one searcher instance as well) 2) Not having any JVM

Re: restoring a corrupt index?

2007-11-13 Thread Michael McCandless
vivek sar [EMAIL PROTECTED] wrote: I think if the indexer is abruptly stopped while it's in progress the index corruption can happen. One correction here: as far as I know, the index should not become corrupt if the JVM is kill -9'd or the JVM crashes. If that seems to be happening then we need

How's 2.3 doing?

2007-11-13 Thread testn
Hi, Are we close to releasing Lucene 2.3? Is it stable enough for production? I thought it was supposed to be released in October. Thanks,

Re: How's 2.3 doing?

2007-11-13 Thread Michael Busch
testn wrote: Hi, Are we close to releasing Lucene 2.3? Is it stable enough for production? I thought it was supposed to be released in October. Thanks, I think it's very close. There are a couple of outstanding issues: