Failure recovery

2006-03-13 Thread Chuck Williams
Is there a way to determine whether or not an index that was left locked due to some improper system shutdown needs repair? My code does the following as part of starting up and creating an IndexWriter for an existing index that was created in a prior session: if

is there a way to find duplicate documents in the index?

2006-03-13 Thread emerson cargnin
I noticed some duplicated entries in my index just by looking at it, and I suspect there might be more than those I found. Is there a way to detect duplicate documents in an index? Emerson Cargnin

RE: Search for synonyms - implemenetation for review

2006-03-13 Thread Ziv Gome
Hi Mark, thanks for your response. Here are my thoughts on your suggestion: I believe it would be a good idea to merge similar query expansion code. I also agree that the situation of fuzzy query is similar to the synonym query use-case, in the sense of having a root term and some related,

RE: Keeping RAMDirectory and filesystem index in sync

2006-03-13 Thread Satuluri, Venu_Madhav
Thanks, Jens. Seems like this would be pretty complicated. It seems the best way would be not to have a separate daemon for indexing modified documents, but just to have the reindexing part in the backend itself (it would know when any documents were modified), but since it would involve some

Re: Throughput doesn't increase when using more concurrent threads

2006-03-13 Thread Peter Keegan
Chris, Should this patch work against the current code base? I'm getting this error: D:\lucene-1.9>patch -b -p0 -i nio-lucene-1.9.patch patching file src/java/org/apache/lucene/index/CompoundFileReader.java patching file src/java/org/apache/lucene/index/FieldsReader.java missing header for

Re: Throughput doesn't increase when using more concurrent threads

2006-03-13 Thread Peter Keegan
Chris, My apologies - this error was apparently caused by a file format mismatch (probably line endings). Thanks, Peter On 3/13/06, Peter Keegan wrote: Chris, Should this patch work against the current code base? I'm getting this error: D:\lucene-1.9>patch -b -p0 -i

Setting the COMMIT lock timeout.

2006-03-13 Thread Jim Bedford-roberts
I'm confused about how to set the COMMIT lock timeout since the version 1.9.1 release. I note that this can't be set from system properties anymore (CHANGES.txt, changes in run time behaviour 7), but am unable to find the replacement setter method promised for IndexWriter. Can anyone point

RE: 100,000 indexes and what to do

2006-03-13 Thread John Powers
How does the information change in each of these customers' documents? I would think that if they were very dynamic, then updates to the single index would not be great for you. But if the updates were just now and then, then given the performance of Lucene, the single index would be just fine.

Re: is there a way to find duplicate documents in the index?

2006-03-13 Thread Yonik Seeley
On 3/13/06, emerson cargnin wrote: I noticed some duplicated entries in my index just by looking at it, and I suspect there might be more than those I found. Is there a way to detect duplicate documents in an index? Emerson Cargnin If there is a field with a unique

IndexSearcher and IndexWriter in conjuction

2006-03-13 Thread Nikhil Goel
Hi, Can someone please explain how IndexSearcher and IndexWriter work in conjunction? As far as I know after reading all the posts in the newsgroup, it seems everything works fine if we have one IndexWriter thread and multiple IndexSearcher threads. But my doubt here is, looking at IndexSearcher

Re: Failure recovery

2006-03-13 Thread Yonik Seeley
On 3/13/06, Chuck Williams wrote: Is there a way to determine whether or not an index that was left locked due to some improper system shutdown needs repair? Depends what you mean by repair. If there was a crash during index modification, I think the index should normally

Re: IndexSearcher and IndexWriter in conjuction

2006-03-13 Thread Patrick Kimber
Hi Nikhil We are using the index accessor contribution. For more information see: http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049 This should help you to co-ordinate the IndexSearcher and IndexWriter. Patrick On 13/03/06, Nikhil Goel wrote:

Re: IndexSearcher and IndexWriter in conjuction

2006-03-13 Thread Nikhil Goel
Hi Patrick, thanks for writing back, but my question is: do we really need to write something new to achieve what I want to achieve? Going through the Lucene tutorials, I don't think there is a need to do such a thing: http://blog.danbartels.com/archive/2004/09/09/186.aspx Indexing and searching

Re: IndexSearcher and IndexWriter in conjuction

2006-03-13 Thread Yonik Seeley
On 3/13/06, Nikhil Goel wrote: Can someone please explain how IndexSearcher and IndexWriter work in conjunction? The trick is that once segment files are written, they are never modified (except for the segments file itself). New documents are added to new segments, not

Looking for Lucene consultant (UK based)

2006-03-13 Thread Robert Watkins
We, John Wiley & Sons (http://www3.interscience.wiley.com/), are looking for a Lucene expert to assist with our migration from Verity to Lucene (up to six weeks' work, starting this coming Monday, 20 March). The candidate must be based in the UK, preferably in or close to London, as we would like

Re: Setting the COMMIT lock timeout.

2006-03-13 Thread Daniel Naber
On Monday 13 March 2006 15:50, Jim Bedford-roberts wrote: I note that this can't be set from system properties anymore (CHANGES.txt, changes in run time behaviour 7), but am unable to find the replacement setter method promised for IndexWriter. Seems these have been forgotten. They can easily

Re: Setting the COMMIT lock timeout.

2006-03-13 Thread Bill Janssen
Daniel Naber ponders: Seems these have been forgotten. They can easily be added, but I still wonder what the use case is to set these values? The default value isn't magic. The appropriate value is context-specific. I've got some people using Lucene on machines with slow disks, and we need

Re: Keeping RAMDirectory and filesystem index in sync

2006-03-13 Thread Chris Hostetter
: The Searching process then would have to re-open its RAMDirectory. the key to all of this being that there are constructors for RAMDirectory that make it very easy to load in the contents of an FSDirectory. : Or you check the version of the fs-based index from time to time, to see : when it

Re: IndexSearcher and IndexWriter in conjuction

2006-03-13 Thread Chris Hostetter
: The trick is that once segment files are written, they are never : modified (except for the segments file itself). New documents are : added to new segments, not existing segments. When segments are : merged, a new bigger segment is created. This way, the view of the : index for a specific
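The write-once segment idea quoted above can be illustrated with a small, self-contained Java sketch. It does not use the Lucene API: the class `SegmentSnapshotSketch`, its `Segment` type, and all method names are hypothetical stand-ins chosen for illustration, and the in-memory list only loosely plays the role of the segments file. What it tries to show is that a reader holding an old segment list keeps a stable view while a writer adds and merges segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SegmentSnapshotSketch {
    // Hypothetical stand-in for a segment: written once, never modified.
    static final class Segment {
        final List<String> docs;
        Segment(List<String> docs) {
            this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
        }
    }

    // Stand-in for the segments file: the only thing that is ever replaced.
    private volatile List<Segment> current = Collections.emptyList();

    // Writer side: new documents go into a brand-new segment.
    synchronized void addSegment(List<String> docs) {
        List<Segment> next = new ArrayList<>(current);
        next.add(new Segment(docs));
        current = Collections.unmodifiableList(next);
    }

    // Writer side: merging creates one new, bigger segment.
    synchronized void mergeAll() {
        List<String> all = new ArrayList<>();
        for (Segment s : current) {
            all.addAll(s.docs);
        }
        current = Collections.singletonList(new Segment(all));
    }

    // Searcher side: capture the segment list as it is right now.
    List<Segment> openSnapshot() {
        return current;
    }

    static int countDocs(List<Segment> snapshot) {
        int n = 0;
        for (Segment s : snapshot) {
            n += s.docs.size();
        }
        return n;
    }

    public static void main(String[] args) {
        SegmentSnapshotSketch index = new SegmentSnapshotSketch();
        index.addSegment(Arrays.asList("doc1", "doc2"));
        List<Segment> searcherView = index.openSnapshot();

        index.addSegment(Arrays.asList("doc3"));
        index.mergeAll();

        System.out.println(countDocs(searcherView));          // view opened before the changes
        System.out.println(countDocs(index.openSnapshot()));  // view opened after the changes
    }
}
```

In this sketch the earlier snapshot still counts two documents after the writer has added and merged, while a freshly opened snapshot counts three. A searcher in this model has to reopen its view to see newer documents.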

Sorting in Lucene

2006-03-13 Thread Bob Cheung
I am curious why the character / sorts before the space. For example, "Apple/banana is good for you." sorts before "Apple banana is good for you". Is there something I can do to make it sort correctly? Regards, Bob

Re: Sorting in Lucene

2006-03-13 Thread Yonik Seeley
On 3/13/06, Bob Cheung wrote: I am curious why the character / sorts before the space. For example, "Apple/banana is good for you." sorts before "Apple banana is good for you". Are you sure that the field is untokenized, and that you are sorting in the correct direction?

question...

2006-03-13 Thread Aditya Liviandi
Hi all, If I want to embed the index files into another file (say of extension *.luc, so now all the index files are flattened inside this new file), can I still use the index without having to extract out the index files to a temp folder? aditya

Re: Can Lucene load more then 2GB into RAM memory?

2006-03-13 Thread Doug Cutting
RAMDirectory is indeed currently limited to 2GB. This would not be too hard to fix. Please file a bug report. Better yet, attach a patch. I assume you're running a 64-bit JVM. If so, then MMapDirectory might also work well for you. Doug z shalev wrote: this is in continuation of a

RE: Sorting in Lucene

2006-03-13 Thread Bob Cheung
I'm pretty sure. The other characters sorted according to the ASCII sequence. It's only the slash that sorted before the space. That's why I wonder whether the slash is treated differently. Btw, this is the statement where the sort field is added to the document: doc.add(Field.UnIndexed(_s +
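As a point of comparison for the ASCII ordering mentioned in this thread, here is a minimal plain-Java sketch with no Lucene involved. The class name `SlashVsSpace` and the helper `sortedCopy` are hypothetical. It only shows how Java's natural String ordering treats the two example values; it does not explain the ordering reported above, which may depend on how the field was indexed and sorted.

```java
import java.util.Arrays;

public class SlashVsSpace {
    // Returns a sorted copy using natural String order (code unit comparison).
    static String[] sortedCopy(String... values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // The space character has a lower code (32) than the slash (47).
        System.out.println((int) ' ');
        System.out.println((int) '/');

        String[] sorted = sortedCopy(
                "Apple/banana is good for you",
                "Apple banana is good for you");
        for (String s : sorted) {
            System.out.println(s);
        }
    }
}
```

Under natural String ordering the value containing the space comes first, so an ascending sort that puts the slash first suggests either a reversed sort direction or a different comparison being applied.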