IndexMergeTool - Close indexes before merge?

2007-10-10 Thread Patrick Kimber
Hi

The IndexMergeTool (see url below) creates a new index, the mergedIndex.

Do the other indexes, index1, index2, etc, need to be closed
before performing the merge?
This is the same as asking if the indexes passed to
IndexWriter.addIndexes need to be closed before they are added to the
new index.

http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/miscellaneous/src/java/org/apache/lucene/misc/IndexMergeTool.java
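
For context, this is roughly the kind of merge I mean (a minimal sketch, not the
tool itself; the paths and class name are just illustrative):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;

  public class MergeSketch {
      public static void main(String[] args) throws Exception {
          // create the new, empty merged index
          IndexWriter writer = new IndexWriter("/tmp/mergedIndex", new StandardAnalyzer(), true);
          Directory[] sources = new Directory[] {
              FSDirectory.getDirectory("/tmp/index1"),
              FSDirectory.getDirectory("/tmp/index2")
          };
          writer.addIndexes(sources);  // copies the source segments into the new index
          writer.optimize();
          writer.close();
      }
  }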

Thanks for your help,

Patrick




Re: Lucene 2.2, NFS, Lock obtain timed out

2007-09-07 Thread Patrick Kimber

 pkimber [EMAIL PROTECTED] wrote:

  We are still getting various issues on our Lucene indexes running on
  an NFS share.  It has taken me some time to find some useful
  information to report to the mailing list.

 Bummer!

 Can you zip up your test application that shows the issue, as well as
 the full logs from both servers?  I can look at them and try to
 reproduce the error.

 Mike


Yeh, I know!

I cannot send you the source code without speaking to my manager
first.  I guess he would want me to change the code before sending it
to you.  You could have the log files now, but I expect you want to
wait until the test application is ready to send?

Thanks for your help,

Patrick




Re: Reply: Reply: About multi-Threads in Lucene

2007-08-07 Thread Patrick Kimber
Hi Kai

No, I have no problem returning hits.

When I do have problems like this, I usually find I have something
more to learn about Lucene indexing.  Try looking at the data and
query in Luke.  I usually find this is the best way to understand what
is going on.

Here is the link to Luke:
http://www.getopt.org/luke/

I hope you get it sorted.

Patrick

On 07/08/07, Kai Hu [EMAIL PROTECTED] wrote:
 By the way, Patrick, did you have a problem where IndexSearcher.search(Query
 query) can't get all the matching hits? It only returns some of the matched hits.
 My test code is:
 String key = "title:good";
 Directory directory = FSDirectory.getDirectory("d:\\index\\");
 IndexSearcher searcher = new IndexSearcher(directory);
 QueryParser queryParser = new QueryParser(,analyzer);
 Query query = queryParser.parse(key);
 hits = searcher.search(query, sort);
 There are two documents in the index whose title value is "good", but when I
 searched with the key "title:good" it returned only one document. Is it a bug?

 kai

 Hi Kai
 We keep a synchronized map of LuceneIndexAccessor instances, one instance per
 Directory.  The map is keyed on the directory path.  We then re-use
 the accessor rather than creating a new one each time.
 
 Patrick
 On 06/08/07, Kai Hu [EMAIL PROTECTED] wrote:
  Thanks, Patrick,

   It is useful. But I found a problem: if I create a new
   LuceneIndexAccessor(accessProvider) for every request (browser/server), the
   synchronization in LuceneIndexAccessor.getWriter() has no effect, because each
   request gets its own accessor and therefore its own IndexWriter.

   public IndexWriter getWriter() throws IOException {
     IndexWriter result;
     synchronized (this) { // has no effect when every request creates its own accessor
       checkClosed();
       ...
       if (cachedWriter != null) {
         log.debug("returning cached writer");
         result = cachedWriter;
         writerUseCount++;
       } else {
         log.debug("opening new writer and caching it");
         result = accessProvider.getWriter(); // a new accessor opens a new IndexWriter here
         cachedWriter = result;
         writerUseCount = 1;
       }
     }
   }

   It also throws an exception, "cannot obtain the Lock".  Should I use a single
   instance of LuceneIndexAccessor?  If I do use a single instance, how can I set
   a different Directory or Analyzer each time?
 
 
 
  kai
 
 
  ///
 
  ///
 
 
 
  Hi Kai
 
  
 
  We use the Lucene Index Accessor contribution:
 
  
 
  http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049
 
  
 
  Patrick
 
 
 
  On 06/08/07, Kai Hu [EMAIL PROTECTED] wrote:
 
   Hi,
 
  
 
    How do you solve the problems of adding, updating and deleting documents
    in multiple threads?  Do you use synchronized?
 
  
 
  
 
 
 



Re: About multi-Threads in Lucene

2007-08-06 Thread Patrick Kimber
Hi Kai

We use the Lucene Index Accessor contribution:

http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049

Patrick

On 06/08/07, Kai Hu [EMAIL PROTECTED] wrote:
 Hi,

  How do you solve the problems of adding, updating and deleting documents
 in multiple threads?  Do you use synchronized?






Re: Reply: About multi-Threads in Lucene

2007-08-06 Thread Patrick Kimber
Hi Kai

We keep a synchronized map of LuceneIndexAccessor instances, one instance per
Directory.  The map is keyed on the directory path.  We then re-use
the accessor rather than creating a new one each time.
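
Roughly, the idea is something like this (a sketch only; apart from
LuceneIndexAccessor and IndexAccessProvider, the names are made up):

  import java.util.HashMap;
  import java.util.Map;

  // One LuceneIndexAccessor per index directory path, shared by all callers.
  public class AccessorCache {
      private final Map accessors = new HashMap();  // directory path -> LuceneIndexAccessor

      public synchronized LuceneIndexAccessor getAccessor(String directoryPath,
                                                          IndexAccessProvider accessProvider) {
          LuceneIndexAccessor accessor = (LuceneIndexAccessor) accessors.get(directoryPath);
          if (accessor == null) {
              accessor = new LuceneIndexAccessor(accessProvider);
              accessors.put(directoryPath, accessor);
          }
          return accessor;  // re-used instead of creating a new accessor per request
      }
  }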

Patrick

On 06/08/07, Kai Hu [EMAIL PROTECTED] wrote:
 Thanks, Patrick,

 It is useful. But I found a problem: if I create a new
 LuceneIndexAccessor(accessProvider) for every request (browser/server), the
 synchronization in LuceneIndexAccessor.getWriter() has no effect, because each
 request gets its own accessor and therefore its own IndexWriter.

 public IndexWriter getWriter() throws IOException {
   IndexWriter result;
   synchronized (this) { // has no effect when every request creates its own accessor
     checkClosed();
     ...
     if (cachedWriter != null) {
       log.debug("returning cached writer");
       result = cachedWriter;
       writerUseCount++;
     } else {
       log.debug("opening new writer and caching it");
       result = accessProvider.getWriter(); // a new accessor opens a new IndexWriter here
       cachedWriter = result;
       writerUseCount = 1;
     }
   }
 }

 It also throws an exception, "cannot obtain the Lock".  Should I use a single
 instance of LuceneIndexAccessor?  If I do use a single instance, how can I set a
 different Directory or Analyzer each time?



 kai


 ///

 ///



 Hi Kai

 

 We use the Lucene Index Accessor contribution:

 

 http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049

 

 Patrick



 On 06/08/07, Kai Hu [EMAIL PROTECTED] wrote:

  Hi,

 

   How do you solve the problems of adding, updating and deleting documents
  in multiple threads?  Do you use synchronized?

 

 






Re: What replaced org.apache.lucene.document.Field.Text?

2007-07-25 Thread Patrick Kimber

Hi Andy

I think:
Field.Text(name, value);

has been replaced with:
new Field(name, value, Field.Store.YES, Field.Index.TOKENIZED);
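
For example (a rough sketch; the field name and value are just placeholders):

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;

  public class FieldMigrationSketch {
      static Document buildDoc() {
          Document doc = new Document();
          // Lucene 1.4.x: doc.add(Field.Text("name", "Zane Pasolini"));
          // Lucene 2.x equivalent: stored and tokenized, as Field.Text was.
          doc.add(new Field("name", "Zane Pasolini", Field.Store.YES, Field.Index.TOKENIZED));
          return doc;
      }
  }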

Patrick

On 25/07/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Please reference "How do I get code written for Lucene 1.4.x to work with
Lucene 2.x?":
http://wiki.apache.org/lucene-java/LuceneFAQ#head-86d479476c63a2579e867b75d4faa9664ef6cf4d


Andy
-Original Message-
From: Lindsey Hess [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 25, 2007 12:31 PM
To: Lucene
Subject: What replaced org.apache.lucene.document.Field.Text?

I'm trying to get some relatively old Lucene code to compile (please see
below), and it appears that Field.Text has been deprecated.  Can someone
please suggest what I should use in its place?

  Thank you.

  Lindsey



  public static void main(String args[]) throws Exception
  {
  String indexDir =
  System.getProperty("java.io.tmpdir", "tmp") +
  System.getProperty("file.separator") + "address-book";
  Analyzer analyzer = new WhitespaceAnalyzer();
  boolean createFlag = true;

  IndexWriter writer = new IndexWriter(indexDir, analyzer, createFlag);
  Document contactDocument = new Document();
  contactDocument.add(Field.Text("type", "individual"));

  contactDocument.add(Field.Text("name", "Zane Pasolini"));
  contactDocument.add(Field.Text("address", "999 W. Prince St."));
  contactDocument.add(Field.Text("city", "New York"));
  contactDocument.add(Field.Text("province", "NY"));
  contactDocument.add(Field.Text("postalcode", "10013"));
  contactDocument.add(Field.Text("country", "USA"));
  contactDocument.add(Field.Text("telephone", "1-212-345-6789"));
  writer.addDocument(contactDocument);
  writer.close();
  }





Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-05 Thread Patrick Kimber

Hi Michael

Just to let you know, I am on holiday for one week so will not be able
to send a progress report until I return.

I have deployed the new code to a test site so I will be informed if
the users notice any issues.

Thanks for your help

Patrick


On 04/07/07, Michael McCandless [EMAIL PROTECTED] wrote:


Patrick Kimber [EMAIL PROTECTED] wrote:

 Yes, there are many lines in the logs saying:
 hit FileNotFoundException when loading commit segment_X; skipping
 this commit point
 ...so it looks like the new code is working perfectly.

Super!

 I am sorry to be vague... but how do I check which segments file is
 opened when a new writer is created?

Oh, sorry, it's not exactly obvious.  Here's what to look for:

On machine #1 (the machine that added docs and then closed its writer)
you should see lines like this, which are printed every time the
writer flushes its docs:

checkpoint: wrote segments file segments_X

Find the last such line on machine #1 before it closes the writer, and
that's the current segments_X in the index.

Then on machine #2 (the machine that immediately opens a new writer
after machine #1 closed its writer) you should see a line like this:

[EMAIL PROTECTED] main: init: current segments file is segments_Y

which indicates which segments file was loaded by this writer.  The
thing to verify is that X is always equal to Y whenever a writer
quickly moves from machine #1 to machine #2.

 I will add a check to my test to see if all documents are added.  This
 should tell us if any documents are being silently lost.

Very good!  Keep us posted, and good luck,

Mike




Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-04 Thread Patrick Kimber

Hi Michael

Yes, there are many lines in the logs saying:
hit FileNotFoundException when loading commit segment_X; skipping
this commit point
...so it looks like the new code is working perfectly.

I am sorry to be vague... but how do I check which segments file is
opened when a new writer is created?

I will add a check to my test to see if all documents are added.  This
should tell us if any documents are being silently lost.
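
The check itself should be simple, something along these lines (a sketch; the
path and expected count are illustrative):

  import org.apache.lucene.index.IndexReader;

  public class CountCheck {
      public static void main(String[] args) throws Exception {
          int expected = Integer.parseInt(args[0]);
          IndexReader reader = IndexReader.open("/mnt/nfstest/repository/lucene/lucene-icm-test-1-0");
          try {
              if (reader.numDocs() != expected) {
                  System.err.println("Expected " + expected + " docs but found " + reader.numDocs());
              }
          } finally {
              reader.close();
          }
      }
  }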

Thanks

Patrick

On 03/07/07, Michael McCandless [EMAIL PROTECTED] wrote:

Patrick Kimber [EMAIL PROTECTED] wrote:

 I have been running the test for over an hour without any problem.
 The index writer log file is getting rather large so I cannot leave
 the test running overnight.  I will run the test again tomorrow
 morning and let you know how it goes.

Ahhh, that's good news, I'm glad to hear that!

You should go ahead and turn off the logging and make sure things are
still fine (just in case logging is changing timing of events since
timing is a factor here).

In your logs, do you see lines like this?:

  ... hit FileNotFoundException when loading commit segment_X; skipping this 
commit point

That would confirm the new code (to catch the FileNotFoundException)
is indeed being hit.

Actually, could you also check the logs and try to verify that each
time one machine closed its writer and a 2nd machine opened a new
writer that the 2nd machine indeed loaded the newest segments_N file
and not segments_N-1?  (This is the possible new issue I was referring
to).  I fear that this new issue could silently lose documents added
by another machine and possibly not throw an exception.

Mike




Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-03 Thread Patrick Kimber

Hi

I have added more logging to my test application.  I have two servers
writing to a shared Lucene index on an NFS partition...

Here is the logging from one server...

[10:49:18] [DEBUG] LuceneIndexAccessor closing cached writer
[10:49:18] [DEBUG] ExpirationTimeDeletionPolicy onCommit() delete [segments_n]

and the other server (at the same time):

[10:49:18] [DEBUG] LuceneIndexAccessor opening new writer and caching it
[10:49:18] [DEBUG] IndexAccessProvider getWriter()
[10:49:18] [ERROR] DocumentCollection update(DocumentData)
com.company.lucene.LuceneIcmException: I/O Error: Cannot add the
document to the index.
[/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No
such file or directory)]
   at com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:182)

I think the exception is being thrown when the IndexWriter is created:
new IndexWriter(directory, false, analyzer, false, deletionPolicy);

I am confused... segments_n should not have been touched for 3 minutes
so why would a new IndexWriter want to read it?

Here is the whole of the stack trace:

com.company.lucene.LuceneIcmException: I/O Error: Cannot add the
document to the index.
[/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No
such file or directory)]
at 
com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:182)
at com.company.lucene.IndexUpdate.addDocument(IndexUpdate.java:364)
at com.company.lucene.IndexUpdate.addDocument(IndexUpdate.java:342)
at com.company.lucene.IndexUpdate.update(IndexUpdate.java:67)
at 
com.company.lucene.icm.DocumentCollection.update(DocumentCollection.java:390)
at lucene.icm.test.Write.add(Write.java:105)
at lucene.icm.test.Write.run(Write.java:79)
at lucene.icm.test.Write.main(Write.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:271)
at java.lang.Thread.run(Thread.java:534)
Caused by: java.io.FileNotFoundException:
/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No such
file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
at 
org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
at 
org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
at 
org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:531)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:440)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:193)
at 
org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:156)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:626)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:573)
at 
com.subshell.lucene.indexaccess.impl.IndexAccessProvider.getWriter(IndexAccessProvider.java:68)
at 
com.subshell.lucene.indexaccess.impl.LuceneIndexAccessor.getWriter(LuceneIndexAccessor.java:171)
at 
com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:176)
... 13 more

Thank you very much for your previous comments and emails.

Any help solving this issue would be appreciated.

Patrick


On 30/06/07, Michael McCandless [EMAIL PROTECTED] wrote:


Patrick Kimber wrote:

 I have been checking the application log.  Just before the time when
 the lock file errors occur I found this log entry:
 [11:28:59] [ERROR] IndexAccessProvider
 java.io.FileNotFoundException:
 /mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_h75 (No
 such file or directory)
 at java.io.RandomAccessFile.open(Native Method)

I think this exception is the root cause.  On hitting this IOException
in reader.close(), that means this reader has not released its write
lock.  Is it possible to see the full stack trace?

Having the wrong deletion policy or even a buggy deletion policy (if
indeed file.lastModified() varies by too much across machines) can't
cause this (I think).  At worse, the wrong deletion policy should
cause other already-open readers to hit Stale NFS handle
IOExceptions during searching.  So, you should use your
ExpirationTimeDeletionPolicy when opening your readers if they will be
doing deletes, but I don't think it explains this root-cause exception
during close().

It's a rather spooky exception ... in close(), the reader initializes
an IndexFileDeleter which lists the directory and opens any segments_N
files that it finds.

Do you have a writer on one machine closing, and then very soon
thereafter this reader

Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-03 Thread Patrick Kimber

Hi

I am using the NativeFSLockFactory.  I was hoping this would have
stopped these errors.

Patrick

On 03/07/07, Neeraj Gupta [EMAIL PROTECTED] wrote:

Hi

This is the case where an index created by one server is updated by another
server, which results in index corruption. The exception occurs while creating
the IndexWriter instance because, when it is created (and you are not creating
a new index), it checks whether the index exists and records that state. By the
time you go to add a document, the index has been modified by the other server,
so the previous and current states no longer match and an exception results.

What kind of locking are you using? You should use some locking scheme so that
while one server is updating the index, the other server does not interfere.
Once a server finishes updating the index, it should close all writers and
readers to release the locks.

An alternative solution is to create a separate index for each server. That
helps because only one thread will be updating each index, so there will be no
conflict.

Cheers,
Neeraj




Patrick Kimber [EMAIL PROTECTED] wrote on 07/03/2007 03:47 PM
(Subject: Re: Lucene 2.2, NFS, Lock obtain timed out):


Hi

I have added more logging to my test application.  I have two servers
writing to a shared Lucene index on an NFS partition...

Here is the logging from one server...

[10:49:18] [DEBUG] LuceneIndexAccessor closing cached writer
[10:49:18] [DEBUG] ExpirationTimeDeletionPolicy onCommit() delete
[segments_n]

and the other server (at the same time):

[10:49:18] [DEBUG] LuceneIndexAccessor opening new writer and caching it
[10:49:18] [DEBUG] IndexAccessProvider getWriter()
[10:49:18] [ERROR] DocumentCollection update(DocumentData)
com.company.lucene.LuceneIcmException: I/O Error: Cannot add the
document to the index.
[/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No
such file or directory)]
at
com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:182)

I think the exception is being thrown when the IndexWriter is created:
new IndexWriter(directory, false, analyzer, false, deletionPolicy);

I am confused... segments_n should not have been touched for 3 minutes
so why would a new IndexWriter want to read it?

Here is the whole of the stack trace:

com.company.lucene.LuceneIcmException: I/O Error: Cannot add the
document to the index.
[/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No
such file or directory)]
 at
com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:182)
 at
com.company.lucene.IndexUpdate.addDocument(IndexUpdate.java:364)
 at
com.company.lucene.IndexUpdate.addDocument(IndexUpdate.java:342)
 at
com.company.lucene.IndexUpdate.update(IndexUpdate.java:67)
 at
com.company.lucene.icm.DocumentCollection.update(DocumentCollection.java:390)
 at lucene.icm.test.Write.add(Write.java:105)
 at lucene.icm.test.Write.run(Write.java:79)
 at lucene.icm.test.Write.main(Write.java:43)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:324)
 at
org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:271)
 at java.lang.Thread.run(Thread.java:534)
Caused by: java.io.FileNotFoundException:
/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No such
file or directory)
 at java.io.RandomAccessFile.open(Native Method)
 at
java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
 at
org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
 at
org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
 at
org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:531)
 at
org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:440)
 at
org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:193)
 at
org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:156)
 at
org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:626)
 at
org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:573)
 at
com.subshell.lucene.indexaccess.impl.IndexAccessProvider.getWriter(IndexAccessProvider.java:68)
 at
com.subshell.lucene.indexaccess.impl.LuceneIndexAccessor.getWriter

Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-03 Thread Patrick Kimber

Hi Michael

I am really pleased we have a potential fix.  I will look out for the patch.

Thanks for your help.

Patrick

On 03/07/07, Michael McCandless [EMAIL PROTECTED] wrote:


Patrick Kimber [EMAIL PROTECTED] wrote:

 I am using the NativeFSLockFactory.  I was hoping this would have
 stopped these errors.

I believe this is not a locking issue and NativeFSLockFactory should
be working correctly over NFS.

 Here is the whole of the stack trace:

 Caused by: java.io.FileNotFoundException:
 /mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No such
 file or directory)
   at java.io.RandomAccessFile.open(Native Method)
   at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
   at 
org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
   at 
org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
   at 
org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:531)
   at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:440)
   at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:193)
   at 
org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:156)
   at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:626)
   at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:573)
   at 
com.subshell.lucene.indexaccess.impl.IndexAccessProvider.getWriter(IndexAccessProvider.java:68)
   at 
com.subshell.lucene.indexaccess.impl.LuceneIndexAccessor.getWriter(LuceneIndexAccessor.java:171)
   at 
com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:176)
   ... 13 more

OK, indeed the exception is inside IndexFileDeleter's initialization
(this is what I had guessed might be happening).

 I have added more logging to my test application.  I have two servers
 writing to a shared Lucene index on an NFS partition...

 Here is the logging from one server...

 [10:49:18] [DEBUG] LuceneIndexAccessor closing cached writer
 [10:49:18] [DEBUG] ExpirationTimeDeletionPolicy onCommit() delete
 [segments_n]

 and the other server (at the same time):

 [10:49:18] [DEBUG] LuceneIndexAccessor opening new writer and caching it
 [10:49:18] [DEBUG] IndexAccessProvider getWriter()
 [10:49:18] [ERROR] DocumentCollection update(DocumentData)
 com.company.lucene.LuceneIcmException: I/O Error: Cannot add the
 document to the index.
 [/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No
 such file or directory)]
 at
 com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:182)

 I think the exception is being thrown when the IndexWriter is created:
 new IndexWriter(directory, false, analyzer, false, deletionPolicy);

 I am confused... segments_n should not have been touched for 3 minutes
 so why would a new IndexWriter want to read it?

Whenever a writer is opened, it initializes the deleter
(IndexFileDeleter).  During that initialization, we list all files in
the index directory, and for every segments_N file we find, we open it
and incref all index files that it's using.  We then call the
deletion policy's onInit to give it a chance to remove any of these
commit points.

What's happening here is the NFS directory listing is stale and is
reporting that segments_n exists when in fact it doesn't.  This is
almost certainly due to the NFS client's caching (directory listing
caches are in general not coherent for NFS clients, ie, they can lie
for a short period of time, especially in cases like this).

I think this fix is fairly simple: we should catch the
FileNotFoundException and handle that as if the file did not exist.  I
will open a Jira issue and get a patch.

Mike




Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-03 Thread Patrick Kimber

Hi Michael

I have been running the test for over an hour without any problem.
The index writer log file is getting rather large so I cannot leave
the test running overnight.  I will run the test again tomorrow
morning and let you know how it goes.
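
For reference, the writer logging can be switched on roughly like this (a
sketch; the log file path is illustrative):

  import java.io.FileOutputStream;
  import java.io.PrintStream;
  import org.apache.lucene.index.IndexWriter;

  public class WriterLogging {
      public static void enable() throws Exception {
          // every IndexWriter created afterwards logs its activity to this stream
          IndexWriter.setDefaultInfoStream(
                  new PrintStream(new FileOutputStream("/tmp/index-writer.log", true)));
      }
  }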

Thanks again...

Patrick

On 03/07/07, Patrick Kimber [EMAIL PROTECTED] wrote:

Hi Michael

I am setting up the test with the take2 jar and will let you know
the results as soon as I have them.

Thanks for your help

Patrick

On 03/07/07, Michael McCandless [EMAIL PROTECTED] wrote:
 OK I opened issue LUCENE-948, and attached a patch and a new 2.2.0 JAR.
 Please make sure you use the "take2" versions (they have added
 instrumentation to help us debug):

 https://issues.apache.org/jira/browse/LUCENE-948

 Patrick, could you please test the above take2 JAR?  Could you also call
 IndexWriter.setDefaultInfoStream(...) and capture all output from both
 machines (it will produce quite a bit of output).

 However: I'm now concerned about another potential impact of stale
 directory listing caches, specifically that the writer on the 2nd
 machine will not see the current segments_N file written by the first
 machine and will incorrectly remove the newly created files.

 I think that take2 JAR should at least resolve this
 FileNotFoundException but I think likely you are about to hit this new
 issue.

 Mike

 Patrick Kimber [EMAIL PROTECTED] wrote:
  Hi Michael
 
  I am really pleased we have a potential fix.  I will look out for the
  patch.
 
  Thanks for your help.
 
  Patrick
 
  On 03/07/07, Michael McCandless [EMAIL PROTECTED] wrote:
  
   Patrick Kimber [EMAIL PROTECTED] wrote:
  
I am using the NativeFSLockFactory.  I was hoping this would have
stopped these errors.
  
   I believe this is not a locking issue and NativeFSLockFactory should
   be working correctly over NFS.
  
Here is the whole of the stack trace:
   
Caused by: java.io.FileNotFoundException:
/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No such
file or directory)
  at java.io.RandomAccessFile.open(Native Method)
  at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
  at 
org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
  at 
org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
  at 
org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:531)
  at 
org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:440)
  at 
org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:193)
  at 
org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:156)
  at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:626)
  at 
org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:573)
  at 
com.subshell.lucene.indexaccess.impl.IndexAccessProvider.getWriter(IndexAccessProvider.java:68)
  at 
com.subshell.lucene.indexaccess.impl.LuceneIndexAccessor.getWriter(LuceneIndexAccessor.java:171)
  at 
com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:176)
  ... 13 more
  
   OK, indeed the exception is inside IndexFileDeleter's initialization
   (this is what I had guessed might be happening).
  
I have added more logging to my test application.  I have two servers
writing to a shared Lucene index on an NFS partition...
   
Here is the logging from one server...
   
[10:49:18] [DEBUG] LuceneIndexAccessor closing cached writer
[10:49:18] [DEBUG] ExpirationTimeDeletionPolicy onCommit() delete
[segments_n]
   
and the other server (at the same time):
   
[10:49:18] [DEBUG] LuceneIndexAccessor opening new writer and caching it
[10:49:18] [DEBUG] IndexAccessProvider getWriter()
[10:49:18] [ERROR] DocumentCollection update(DocumentData)
com.company.lucene.LuceneIcmException: I/O Error: Cannot add the
document to the index.
[/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No
such file or directory)]
at

com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:182)
   
I think the exception is being thrown when the IndexWriter is created:
new IndexWriter(directory, false, analyzer, false, deletionPolicy);
   
I am confused... segments_n should not have been touched for 3 minutes
so why would a new IndexWriter want to read it?
  
   Whenever a writer is opened, it initializes the deleter
   (IndexFileDeleter).  During that initialization, we list all files in
   the index directory, and for every segments_N file we find, we open it
   and incref all index files that it's using.  We then call the
   deletion policy's onInit to give it a chance to remove any of these
   commit points.
  
   What's happening here is the NFS directory listing is stale and is
   reporting that segments_n exists when in fact it doesn't.  This is
   almost

Lucene 2.2, NFS, Lock obtain timed out

2007-06-29 Thread Patrick Kimber

Hi,

We are sharing a Lucene index in a Linux cluster over an NFS share.  We have
multiple servers reading and writing to the index.

I am getting regular lock exceptions e.g.
Lock obtain timed out:
NativeFSLock@/mnt/nfstest/repository/lucene/lock/lucene-2d3d31fa7f19eabb73d692df44087d81-n-write.lock

- We are using Lucene 2.2.0
- We are using kernel NFS and lockd is running.
- We are using a modified version of the ExpirationTimeDeletionPolicy
found in the
 Lucene test suite:
http://svn.apache.org/repos/asf/lucene/java/trunk/src/test/org/apache/lucene/index/TestDeletionPolicy.java
 I have set the expiration time to 600 seconds (10 minutes).
- We are using the NativeFSLockFactory with the lock folder being
within the index
 folder:
 /mnt/nfstest/repository/lucene/lock/
- I have implemented a handler which will pause and retry an update or delete
 operation if a LockObtainFailedException or StaleReaderException is
caught.  The
 handler will retry the update or delete once every second for 1 minute before
 re-throwing the exception and aborting.
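
To make the set-up above concrete, it looks roughly like this (a sketch only:
the ExpirationTimeDeletionPolicy constructor shown here and the retry loop are
assumptions based on the description, not the real code):

  import java.io.IOException;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.store.LockObtainFailedException;
  import org.apache.lucene.store.NativeFSLockFactory;

  public class NfsWriterSketch {
      public static void main(String[] args) throws Exception {
          // index on the NFS share, native locks kept in a folder inside it
          Directory dir = FSDirectory.getDirectory(
                  "/mnt/nfstest/repository/lucene/lucene-icm-test-1-0",
                  new NativeFSLockFactory("/mnt/nfstest/repository/lucene/lock"));
          IndexWriter writer = null;
          // retry once a second for up to one minute if the write lock is held elsewhere
          for (int attempt = 0; attempt < 60 && writer == null; attempt++) {
              try {
                  // autoCommit=false, create=false, custom deletion policy (600 second expiry)
                  writer = new IndexWriter(dir, false, new StandardAnalyzer(), false,
                          new ExpirationTimeDeletionPolicy(dir, 600.0));
              } catch (LockObtainFailedException e) {
                  Thread.sleep(1000);
              }
          }
          if (writer == null) {
              throw new IOException("could not obtain the write lock after one minute");
          }
          // ... add, update or delete documents ...
          writer.close();
      }
  }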

The issue appears to be caused by a lock file which is not deleted.
The handlers
keep retrying... the process holding the lock eventually aborts...
this deletes the
lock file - any applications still running then continue normally.

The application does not throw these exceptions when it is run on a
standard Linux
file system or Windows workstation.

I would really appreciate some help with this issue.  The chances are I am doing
something stupid... but I cannot think what to try next.

Thanks for your help

Patrick




Re: Lucene 2.2, NFS, Lock obtain timed out

2007-06-29 Thread Patrick Kimber

Hi Doron

Thanks for your reply.

I am working on the details of the update pattern.  It will take me
some time as I cannot reproduce the issue on demand.

To answer your other questions, yes, we do have multiple writers.  One
writer per node in the cluster.

I will post the results of my investigations as soon as possible.

Thanks for your help

Patrick



On 29/06/07, Doron Cohen [EMAIL PROTECTED] wrote:

hi Patrick,

Mike is the expert in this, but until he gets in, can you add details on
the update pattern - note that the DeletionPolicy you describe below is not
(afaik) related to the write lock time-out issues you are facing. The
DeletionPolicy manages better the interaction between an IndexWriter that
deletes old files, and an IndexReader that might still use this file. The
write lock, on the other hand, just synchronizes between multiple IndexWriter
objects attempting to open the same index for write. So, do you have
multiple writers? Can you print/describe the writers timing scenario when
this time-out problem occur, e.g, something like this
 w1.open
 w1.modify
 w1.close
 w2.open
 w2.modify
 w2.close
 w3.open
 w3.modify
 w3.close
 w2.open .  time-out... but w3 closed the index so the
lock-file was supposed to be removed, why wasn't it?
Can write attempt come from different nodes in the cluster?
Can you make sure that when the writer gets the lock time-out there is
indeed no other active writer?

Doron

Patrick Kimber [EMAIL PROTECTED] wrote on 29/06/2007
02:01:08:

 Hi,

 We are sharing a Lucene index in a Linux cluster over an NFS
 share.  We have
 multiple servers reading and writing to the index.

 I am getting regular lock exceptions e.g.
 Lock obtain timed out:

NativeFSLock@/mnt/nfstest/repository/lucene/lock/lucene-2d3d31fa7f19eabb73d692df44087d81-n-write.lock

 - We are using Lucene 2.2.0
 - We are using kernel NFS and lockd is running.
 - We are using a modified version of the ExpirationTimeDeletionPolicy
 found in the
   Lucene test suite:
 http://svn.apache.org/repos/asf/lucene/java/trunk/src/test/org/apache/lucene/index/TestDeletionPolicy.java
   I have set the expiration time to 600 seconds (10 minutes).
 - We are using the NativeFSLockFactory with the lock folder being
 within the index
   folder:
   /mnt/nfstest/repository/lucene/lock/
 - I have implemented a handler which will pause and retry an
 update or delete
   operation if a LockObtainFailedException or StaleReaderException is
 caught.  The
   handler will retry the update or delete once every second for
 1 minute before
   re-throwing the exception and aborting.

 The issue appears to be caused by a lock file which is not deleted.
 The handlers
 keep retrying... the process holding the lock eventually aborts...
 this deletes the
 lock file - any applications still running then continue normally.

 The application does not throw these exceptions when it is run on a
 standard Linux
 file system or Windows workstation.

 I would really appreciate some help with this issue.  The
 chances are I am doing
 something stupid... but I cannot think what to try next.

 Thanks for your help

 Patrick




Re: Lucene 2.2, NFS, Lock obtain timed out

2007-06-29 Thread Patrick Kimber

Hi

As requested, I have been trying to improve the logging in the
application so I can give you more details of the update pattern.

I am using the Lucene Index Accessor contribution to co-ordinate the
readers and writers:
http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049

If the close method, in the IndexAccessProvider, fails the exception
is logged but not re-thrown:
public void close(IndexReader reader) {
 if (reader != null) {
   try {
 reader.close();
   } catch (IOException e) {
 log.error(, e);
   }
 }
}

I have been checking the application log.  Just before the time when
the lock file errors occur I found this log entry:
[11:28:59] [ERROR] IndexAccessProvider
java.io.FileNotFoundException:
/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_h75 (No
such file or directory)
at java.io.RandomAccessFile.open(Native Method)

- I guess the missing segments file could result in the lock file not
being removed?
- Is it safe to ignore this exception (probably not)?
- Why would the segments file be missing?  Could this be connected to
the NFS issues in some way?

Thanks for your help

Patrick


On 29/06/07, Patrick Kimber [EMAIL PROTECTED] wrote:

Hi Doron

Thanks for your reply.

I am working on the details of the update pattern.  It will take me
some time as I cannot reproduce the issue on demand.

To answer your other questions, yes, we do have multiple writers.  One
writer per node in the cluster.

I will post the results of my investigations as soon as possible.

Thanks for your help

Patrick



On 29/06/07, Doron Cohen [EMAIL PROTECTED] wrote:
 hi Patrick,

 Mike is the expert in this, but until he gets in, can you add details on
 the update pattern - note that the DeletionPolicy you describe below is not
 (afaik) related to the write lock time-out issues you are facing. The
 DeletionPolicy manages better the interaction between an IndexWriter that
 deletes old files, and an IndexReader that might still use this file. The
 write lock, on the other hand, just synchronizes between multiple IndexWriter
 objects attempting to open the same index for write. So, do you have
 multiple writers? Can you print/describe the writers timing scenario when
 this time-out problem occur, e.g, something like this
  w1.open
  w1.modify
  w1.close
  w2.open
  w2.modify
  w2.close
  w3.open
  w3.modify
  w3.close
  w2.open .  time-out... but w3 closed the index so the
 lock-file was supposed to be removed, why wasn't it?
 Can write attempt come from different nodes in the cluster?
 Can you make sure that when the writer gets the lock time-out there is
 indeed no other active writer?

 Doron

 Patrick Kimber [EMAIL PROTECTED] wrote on 29/06/2007
 02:01:08:

  Hi,
 
  We are sharing a Lucene index in a Linux cluster over an NFS
  share.  We have
  multiple servers reading and writing to the index.
 
  I am getting regular lock exceptions e.g.
  Lock obtain timed out:
 
 
NativeFSLock@/mnt/nfstest/repository/lucene/lock/lucene-2d3d31fa7f19eabb73d692df44087d81-n-write.lock
 
  - We are using Lucene 2.2.0
  - We are using kernel NFS and lockd is running.
  - We are using a modified version of the ExpirationTimeDeletionPolicy
  found in the
Lucene test suite:
  http://svn.apache.org/repos/asf/lucene/java/trunk/src/test/org/apache/lucene/index/TestDeletionPolicy.java
I have set the expiration time to 600 seconds (10 minutes).
  - We are using the NativeFSLockFactory with the lock folder being
  within the index
folder:
/mnt/nfstest/repository/lucene/lock/
  - I have implemented a handler which will pause and retry an
  update or delete
operation if a LockObtainFailedException or StaleReaderException is
  caught.  The
handler will retry the update or delete once every second for
  1 minute before
re-throwing the exception and aborting.
 
  The issue appears to be caused by a lock file which is not deleted.
  The handlers
  keep retrying... the process holding the lock eventually aborts...
  this deletes the
  lock file - any applications still running then continue normally.
 
  The application does not throw these exceptions when it is run on a
  standard Linux
  file system or Windows workstation.
 
  I would really appreciate some help with this issue.  The
  chances are I am doing
  something stupid... but I cannot think what to try next.
 
  Thanks for your help
 
  Patrick
 



Re: Lucene 2.2, NFS, Lock obtain timed out

2007-06-29 Thread Patrick Kimber

Hi Mark

Yes, thank you.  I can see your point and I think we might have to pay
some attention to this issue.

But, we sometimes see this error on an NFS share within 2 minutes of
starting the test so I don't think this is the only problem.

Once again, thanks for the idea.  I will certainly be looking to
modify the code in the LuceneIndexAccessor to take this into account.

Patrick

On 29/06/07, Mark Miller [EMAIL PROTECTED] wrote:

This is an interesting choice. Perhaps you have modified
LuceneIndexAccessor, but it seems to me (without knowing much about your
setup) that you would have odd reader behavior. On a 3 node system, if you
add docs with node 1 and 2 but not 3 and you're doing searches against all 3
nodes, node 3 will have old readers opened until you add a doc to node 3.
This is an odd consistency issue (node 1 and 2 have current views because
you are adding docs to them, but node 3 will be stale until it gets a doc),
but also if you keep adding docs to node 1 and 2, or just plain add no docs
to node 3, won't node 3's reader's index files be pulled out from under it
after 10 minutes? Node 3 (or 1 and 2 for that matter) will not give up its
cached readers *until* you add a doc with that particular node.

Perhaps I am all wet on this (I haven't used NFS with Lucene), but I think
you may need to somehow coordinate the delete policy with the
LuceneIndexAccessor on each node.

This may be unrelated to your problem, and perhaps you get around the issue
somehow, but just to throw it out there...

- Mark

On 6/29/07, Patrick Kimber  [EMAIL PROTECTED] wrote:



 I am using the Lucene Index Accessor contribution to co-ordinate the
 readers and writers:

 
http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049








Re: Lucene 2.2, NFS, Lock obtain timed out

2007-06-29 Thread Patrick Kimber

Hi Mark

I just ran my test again... and the error occurred after 10 minutes -
which is the time when my deletion policy is triggered.  So... I think
you might have found the answer to my problem.

I will spend more time looking at it on Monday.

Thank you very much for your help and enjoy your weekend.

Patrick

On 29/06/07, Mark Miller [EMAIL PROTECTED] wrote:

If you're getting java.io.FileNotFoundException:
/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_h75 within 2
minutes, this is very odd indeed. That would seem to imply your deletion
policy is not working.

You might try just using one of the nodes as the writer. In Michael's
comments, he always seems to mention the pattern of one writer, many
readers on NFS. In this case you could use no LockFactory and perhaps
gain a little speed there.

- Mark

Patrick Kimber wrote:
 Hi Mark

 Yes, thank you.  I can see your point and I think we might have to pay
 some attention to this issue.

 But, we sometimes see this error on an NFS share within 2 minutes of
 starting the test so I don't think this is the only problem.

 Once again, thanks for the idea.  I will certainly be looking to
 modify the code in the LuceneIndexAccessor to take this into account.

 Patrick

 On 29/06/07, Mark Miller [EMAIL PROTECTED] wrote:
 This is an interesting choice. Perhaps you have modified
 LuceneIndexAccessor, but it seems to me (without knowing much about your
 setup) that you would have odd reader behavior. On a 3 node system,
 if you
 add docs with node 1 and 2 but not 3 and your doing searches against
 all 3
 nodes, node 3 will have old readers opened until you add a doc to
 node 3.
 This is an odd consistency issue (node 1 and 2 have current views
 because
 you are adding docs to them, but node 3 will be stale until it gets a
 doc),
 but also if you keep adding docs to node 1 and 2, or just plain add
 no docs
 to node 3, won't node 3's reader's index files be pulled out from
 under it
 after 10 minutes? Node 3 (or 1 and 2 for that matter) will not give
 up its
 cached readers *until* you add a doc with that particular node.

 Perhaps I am all wet on this (I havn't used NFS with Lucene), but I
 think
 you may need to somehow coordinate the delete policy with the
 LuceneIndexAccessor on each node.

 This may be unrelated to your problem,and perhaps you get around the
 issue
 somehow, but just to throw it out there...

 - Mark

 On 6/29/07, Patrick Kimber  [EMAIL PROTECTED] wrote:
 
 
 
  I am using the Lucene Index Accessor contribution to co-ordinate the
  readers and writers:
 
 
 
http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049

 
 





Re: Lucene indexing pdf

2006-06-27 Thread Patrick Kimber

Hi Teresa

You need to convert the pdf file into text format before adding the
text to the Lucene index.

You may like to look at http://www.pdfbox.org/ for a library to
convert pdf files to text format.
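
The extraction step is roughly this (a sketch; package names vary between
PDFBox versions):

  import java.io.File;
  import java.io.FileInputStream;
  import org.pdfbox.pdmodel.PDDocument;
  import org.pdfbox.util.PDFTextStripper;

  public class PdfTextSketch {
      // Extract the plain text so it can be added to a Lucene Document field.
      public static String extract(File pdf) throws Exception {
          FileInputStream in = new FileInputStream(pdf);
          PDDocument document = PDDocument.load(in);
          try {
              return new PDFTextStripper().getText(document);
          } finally {
              document.close();
              in.close();
          }
      }
  }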

Patrick

On 27/06/06, mcarcelen [EMAIL PROTECTED] wrote:

Hi,
I'm new to Lucene and I'm trying to index a PDF, but when I query it
returns nothing. Can anyone help me?
Thanks a lot
Teresa





Re: similar ArrayIndexOutOfBoundsException on searching and optimizing

2006-05-23 Thread Patrick Kimber

Hi Adam

We are getting the same error.  Did you manage to work out what was
causing the problem?

Thanks
Patrick

On 21/04/06, Adam Constabaris [EMAIL PROTECTED] wrote:

This is a puzzler, I'm not sure if I'm doing something wrong or whether
I have a poisoned document, a corrupted index (failing to close my
IndexModifier properly?) or what.  The setup is this: I have two
processes (the backend and frontend of a CMS) that run in two different
VMs -- both use Lucene 1.9.1 with the PorterStemmerAnalyzer wrapper over
the StandardAnalyzer (from lucene-memory AnalyzerUtils).

The backend is responsible for index creation, updates, etc., while the
frontend process uses the created index.  What's puzzling is that some
queries will die with an ArrayIndexOutOfBoundsException being thrown out
of the BitVector class:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 240
 at org.apache.lucene.util.BitVector.get(BitVector.java:63)
 at
org.apache.lucene.index.SegmentTermDocs.read(SegmentTermDocs.java:133)
 at org.apache.lucene.search.TermScorer.next(TermScorer.java:105)
 at
org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:151)
 at
org.apache.lucene.search.DisjunctionSumScorer.next(DisjunctionSumScorer.java:125)
 at
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:290)
 at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
  at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:99)
 at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
 at org.apache.lucene.search.Hits.<init>(Hits.java:44)
 at org.apache.lucene.search.Searcher.search(Searcher.java:44)
 at org.apache.lucene.search.Searcher.search(Searcher.java:36)

The only pattern I've been able to discern in queries that cause this
problem is that (a) they search the contents field (tokenized,
unstored, TermVector.YES), and (b) it *seems* that it mostly happens
with longer terms in the query.  Although the frontend defaults to a
multifield query, the same happens when I use contents:term and
does not happen if I specify term and any other of the default
fields used by the MultiFieldQueryParser.

Here's where it gets interesting: I've noticed that calling optimize()
on the index as it's created by the server process is also throwing a
hissy fit, with an *eerily similar* stack trace:

java.lang.ArrayIndexOutOfBoundsException: 239
 at org.apache.lucene.util.BitVector.get(BitVector.java:63)
 at
org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:288)
 at
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:185)
 at
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:88)
 at
org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:681)
 at
org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:658)
 at
org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:517)
 at
org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:553)

Does anybody have any ideas about what I might be doing wrong, or if
I've possibly uncovered a bug?  I'm too new to the scene to know where I
ought to start with this.





Re: How to write to and read from the same index

2006-03-28 Thread Patrick Kimber
Hi Nick

Have you tried the Lucene Index Accessor contribution?

We have a similar update/search pattern and it works very well.

http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049

Patrick

On 28/03/06, Nick Atkins [EMAIL PROTECTED] wrote:
 I'm using Lucene running on Tomcat to index a large amount of email data
 and as the indexer runs through the mailbox creating, merging and
 deleting documents it does lots of searches at the same time to see if
 the document exists.  Actually all my modification operations are done
 in batch every x seconds or so.

 This seems to cause me lots of problems.  I believe it is not possible
 to keep a single Searcher open while the index is being modified so the
 only way is to detect the index changes, close the old one and create a
 new one.  However, doing this causes the number of file handles to grow
 beyond the max allowed by the system.  I have tried using Luc's
 DelayCloseIndexSearcher with his Factory example but as my index is
 modified frequently this causes lots of new DelayCloseIndexSearcher
 objects.  The way it calls close on them when there are no more usages
 doesn't seem to keep the number of file handles down, they just grow.  I
 would expect close to release file handles to the system when nothing is
 using the object (I even set it explicitly to null) but this does not
 happen.

 If this problem makes sense, has anyone else faced it, and does anyone
 have a solution?

 Cheers,

 Nick.




Re: IndexSearcher and IndexWriter in conjuction

2006-03-13 Thread Patrick Kimber
Hi Nikhil
We are using the index accessor contribution.  For more information see:
http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049
This should help you to co-ordinate the IndexSearcher and IndexWriter.
Patrick

On 13/03/06, Nikhil Goel [EMAIL PROTECTED] wrote:
 Hi,

 Can someone please explain how IndexSearcher and IndexWriter work in
 conjunction?  As far as I know from reading the posts in this newsgroup,
 everything works fine if we have one IndexWriter thread and multiple
 IndexSearcher threads.  My doubt is this: looking at the IndexSearcher class,
 it seems it first reads the segments file and then goes one by one to the
 corresponding .fnm files in the index.  So a case can occur where it has read
 the segments file, but in the meantime the IndexWriter thread has updated the
 index, the corresponding .fnm file no longer exists, and we get an IOException
 saying the .fnm file doesn't exist.

 Am I missing something needed to make sure that multiple IndexSearcher threads
 and one IndexWriter thread can still work together correctly?

 thanks
 -Nikhil






Re: steps for building lucene 1.9

2006-03-09 Thread Patrick Kimber
Hi Haritha

Hope the following helps:

Build Lucene Core from SVN

Download the lucene Subversion repository from:
http://svn.apache.org/repos/asf/lucene/java/trunk

Note: The CVS repository is still accessible but is out of date.

I downloaded to:
C:\src\lucene-svn\

To build (using ANT):
cd C:\src\lucene-svn\
ant

The following jar file is produced:
C:\src\lucene-svn\build\lucene-core-1.9-rc1-dev.jar

I have just built lucene using these instructions on my workstation
and it builds without any errors.
Patrick

On 09/03/06, Haritha_Parvatham [EMAIL PROTECTED] wrote:
 Hi,
 I have downloaded the Lucene 1.9 version.  Please tell me how to build it.  I am
 finding so many errors in the Lucene 1.9 source code.

 Thanks.
 Haritha




Re: How to intergrate snowball in lucene

2006-03-06 Thread Patrick Kimber
Hi

You should download the snowball contribution which is in the
SubVersion repository:

http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/snowball

This can be built using ANT.
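
Once the contrib jar is built and on your classpath, using it is roughly (a
sketch; the stemmer name and index path are illustrative):

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
  import org.apache.lucene.index.IndexWriter;

  public class SnowballSketch {
      public static void main(String[] args) throws Exception {
          // pick the stemmer by language name, e.g. "English", "German", "French"
          Analyzer analyzer = new SnowballAnalyzer("English");
          IndexWriter writer = new IndexWriter("/tmp/index", analyzer, true);
          // ... add documents; use the same analyzer when parsing queries ...
          writer.close();
      }
  }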

Patrick

On 06/03/06, Haritha_Parvatham [EMAIL PROTECTED] wrote:
 Hi,
 Can anyone guide me on integrating Snowball into Lucene?
 I have downloaded the Snowball sources, but some files are written in the C
 language.  I have compiled it.
 Please tell me how I can add Snowball to Lucene for multilingual support.





Lucene, Cannot rename segments.new to segments

2006-02-22 Thread Patrick Kimber
I am getting intermittent errors with Lucene.  Here are two examples:
java.io.IOException: Cannot rename E:\lucene\segments.new to E:\lucene\segments
java.io.IOException: Cannot rename E:\lucene\_8ya.tmp to E:\lucene\_8ya.del

This issue has an open BugZilla entry:
http://issues.apache.org/bugzilla/show_bug.cgi?id=36241

I thought this error must be caused by an error in my application.  To
try and solve the error I used the LuceneIndexAccessor in my
application:
http://issues.apache.org/bugzilla/show_bug.cgi?id=34995

I am still getting the error.

1) Is there a reason (other than time and resource) why the bug report
is still set to NEW after 6 months (since August 2005)?

2) Is the problem likely to be in my application?  Any ideas how I
could go about solving this issue?

Thanks for your help
Patrick




Re: OT: how do I connect to the SVN repository to grab the latest source?

2006-01-04 Thread Patrick Kimber
Hi Colin
Did you get some help?

Are you using Windows?  If so, you can install TortoiseSVN which is a
shell extension:
http://tortoisesvn.tigris.org/

If you are using Windows or Linux you can use SmartSVN
http://www.smartcvs.com/smartsvn/

The url for Lucene on SVN is:
http://svn.apache.org/repos/asf/lucene/java/trunk

If you want to learn all about SubVersion (SVN), the best source of
information is the SubVersion book:
http://svnbook.red-bean.com/

Hope this helps
Patrick

On 04/01/06, Colin Young [EMAIL PROTECTED] wrote:
 Normally I wouldn't post this here, but I haven't been able to find any
 info about how I would go about downloading the latest source from the
 SVN repository. I've got a bit of experience with CVS, but I can't even
 figure out where to start with SVN.

 If anyone could point me in the right direction I'd appreciate it (we
 could do it offline to avoid polluting this list any further).

 Thanks

 Colin Young





http://www.textmining.org/ is hacked

2005-11-24 Thread Patrick Kimber
Hi
I am trying to download the source code for
tm-extractors-0.4.jar
from
http://www.textmining.org/

Looks like the site has been hacked.
Does anyone know the location of the CVS or SVN repository?
Thanks for your help...
Pat




Re: http://www.textmining.org/ is hacked

2005-11-24 Thread Patrick Kimber
Thanks for the very quick response.

On 24/11/05, Guilherme Barile [EMAIL PROTECTED] wrote:
 I have it here, uploaded it to rapidshare
 http://rapidshare.de/files/8097202/textmining.zip.html

 c ya


 On Thu, 2005-11-24 at 16:46 +, Patrick Kimber wrote:
  Hi
  I am trying to download the source code for
  tm-extractors-0.4.jar
  from
  http://www.textmining.org/
 
  Looks like the site has been hacked.
  Does anyone know the location of the CVS or SVN repository?
  Thanks for your help...
  Pat
 



Re: Deprecated API in BooleanQuery broken in Lucene from CVS?

2005-11-18 Thread Patrick Kimber
Daniel
You are correct.  The latest version from SVN works correctly.
Very confusing - I only checked out Lucene from CVS a few days ago.  I
didn't realise that changes were only being made in the SVN
repository.
Thank you very much for your help.
Regards
Patrick

On 17/11/05, Daniel Naber [EMAIL PROTECTED] wrote:
 On Dienstag 15 November 2005 11:24, Patrick Kimber wrote:

  I have checked out the latest version of Lucene from CVS and have
  found a change in the results compared to version 1.4.3.

 Lucene isn't in CVS anymore, it's in SVN. With the latest version from SVN,
 I cannot reproduce your problem.

 Regards
  Daniel

 --
 http://www.danielnaber.de
