Does Lucene.net allow you to set an infoStream on the writer, so that
it gives details about when it's merging, committing, deleting, etc?
If so, can you capture & post that?
Do you know when the segments file disappears? Is it while an
IndexWriter is open, or on closing the IndexWriter?
Are you using IndexReader to do deletions?
Are you using a custom deletion policy?
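If capturing infoStream output turns out to be awkward, a small polling script run alongside the indexer might help pin down when the segments file goes away. The following is only a sketch: the function names are made up for illustration, and it assumes the files of interest in the index directory have names starting with "segments".

```python
import os
import time


def list_segments_files(index_dir):
    """Return the sorted names of files in index_dir that start with 'segments'."""
    return sorted(
        name for name in os.listdir(index_dir) if name.startswith("segments")
    )


def diff_snapshots(before, after):
    """Return (appeared, disappeared) file names between two snapshots."""
    before_set, after_set = set(before), set(after)
    return sorted(after_set - before_set), sorted(before_set - after_set)


def watch(index_dir, interval_seconds=1.0):
    """Poll index_dir forever and print a timestamped line on every change."""
    previous = list_segments_files(index_dir)
    while True:
        time.sleep(interval_seconds)
        current = list_segments_files(index_dir)
        appeared, disappeared = diff_snapshots(previous, current)
        if appeared or disappeared:
            print(time.strftime("%H:%M:%S"),
                  "appeared:", appeared, "disappeared:", disappeared)
        previous = current
```

Calling watch() on the index directory while the indexer runs gives timestamps that can be lined up against the indexer's own log of when writers are opened and closed.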
Mike
adakos wrote:
Hello everyone!
I have implemented Lucene.Net, and have had it functioning very well
for quite some time.
I have an indexing server application that indexes a very large file
system, and a searching application that the users use to search the
index created by the server application.
Our index is currently ~4 GB in size, and we have roughly half a
million documents that we are indexing/updating regularly.
As of late, we have been having a problem with the segments file
disappearing.
I originally thought the indexing server was crashing or encountering
some kind of error, but this wasn't the case. I ran the program in
debug mode and found that the indexing server itself didn't fault at
any point, and that the indexing program itself also ran into the
problem of not being able to find the segments file.
Unfortunately, this problem occurs every time I run the indexing
program. So running the indexer results in the segments file being
deleted or disappearing; the indexer is causing the issue, but there
doesn't appear to be any reason why.
I have optimised the index and run the program again, and that still
doesn't help.
All the index writers/readers have appropriately coded .Close() calls
in a try/catch/finally.
Like I said, the indexer was running perfectly fine for a very long
period of time. The only thing I can see that's changed since we
started using it is the index size getting bigger.
It's obviously quite a critical problem because our users can only
search on the outdated index, and I haven't been able to find anything
on this issue anywhere.
I am hoping someone might be able to figure out what's going on.
The error is received when an index writer/reader/searcher attempts to
open: it reports that it can't find the segments file.
Is there any known issue where this occurs? I know I am using the .Net
implementation, but I would assume that Lucene would be quite universal
across different platforms. I have noticed that there doesn't appear to
be much support for the .Net version, or at least I have not been able
to find any.
If it helps any, below are the methods used for indexing:
Full Index Update
Directories are searched recursively.
For each file we check to see if it is already in the index
(comparing size, modified time, etc.).
If it is, then we ignore the file.
If it isn't, we delete the one currently in the index, then add the
updated file.
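The decision logic of the full update described above could be sketched roughly as follows. This is not the actual indexer code: the names file_signature and plan_update are hypothetical, and the sketch assumes the index can report a stored (size, modified time) pair for each path.

```python
import os


def file_signature(path):
    """Return (size in bytes, last-modified time in whole seconds) for path."""
    info = os.stat(path)
    return (info.st_size, int(info.st_mtime))


def plan_update(indexed, current):
    """Decide what a full update pass should do with each file on disk.

    indexed: dict of path -> signature as recorded in the index
    current: dict of path -> signature as found on disk
    Returns (to_skip, to_reindex): unchanged paths, and paths whose index
    entry should be deleted (if one exists) before the file is added again.
    """
    to_skip, to_reindex = [], []
    for path, signature in sorted(current.items()):
        if indexed.get(path) == signature:
            to_skip.append(path)      # same size and mtime: ignore the file
        else:
            to_reindex.append(path)   # new or changed: delete, then add
    return to_skip, to_reindex
```

For a file that was never indexed, the delete step has nothing to remove, so only the add has any effect.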
Email Parsing and Check Summing
I also have an email parser and a check summer: they grab any email
addresses from the document and calculate the checksum of the text to
attempt to avoid duplicate documents. If there is already a document
with the same email and checksum, then the document is stored but is
marked as a duplicate.
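As a rough illustration of the duplicate check described above (not the real parser): the regular expression is deliberately simple, SHA-256 is an assumed choice of checksum, and all function names here are hypothetical.

```python
import hashlib
import re

# Deliberately simple pattern; real-world address parsing is messier.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the sorted, lower-cased set of email addresses found in text."""
    return sorted({match.lower() for match in EMAIL_PATTERN.findall(text)})


def text_checksum(text):
    """Return a hex digest of the text (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def mark_duplicates(texts):
    """Return one record per document, flagging repeats of (email, checksum)."""
    seen = set()
    records = []
    for text in texts:
        checksum = text_checksum(text)
        emails = extract_emails(text)
        keys = {(email, checksum) for email in emails}
        is_duplicate = bool(keys & seen)  # same email AND same checksum seen before
        seen |= keys
        records.append(
            {"emails": emails, "checksum": checksum, "duplicate": is_duplicate}
        )
    return records
```

Every document still gets a record, so a duplicate is stored and flagged rather than dropped. In this sketch a document with no email address is never flagged.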
File System Watcher
Once the full text index is finished, the indexer begins to process
files in a queue generated by a file system watcher. The file system
watcher runs constantly, so indexing is done in a live state.
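The hand-off from the watcher to the indexer can be pictured as a queue that is drained once the full pass is done. This is only a sketch under assumptions: drain_pending and the handle callback are hypothetical names, and the real watcher and indexing calls are not shown.

```python
import queue


def drain_pending(pending, handle):
    """Process every path currently waiting in the queue, oldest first.

    pending: a queue.Queue that the file system watcher puts paths into
    handle:  a callable that (re)indexes one path
    Returns the number of paths handled.
    """
    handled = 0
    while True:
        try:
            path = pending.get_nowait()
        except queue.Empty:
            return handled
        handle(path)
        handled += 1
```

Because a single consumer drains the queue, only one piece of code touches the index at a time in this arrangement, however many change events the watcher produces.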
--
View this message in context:
http://www.nabble.com/Segments-file-disappears%2C-index-no-longer-functions.-tp21579880p21579880.html
Sent from the Lucene - General mailing list archive at Nabble.com.