Hello everyone! I have implemented Lucene.Net and have had it functioning very well for quite some time.
I have an indexing server application that indexes a very large file system, and a searching application that the users use to search the index created by the server application. Our index is currently ~4 GB in size, and we have roughly half a million documents that we are indexing/updating regularly.

As of late, we have been having a problem with the segments file disappearing. I originally thought the indexing server was crashing or encountering some kind of error, but this wasn't the case. I ran the program in debug mode and found that the indexing server itself didn't fault at any point, and that the indexing program itself also ran into the problem of not being able to find the segments file. Unfortunately, this happens every time I run the indexing program. So running the indexer is what causes the segments file to be deleted or to disappear, but there doesn't appear to be any reason why.

I have optimised the index and run the program again, and that still doesn't help. All the index writers/readers have appropriately coded .Close() calls in a try/catch/finally. Like I said, the indexer was running perfectly fine for a very long period of time. The only thing I can see that has changed since we started using it is that the index has grown.

It's obviously quite a critical problem because our users can only search the outdated index, and I haven't been able to find anything on this issue anywhere. I am hoping someone might be able to figure out what's going on. The error occurs when an index writer/reader/searcher attempts to open the index: it reports that it can't find the segments file. Is there any known issue where this occurs? I know I am using the .Net implementation, but I would assume that Lucene would be quite universal across different platforms.
I have noticed that there doesn't appear to be much support for the .Net version, or at least I have not been able to find any. If it helps any, below are the methods used for indexing.

Full Index Update
- Directories are searched recursively.
- For each file we check to see if it is already in the index (comparing size, modified time, etc.).
- If it is, then we ignore the file.
- If it isn't, we delete the one currently in the index, then add the updated file.

Email Parsing and Checksumming
I also have an email parser and a checksummer that grab any email addresses from the document and calculate the checksum of the text, to attempt to avoid duplicate documents. If there is already a document with the same email and checksum, then the document is stored but is marked as a duplicate.

File System Watcher
Once the full index update is finished, the indexer begins to process files from a queue that is filled by a file system watcher. The file system watcher runs constantly, so indexing is done in a live state.

--
View this message in context: http://www.nabble.com/Segments-file-disappears%2C-index-no-longer-functions.-tp21579880p21579880.html
Sent from the Lucene - General mailing list archive at Nabble.com.
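
To make the full index update and duplicate rules described above a bit more concrete, here is a rough sketch of the decision logic. It is written in Python purely for illustration and is not the actual C#/Lucene.Net code. Every name in it (FileStamp, needs_reindex, duplicate_key) is hypothetical, and the SHA-1 checksum and the simple email pattern are assumptions, since the real checksum algorithm and parser are not shown here. The in-memory dict only stands in for the real index lookup.

```python
import hashlib
import re
from dataclasses import dataclass

# Deliberately simple pattern for illustration only (assumption);
# it is not a full RFC 5322 address matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass(frozen=True)
class FileStamp:
    """The attributes compared against the index (hypothetical type)."""
    size: int
    modified: float


def needs_reindex(path, stamp, indexed_stamps):
    """Return True when the file is new or differs from what is indexed.

    indexed_stamps maps path -> FileStamp and stands in for looking the
    document up in the real index. A True result corresponds to
    "delete the indexed copy, then add the updated file".
    """
    return indexed_stamps.get(path) != stamp


def duplicate_key(text):
    """Build the key used to flag duplicates: (email addresses, checksum).

    Assumption: two documents count as duplicates when they contain the
    same set of addresses and their text has the same checksum.
    """
    emails = frozenset(m.lower() for m in EMAIL_RE.findall(text))
    checksum = hashlib.sha1(text.encode("utf-8")).hexdigest()
    return emails, checksum


if __name__ == "__main__":
    indexed = {"docs/a.txt": FileStamp(size=120, modified=1000.0)}

    # Unchanged file: ignored.
    print(needs_reindex("docs/a.txt", FileStamp(120, 1000.0), indexed))
    # Modified file: delete the old document, then add the new one.
    print(needs_reindex("docs/a.txt", FileStamp(130, 2000.0), indexed))

    seen = set()
    for body in ("Mail jo@example.com today", "Mail jo@example.com today"):
        key = duplicate_key(body)
        print("duplicate" if key in seen else "new")
        seen.add(key)
```

Both functions are kept free of side effects so that the same checks could be reused by the full update pass and by the file system watcher queue.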
