Hello all.

I recently ran into a problem where errors during indexing or optimization
(perhaps caused by running out of disk space) left me with a working index
whose directory also held partial segment files that were no longer needed.
To find the ~40 files worth keeping among the ~900 in the directory, I dumped
the segments file and found that only 5 segments were actually "live". The
index is a non-compound index stored in an FSDirectory.
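
A minimal sketch of that manual check. It assumes the live segment names
(e.g. "_a", "_b") have already been read from the segments file, and that
segment files are named "_" + a base-36 counter + "." + extension, as in
Lucene releases of this period. The class and method names are hypothetical,
not part of Lucene's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not Lucene API): reports files whose segment prefix
// is not among the live segments listed in the segments file.
class OrphanSegmentFiles {

    // Returns the segment a file belongs to ("_a" for "_a.frq"), or null when
    // the name is not "_" + base-36 counter + "." + extension.
    static String segmentOf(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        int end = 1;
        while (end < fileName.length()) {
            char c = fileName.charAt(end);
            boolean base36 = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z');
            if (!base36) {
                break;
            }
            end++;
        }
        // Require at least one counter character followed by a '.'.
        if (end == 1 || end >= fileName.length() || fileName.charAt(end) != '.') {
            return null;
        }
        return fileName.substring(0, end);
    }

    // Files that look like segment files but belong to no live segment.
    // Names without a segment prefix (e.g. "segments") are never reported.
    static List<String> orphans(List<String> files, Set<String> liveSegments) {
        List<String> result = new ArrayList<>();
        for (String f : files) {
            String seg = segmentOf(f);
            if (seg != null && !liveSegments.contains(seg)) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> live = Set.of("_a", "_b");
        List<String> files = List.of(
                "segments", "deletable", "_a.fnm", "_a.frq", "_b.tis",
                "_c.prx", "_c.fdt", "notes.txt");
        System.out.println(orphans(files, live)); // prints [_c.prx, _c.fdt]
    }
}
```

In a real directory the file list would come from listing the directory and
the live set from the segments file; only the reported names would be
candidates for removal.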

Is there some way to ask an index which files belong to it, or would it be
possible to add one? I'd be willing to submit code if it makes sense to
people. I first looked at FSDirectory itself, thinking that its list()
method should return only the index-related files, but on closer inspection
Directory is a lower-level abstraction over simple I/O and so wouldn't
"know".

So, any thoughts? Would it make sense to add some form of clean operation to
IndexWriter? I hesitate because there seems to be no rule that only Lucene
files may exist in an index directory, so what is ideal for my application
(where I know I won't mix in other files) might not be ideal for everyone.
Would it be reasonable to look for known Lucene extensions and file-naming
patterns to identify unused files that might belong to failed or dead
segments?
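
A rough sketch of such a filename check. It assumes the per-segment
extensions of older (pre-2.x) non-compound indexes: .fnm, .fdx, .fdt, .tis,
.tii, .frq, .prx, .tvx, .tvd, .tvf, .del, .cfs, plus numbered norms files
(.f0, .f1, ...), and the top-level "segments" and "deletable" files. Newer
releases use different names, so the list would need checking against the
file-format documentation of the version in use. The class is hypothetical:

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not Lucene API): decides whether a file name looks
// like something Lucene wrote, so foreign files are never touched.
class LuceneFileNames {

    // Assumed per-segment extensions for older non-compound indexes.
    private static final Set<String> SEGMENT_EXTENSIONS = Set.of(
            "fnm", "fdx", "fdt", "tis", "tii", "frq", "prx",
            "tvx", "tvd", "tvf", "del", "cfs");

    // "_" + base-36 segment counter + "." + extension.
    private static final Pattern SEGMENT_FILE =
            Pattern.compile("_[0-9a-z]+\\.([a-z]+[0-9]*)");

    // Norms files use numbered extensions such as "f0", "f1", ...
    private static final Pattern NORMS_EXTENSION = Pattern.compile("f[0-9]+");

    static boolean looksLikeSegmentFile(String name) {
        Matcher m = SEGMENT_FILE.matcher(name);
        if (!m.matches()) {
            return false;
        }
        String ext = m.group(1);
        return SEGMENT_EXTENSIONS.contains(ext)
                || NORMS_EXTENSION.matcher(ext).matches();
    }

    // Top-level bookkeeping files belong to the index but to no segment.
    static boolean isIndexBookkeepingFile(String name) {
        return name.equals("segments") || name.equals("deletable");
    }

    public static void main(String[] args) {
        String[] names = {"_a.frq", "_a.f3", "segments", "notes.txt", "_a.bak"};
        for (String name : names) {
            boolean lucene = looksLikeSegmentFile(name) || isIndexBookkeepingFile(name);
            System.out.println(name + " -> " + lucene);
        }
    }
}
```

A clean operation along these lines would only consider files that pass this
check and whose segment is not live, which leaves any non-Lucene files in
the directory alone.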

Thanks,

-George
