Re: Instantiating a RAMDirectory from a mutating directory

Michael McCandless Mon, 09 Mar 2009 16:54:51 -0700


You're welcome, and let us know how it goes!


Mike

Kieran Topping wrote:

Mike, many thanks for this most comprehensive reply.
Actually, I believe that NOTE only applies to the two addIndexes methods that take Directory. So I think this approach will work fine in general. Have you hit any problems in testing it? I'll update the javadocs.
I have not attempted this yet (I was put off by the stark warning!). I'll let you/list know whether I encounter any problems. I'm not /too/ bothered about performance in my particular case. I'm using RAMDirectory partly for enhanced search-speed, but also partly as a means to keep many (50+) small-ish (<30MB) indexes open without running into "too many open files" problems.
If I do encounter any problems, I expect I will look at implementing the second one of your supplementary suggestions (i.e. using the SegmentInfos class directly), and just keep an eye out for any API changes between Lucene versions.
Many thanks again for your time,

Kieran




Michael McCandless wrote:
This is an interesting challenge!  Responses below...

Kieran Topping wrote:
Hello,
I would like to be able to instantiate a RAMDirectory from a directory that an IndexWriter in another process might currently be modifying.
Ideally, I would like to do this without any synchronizing or locking. Kind-of like the way in which an IndexReader can open an index in a directory, even if it's currently being modified by an IndexWriter.
However, simply calling:
RAMDirectory rd = new RAMDirectory("/path/to/index");
Will not work. It will periodically fail with a FileNotFoundException. It's fairly obvious why this happens: Directory.copy() gets a list of the files it needs to copy, and then copies them into the RAMDirectory instance one-by-one. If, in the meantime, the IndexWriter deletes one of these files, a FileNotFoundException occurs.
One thought that I had was that I would take advantage of the fact that it's possible to open an IndexReader on the mutating directory, and then use the "addIndexes()" method, as follows:
// 1. create RAMDirectory.
RAMDirectory ramDirectory = new RAMDirectory();
// 2. create an index in the RAMDirectory.
IndexWriter writer = new IndexWriter(ramDirectory, null /*analyzer*/, true /*create*/);
// 3. open the (possibly mutating) source index.
IndexReader reader = IndexReader.open("/path/to/index");
// 4. copy the source index into the RAMDirectory index.
writer.addIndexes(new IndexReader [] {reader});
However ... there is a fairly unambiguous warning in IndexWriter.addIndexes()'s documentation:
>> NOTE: the index in each Directory must not be changed (opened by a writer) while this method is running. This method does not acquire a write lock in each input Directory, so it is up to the caller to enforce this.
I'm slightly confused by this warning though, as IndexReader's documentation implies that it is OK to open an IndexReader in this fashion.
Actually, I believe that NOTE only applies to the two addIndexes methods that take Directory. So I think this approach will work fine in general. Have you hit any problems in testing it? I'll update the javadocs.
The one big downside to this approach is performance: it's a rather slow way to copy an index into RAM. But maybe your indexes are small enough that this doesn't matter.
I'm wondering whether anyone knows the internals of IndexWriter.addIndexes() well enough to know whether my proposed solution will work reliably?
Or, indeed, whether there might be another way of instantiating a RAMDirectory from a directory which might currently be being modified by an IndexWriter?
If you could communicate w/ the separate process doing the writing, you could use SnapshotDeletionPolicy (in the writer process) to protect a particular point-in-time commit. This is exactly how a hot backup of a Lucene index is done; you would have to then communicate the filenames that IndexCommit (in the writer process) exposes over to your 2nd reader process, and copy those files, and then release the snapshot back in the writer process.
Alternatively, you could simply use the SegmentInfos class (NOTE: it's package private, so you'd need code in the org.apache.lucene.index package, and these APIs can change release-to-release) to open the current commit, and then simply copy the files directly (this is the API that IndexReader.open uses). To do this, you should subclass the FindSegmentsFile class, and override run() to open all referenced files, and probably return these open file handles to the code that actually does the copying. You'd need to take some care to handle a FileNotFoundException (meaning you need to retry on the next segments file), and to close any files you had succeeded in opening, else you'll leak file descriptors...
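The open-everything-first, retry-on-missing-file, always-close pattern described above can be sketched roughly as follows. This is only an illustration in plain Java (java.nio.file, so a modern JDK), not Lucene code: the OpenThenCopy class, the CommitLister callback and copyCommit() are hypothetical names, and CommitLister merely stands in for whatever reads the current segments file. java.nio.file reports a missing file as NoSuchFileException rather than FileNotFoundException. The sketch also assumes a filesystem where an already-open file stays readable after it is deleted (true on POSIX systems).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OpenThenCopy {

    /** Hypothetical stand-in for "read the current commit's file names". */
    public interface CommitLister {
        List<String> listFiles(Path indexDir) throws IOException;
    }

    /**
     * Opens every file of the listed commit before copying any of them.
     * If a file has vanished in the meantime, closes whatever was opened
     * and retries with a fresh listing, up to maxAttempts times.
     */
    public static Map<String, byte[]> copyCommit(Path indexDir,
                                                 CommitLister lister,
                                                 int maxAttempts) throws IOException {
        NoSuchFileException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<InputStream> opened = new ArrayList<>();
            try {
                List<String> names = lister.listFiles(indexDir);
                // Phase 1: acquire a handle on every file in the commit.
                for (String name : names) {
                    opened.add(Files.newInputStream(indexDir.resolve(name)));
                }
                // Phase 2: all handles are held, so copy the contents.
                Map<String, byte[]> copy = new LinkedHashMap<>();
                for (int i = 0; i < names.size(); i++) {
                    copy.put(names.get(i), opened.get(i).readAllBytes());
                }
                return copy;
            } catch (NoSuchFileException e) {
                // The writer deleted a file after we listed it: retry.
                last = e;
            } finally {
                // Close every handle we managed to open, on success or
                // failure, so no file descriptors leak.
                for (InputStream in : opened) {
                    try {
                        in.close();
                    } catch (IOException ignored) {
                        // Nothing useful to do if close fails.
                    }
                }
            }
        }
        throw last != null ? last : new IOException("maxAttempts must be positive");
    }
}
```

The in-memory map here only stands in for the RAMDirectory; a real version would write each stream into the target directory instead of buffering whole files in byte arrays.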
Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


