David Arthur created LUCENE-4731:
------------------------------------

             Summary: New ReplicatingDirectory mirrors index files to HDFS
                 Key: LUCENE-4731
                 URL: https://issues.apache.org/jira/browse/LUCENE-4731
             Project: Lucene - Core
          Issue Type: New Feature
          Components: core/store
            Reporter: David Arthur
             Fix For: 4.2, 5.0
         Attachments: ReplicatingDirectory.java

I've been working on a Directory implementation that mirrors the index files to 
HDFS (or any other Hadoop-supported FileSystem).

A ReplicatingDirectory delegates all calls to an underlying Directory (supplied 
in the constructor). The only hooks are the deleteFile and sync calls. We 
submit deletes and replications to a single scheduler thread to keep them 
serialized. During a sync call, if "segments.gen" is in the list of files, we 
know a commit is finishing. After calling the delegate's sync method, we start 
an asynchronous replication as follows.
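
To make the delegation and the two hooks concrete, here is a minimal Java 
sketch. It is not the attached class: SimpleDirectory is a hypothetical 
three-method stand-in for Lucene's Directory (so the snippet compiles without 
Lucene), ReplicatingDirectorySketch and remoteLog are made-up names, and the 
remote side is only a log of strings rather than real HDFS calls. It shows every 
call going to the delegate, deleteFile and sync queuing work on one 
single-thread executor, and sync snapshotting listAll() when it sees 
segments.gen.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the three Lucene Directory methods the hooks involve. */
interface SimpleDirectory {
  String[] listAll() throws IOException;
  void deleteFile(String name) throws IOException;
  void sync(Collection<String> names) throws IOException;
}

/** Delegates every call; only deleteFile and sync schedule remote work. */
class ReplicatingDirectorySketch implements SimpleDirectory {
  private final SimpleDirectory delegate;
  // One scheduler thread, so remote deletes and replications run in submission order.
  private final ExecutorService scheduler = Executors.newSingleThreadExecutor();
  // Hypothetical record of remote actions, in place of real HDFS calls.
  final List<String> remoteLog = Collections.synchronizedList(new ArrayList<>());

  ReplicatingDirectorySketch(SimpleDirectory delegate) {
    this.delegate = delegate;
  }

  @Override
  public String[] listAll() throws IOException {
    return delegate.listAll();
  }

  @Override
  public void deleteFile(String name) throws IOException {
    delegate.deleteFile(name);
    scheduler.submit(() -> { remoteLog.add("delete " + name); });
  }

  @Override
  public void sync(Collection<String> names) throws IOException {
    delegate.sync(names);
    if (names.contains("segments.gen")) {
      // A commit is finishing: snapshot the local file list before sync returns,
      // then hand the actual replication to the scheduler thread.
      String[] snapshot = delegate.listAll();
      scheduler.submit(() -> { remoteLog.add("replicate " + snapshot.length + " files"); });
    }
  }

  /** Lets queued remote work finish, then stops the scheduler thread. */
  void shutdown() throws InterruptedException {
    scheduler.shutdown();
    scheduler.awaitTermination(10, TimeUnit.SECONDS);
  }
}
```

Because the executor has exactly one thread, a remote delete queued before a 
replication always runs before it, which is the serialization described above.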

* Read segments.gen (before leaving ReplicatingDirectory#sync) and save its 
values for later
* Get a list of local files from ReplicatingDirectory#listAll before leaving 
ReplicatingDirectory#sync
* Submit a replication task (DirectoryReplicator) to the scheduler thread
* Compare local files to remote files to determine which remote files get 
deleted and which local files need to be copied
* Submit a thread to copy each file (one thread per file)
* Submit a thread to delete each file (one thread per file)
* Submit a "finalizer" thread. This thread waits for the previous two batches of 
threads to finish. Once they have finished, it generates a new "segments.gen" 
remotely (using the version and generation number previously read in).
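
The compare/copy/delete/finalize steps could look roughly like the following 
Java sketch. It models the remote directory as a thread-safe set of file names 
(a stand-in for HDFS), compares by name only on the assumption that Lucene does 
not rewrite index files under an existing name apart from segments.gen, and uses 
an ExecutorService in place of one raw thread per file. ReplicationPassSketch, 
filesToCopy, filesToDelete and replicate are hypothetical names, not taken from 
the attachment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical sketch of one replication pass; "remote" stands in for HDFS. */
class ReplicationPassSketch {
  static final String SEGMENTS_GEN = "segments.gen";

  /** Local files missing remotely; segments.gen is left to the finalizer. */
  static Set<String> filesToCopy(Set<String> local, Set<String> remote) {
    Set<String> result = new TreeSet<>(local);
    result.removeAll(remote);
    result.remove(SEGMENTS_GEN);
    return result;
  }

  /** Remote files no longer present locally; the remote segments.gen is kept. */
  static Set<String> filesToDelete(Set<String> local, Set<String> remote) {
    Set<String> result = new TreeSet<>(remote);
    result.removeAll(local);
    result.remove(SEGMENTS_GEN);
    return result;
  }

  /**
   * Queues one copy task and one delete task per file on 'workers', then a
   * finalizer on 'finalizerPool' that waits for all of them before publishing
   * segments.gen. 'remote' must be thread-safe. Returns the published generation.
   */
  static Future<Long> replicate(Set<String> local, Set<String> remote, long generation,
                                ExecutorService workers, ExecutorService finalizerPool) {
    Set<String> copies = filesToCopy(local, remote);
    Set<String> deletes = filesToDelete(local, remote);
    List<Future<?>> pending = new ArrayList<>();
    for (String name : copies) {
      // Stands in for copying the local file to the remote FileSystem.
      pending.add(workers.submit(() -> { remote.add(name); }));
    }
    for (String name : deletes) {
      // Stands in for deleting the stale remote file.
      pending.add(workers.submit(() -> { remote.remove(name); }));
    }
    Callable<Long> finalizer = () -> {
      for (Future<?> task : pending) {
        task.get(); // blocks until done; rethrows (wrapped) if the task failed
      }
      // Stands in for writing segments.gen with the values saved during sync.
      remote.add(SEGMENTS_GEN);
      return generation;
    };
    return finalizerPool.submit(finalizer);
  }
}
```

The finalizer runs on its own executor so it can block on the copy and delete 
futures without occupying a worker thread those tasks may need. Publishing 
segments.gen last is meant to keep a reader of the remote copy from seeing the 
new generation before all of its files are in place.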

I have no idea where this would belong in the Lucene project, so I'll just 
attach the standalone class instead of a patch. It introduces dependencies on 
Hadoop core (and all the deps that brings with it).

