David Arthur created LUCENE-4731:
------------------------------------
Summary: New ReplicatingDirectory mirrors index files to HDFS
Key: LUCENE-4731
URL: https://issues.apache.org/jira/browse/LUCENE-4731
Project: Lucene - Core
Issue Type: New Feature
Components: core/store
Reporter: David Arthur
Fix For: 4.2, 5.0
Attachments: ReplicatingDirectory.java
I've been working on a Directory implementation that mirrors the index files to
HDFS (or another Hadoop-supported FileSystem).
A ReplicatingDirectory delegates all calls to an underlying Directory (supplied
in the constructor). The only hooks are the deleteFile and sync calls. We
submit deletes and replications to a single scheduler thread to keep things
serialized. During a sync call, if "segments.gen" is seen in the list of files,
we know a commit is finishing. After calling the delegate's sync method, we
initiate an asynchronous replication as follows:
* Read segments.gen (before leaving ReplicatingDirectory#sync), save the values
for later
* Get a list of local files from ReplicatingDirectory#listAll before leaving
ReplicatingDirectory#sync
* Submit replication task (DirectoryReplicator) to scheduler thread
* Compare local files to remote files, determine which remote files get
deleted, and which need to get copied
* Submit a thread to copy each file (one thread per file)
* Submit a thread to delete each file (one thread per file)
* Submit a "finalizer" thread. This thread waits on the previous two batches of
threads to finish. Once finished, this thread generates a new "segments.gen"
remotely (using the version and generation number previously read in).
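The steps above can be sketched roughly as follows. This is not the attached ReplicatingDirectory.java, only a minimal illustration under stated assumptions: plain maps of file name to content stand in for the local Directory and the remote FileSystem, the class and method names (ReplicationSketch, filesToCopy, filesToDelete, replicate) are hypothetical, a cached thread pool approximates "one thread per file", and "segments.gen" is assumed to be written only by the finalizer. The caller is expected to pass in the file snapshot and generation it read before leaving sync.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ReplicationSketch {
    static final String SEGMENTS_GEN = "segments.gen";

    /** Names present locally but missing remotely; these get copied. */
    static Set<String> filesToCopy(Set<String> local, Set<String> remote) {
        Set<String> toCopy = new TreeSet<>(local);
        toCopy.removeAll(remote);
        toCopy.remove(SEGMENTS_GEN); // assumption: only the finalizer writes this file
        return toCopy;
    }

    /** Names present remotely but no longer present locally; these get deleted. */
    static Set<String> filesToDelete(Set<String> local, Set<String> remote) {
        Set<String> toDelete = new TreeSet<>(remote);
        toDelete.removeAll(local);
        toDelete.remove(SEGMENTS_GEN); // assumption: overwritten by the finalizer, not deleted
        return toDelete;
    }

    /**
     * Runs one replication round. The remote map must be safe for concurrent
     * writes (e.g. a ConcurrentHashMap); the local map is only read.
     */
    static void replicate(Map<String, String> local, Map<String, String> remote,
                          long generation) throws Exception {
        // Single scheduler thread: rounds are serialized with respect to each other.
        ExecutorService scheduler = Executors.newSingleThreadExecutor();
        // Grows on demand, so each file effectively gets its own thread.
        ExecutorService workers = Executors.newCachedThreadPool();
        try {
            Callable<Void> round = () -> {
                // Plan both batches up front, before any worker touches the remote.
                Set<String> toCopy = filesToCopy(local.keySet(), remote.keySet());
                Set<String> toDelete = filesToDelete(local.keySet(), remote.keySet());
                List<Future<?>> pending = new ArrayList<>();
                for (String name : toCopy) {
                    pending.add(workers.submit(() -> { remote.put(name, local.get(name)); }));
                }
                for (String name : toDelete) {
                    pending.add(workers.submit(() -> { remote.remove(name); }));
                }
                // Finalizer: wait for both batches, then publish the new generation.
                Future<?> finalizer = workers.submit(() -> {
                    for (Future<?> f : pending) {
                        f.get();
                    }
                    remote.put(SEGMENTS_GEN, "generation=" + generation);
                    return null;
                });
                // Assumption: the round blocks the scheduler thread until finalized.
                finalizer.get();
                return null;
            };
            scheduler.submit(round).get();
        } finally {
            workers.shutdown();
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> local = new ConcurrentHashMap<>();
        local.put("_1.cfs", "new segment");
        local.put("segments_2", "commit point");
        local.put(SEGMENTS_GEN, "generation=2");
        Map<String, String> remote = new ConcurrentHashMap<>();
        remote.put("_0.cfs", "old segment");
        remote.put("segments_1", "old commit point");
        replicate(local, remote, 2L);
        System.out.println(new TreeSet<>(remote.keySet()));
    }
}
```

Writing "segments.gen" last means a remote reader should never see a generation whose files are only partially copied, which is the point of the finalizer step.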
I have no idea where this would belong in the Lucene project, so I'll just
attach the standalone class instead of a patch. It introduces dependencies on
Hadoop core (and all the deps that brings with it).