[
https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647655#comment-13647655
]
Shai Erera commented on LUCENE-4975:
------------------------------------
So here's an overview of how the Replicator works (it's also documented under
oal.replicator.package.html):
At a high-level, producers (e.g. indexer) publish Revisions, and consumers
update to the latest Revision available. Like SVN, if a client is on rev1 and
the server has rev4, the next update request will upgrade the client to rev4,
skipping all intermediate revisions.
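To make the "skip intermediate revisions" behavior concrete, here is a minimal sketch. It is a toy model with hypothetical names (ToyRevisionServer, publish, checkForUpdate) and integer revision numbers. It is not the module's API and only illustrates that a client which is behind is always offered the newest revision.

```java
import java.util.OptionalInt;

// Toy model only: these are not the replicator module's classes.
class ToyRevisionServer {
    private int latest = 0; // 0 means nothing has been published yet

    // A producer publishes a new revision; only the newest one is ever offered.
    synchronized int publish() {
        return ++latest;
    }

    // Returns the newest revision if the client is behind it, otherwise empty.
    synchronized OptionalInt checkForUpdate(int clientRevision) {
        return latest > clientRevision ? OptionalInt.of(latest) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        ToyRevisionServer server = new ToyRevisionServer();
        int clientRevision = server.publish(); // client is in sync at rev1
        server.publish();
        server.publish();
        server.publish();                      // server is now at rev4
        OptionalInt update = server.checkForUpdate(clientRevision);
        // prints "update to rev4": rev2 and rev3 are never offered
        System.out.println(update.isPresent() ? "update to rev" + update.getAsInt() : "up to date");
    }
}
```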
The Replicator offers two implementations at the moment: LocalReplicator, to be
used on the server side, and HttpReplicator, to be used by clients to e.g.
update over HTTP. In the future, we may want to add other Replicator
implementations, e.g. rsync, torrent... For HTTP, the package also provides a
ReplicationService which acts on the HTTP servlet request/response following
some API specification. In that sense, the HttpReplicator expects a certain
HTTP impl on the server side, so ReplicationService helps you by implementing
that API. The reason it's not a servlet is so that you can plug it into your
application servlet freely.
A Revision is basically a list of files and sources. For example, IndexRevision
contains the list of files in an IndexCommit (and only one source), while
IndexAndTaxonomyRevision contains the list of files from both IndexCommits with
corresponding sources (index/taxonomy). When the server publishes either of
these two revisions, the IndexCommits are snapshotted so that files aren't
deleted, and the Replicator serves file requests (by clients) from the
Revision. The Revision is also responsible for releasing itself -- this is done
automatically by the Replicator which releases a revision when it's no longer
needed (i.e. there's a new one already) and there are no clients that currently
replicate its files.
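The release rule described above (a revision goes away once it has been superseded and no client is still copying it) can be sketched with simple reference counting. This is an assumption about the shape of the logic, not the module's code; ToyRevisionTracker, Rev, openSession and closeSession are hypothetical names, and the `released` list stands in for releasing the snapshotted IndexCommit.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not the replicator module's classes.
class ToyRevisionTracker {
    static final class Rev {
        final String name;
        int refCount = 1; // one reference held while this is the current revision
        Rev(String name) { this.name = name; }
    }

    private Rev current;
    final List<String> released = new ArrayList<>(); // stands in for releasing a snapshot

    // Publishing a new revision drops the "current" reference on the old one.
    synchronized void publish(String name) {
        Rev old = current;
        current = new Rev(name);
        if (old != null) decRef(old);
    }

    // A client that starts copying files pins the current revision.
    synchronized Rev openSession() {
        if (current == null) throw new IllegalStateException("nothing published");
        current.refCount++;
        return current;
    }

    // A client that finished (or gave up) unpins the revision it was copying.
    synchronized void closeSession(Rev rev) {
        decRef(rev);
    }

    private void decRef(Rev rev) {
        if (--rev.refCount == 0) released.add(rev.name);
    }

    public static void main(String[] args) {
        ToyRevisionTracker tracker = new ToyRevisionTracker();
        tracker.publish("rev1");
        Rev session = tracker.openSession();  // a client starts copying rev1
        tracker.publish("rev2");              // rev1 is superseded but still pinned
        System.out.println(tracker.released); // prints []
        tracker.closeSession(session);        // last reference is gone
        System.out.println(tracker.released); // prints [rev1]
    }
}
```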
On the client side, the package offers a ReplicationClient class which can
either be invoked manually or start its update-thread to periodically check for
updates. The client is given a ReplicationHandler (two matching
implementations: IndexReplicationHandler and
IndexAndTaxonomyReplicationHandler) which is responsible for acting on the
replicated files. The client first obtains all needed files (i.e. those that
the new Revision offers and the client is still missing), and after they have
all been successfully copied over, the handler is invoked. Both handlers copy the
files from their temporary location to the index directories, fsync them and
"kiss" the index such that unused files are deleted. You can provide each handler
a Callable which is invoked after the index has been safely and successfully
updated, so you can e.g. call searcherManager.maybeRefresh().
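The order of operations on the client can be sketched as follows. This is a toy model under stated assumptions: plain sets of file names stand in for directories, ToyClientUpdate and its methods are hypothetical, and the fsync step is omitted. It only shows the sequence: work out the missing files, stage them, merge them into the index, drop files the new revision no longer references, then run the callback.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;

// Toy model only: file sets stand in for directories; names are hypothetical.
class ToyClientUpdate {
    // Files the new revision offers that the client does not have yet.
    static Set<String> missingFiles(Set<String> revisionFiles, Set<String> localFiles) {
        Set<String> missing = new TreeSet<>(revisionFiles);
        missing.removeAll(localFiles);
        return missing;
    }

    // Returns the client's file set after the update.
    static Set<String> update(Set<String> revisionFiles, Set<String> localFiles,
                              Callable<Boolean> callback) throws Exception {
        Set<String> staged = missingFiles(revisionFiles, localFiles); // 1. fetch to a temp area
        Set<String> index = new TreeSet<>(localFiles);
        index.addAll(staged);            // 2. copy staged files into the index
        index.retainAll(revisionFiles);  // 3. drop files the new revision no longer uses
        if (callback != null) {
            callback.call();             // 4. e.g. refresh searchers
        }
        return index;
    }

    public static void main(String[] args) throws Exception {
        Set<String> revision = Set.of("_1.cfs", "_2.cfs", "segments_3");
        Set<String> local = Set.of("_0.cfs", "_1.cfs", "segments_2");
        System.out.println(missingFiles(revision, local));       // prints [_2.cfs, segments_3]
        System.out.println(update(revision, local, () -> true)); // prints [_1.cfs, _2.cfs, segments_3]
    }
}
```

Note that only the two missing files are transferred; `_1.cfs` is already present locally and is reused.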
Here's a general code example that explains how to work with the Replicator:
{code}
// ++++++++++++++ SERVER SIDE ++++++++++++++ //
IndexWriter publishWriter; // the writer used for indexing
Replicator replicator = new LocalReplicator();
replicator.publish(new IndexRevision(publishWriter));
// ++++++++++++++ CLIENT SIDE ++++++++++++++ //
// either LocalReplicator, or HttpReplicator if client and server are on
// different nodes
Replicator replicator;
// callback invoked after handler finished handling the revision and e.g. can
// reopen the reader.
Callable<Boolean> callback = null; // can also be null if no callback is needed
ReplicationHandler handler = new IndexReplicationHandler(indexDir, callback);
SourceDirectoryFactory factory = new PerSessionDirectoryFactory(workDir);
ReplicationClient client = new ReplicationClient(replicator, handler, factory);
// invoke client manually
client.updateNow();
// or, periodically
client.startUpdateThread(100); // check for update every 100 milliseconds
{code}
The package of course comes with unit tests, though I'm sure there's room for
improvement (there always is!).
> Add Replication module to Lucene
> --------------------------------
>
> Key: LUCENE-4975
> URL: https://issues.apache.org/jira/browse/LUCENE-4975
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Shai Erera
> Assignee: Shai Erera
>
> I wrote a replication module which I think will be useful to Lucene users who
> want to replicate their indexes for e.g. high availability, taking hot backups,
> etc.
> I will upload a patch soon where I'll describe in general how it works.