[ 
https://issues.apache.org/jira/browse/LUCENE-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5438:
---------------------------------------
    Attachment: LUCENE-5438.patch

Here's the latest applyable patch from the branch.  Tests pass but
precommit doesn't yet... I'll work on it.

I tried to keep the core changes to a minimum (simplified vs previous
iterations), but there were some additions that NRT replication needs,
like asking IW to write deletes to disk on opening the NRT reader.
The changes to SegmentInfos.java are not as scary as they look (just
factoring out methods so it can also read from / write to
{{IndexInput}} / {{IndexOutput}}).

I've marked the new APIs experimental or internal, and put all
the new classes under o.a.l.replication.nrt.

The important classes are {{PrimaryNode}} (you create this on the JVM
that will index documents) and {{ReplicaNode}} (you create that on
other JVMs to receive newly flushed files).  They are both abstract:
you must subclass and implement methods that actually do the work of
moving files, etc.  The tests do this using a simple TCP socket
server.
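
To illustrate the subclassing pattern described above, here is a
minimal sketch.  It is not the API in the patch: the class names
({{SketchReplicaNode}}, {{InMemoryReplicaNode}}) and the methods
({{fetchFile}}, {{syncTo}}) are hypothetical, and an in-memory map
stands in for the TCP transport the tests use.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a replica-side base class; NOT the patch's API.
abstract class SketchReplicaNode {
    // Files this replica currently holds, keyed by file name.
    protected final Map<String, byte[]> localFiles = new HashMap<>();

    // Subclasses decide how bytes actually travel (socket, HTTP, shared disk, ...).
    protected abstract byte[] fetchFile(String fileName);

    // Copy only the files this replica is missing; returns how many were copied.
    public int syncTo(List<String> newFileNames) {
        int copied = 0;
        for (String name : newFileNames) {
            if (!localFiles.containsKey(name)) {
                localFiles.put(name, fetchFile(name));
                copied++;
            }
        }
        return copied;
    }
}

// Toy transport: reads from a map standing in for the primary's directory.
class InMemoryReplicaNode extends SketchReplicaNode {
    private final Map<String, byte[]> primaryFiles;

    InMemoryReplicaNode(Map<String, byte[]> primaryFiles) {
        this.primaryFiles = primaryFiles;
    }

    @Override
    protected byte[] fetchFile(String fileName) {
        byte[] bytes = primaryFiles.get(fileName);
        if (bytes == null) {
            throw new IllegalStateException("primary has no file " + fileName);
        }
        return bytes.clone();
    }
}
```

The point of the split is that the base class owns the bookkeeping
(which files are already local) while the subclass owns only the
transport, so swapping the socket server for something else should
not touch the sync logic.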

Both {{PrimaryNode}} and {{ReplicaNode}} expose a {{SearcherManager}},
which you use for searching.  They both have {{commit}} methods.

The primary node uses a merged segment warmer that pre-copies merged
files to the replicas before letting the local IW cut over.  This way
NRT latency isn't (normally) blocked by copying merged files out.  An
alternative to
this would be to have the replicas do their own merging, but I think
that gets quite complex.
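
The ordering behind the pre-copy idea can be sketched roughly as
follows.  This only illustrates "copy first, cut over second"; the
classes ({{WarmReplica}}, {{WarmingPrimary}}) and the
{{finishMerge}} method are hypothetical and are not the classes in
the patch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical replica that just records which files it has received.
class WarmReplica {
    final Set<String> files = new HashSet<>();

    void receive(String fileName) {
        files.add(fileName);
    }
}

// Hypothetical primary; illustrates ordering only, not the patch's classes.
class WarmingPrimary {
    private final List<WarmReplica> replicas;
    private final Set<String> liveSegments = new HashSet<>();

    WarmingPrimary(List<WarmReplica> replicas) {
        this.replicas = replicas;
    }

    void addFlushedSegment(String name) {
        liveSegments.add(name);
    }

    // Called when a merge finishes, before the merged segment becomes live.
    void finishMerge(Set<String> mergedAway, String mergedSegment) {
        // Step 1 (the "warm" step): push the merged file to every replica first.
        for (WarmReplica replica : replicas) {
            replica.receive(mergedSegment);
        }
        // Step 2: only now cut over locally, so the next NRT point never
        // references a large merged file the replicas still have to copy.
        liveSegments.removeAll(mergedAway);
        liveSegments.add(mergedSegment);
    }

    Set<String> liveSegments() {
        return new HashSet<>(liveSegments);
    }
}
```

Because the merged file is already on the replicas by the time the
primary switches to it, a later NRT copy only has to move the small,
newly flushed segments.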


> add near-real-time replication
> ------------------------------
>
>                 Key: LUCENE-5438
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5438
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/replicator
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 6.0
>
>         Attachments: LUCENE-5438.patch, LUCENE-5438.patch, LUCENE-5438.patch, 
> LUCENE-5438.patch
>
>
> Lucene's replication module makes it easy to incrementally sync index
> changes from a master index to any number of replicas, and it
> handles/abstracts all the underlying complexity of holding a
> time-expiring snapshot, finding which files need copying, syncing more
> than one index (e.g., taxo + index), etc.
> But today you must first commit on the master, and then again the
> replica's copied files are fsync'd, because the code operates on
> commit points.  But this isn't "technically" necessary, and it mixes
> up durability and fast turnaround time.
> Long ago we added near-real-time readers to Lucene, for the same
> reason: you shouldn't have to commit just to see the new index
> changes.
> I think we should do the same for replication: allow the new segments
> to be copied out to replica(s), and new NRT readers to be opened, to
> fully decouple committing from visibility.  This way apps can then
> separately choose when to replicate (for freshness), and when to
> commit (for durability).
> I think for some apps this could be a compelling alternative to the
> "re-index all documents on each shard" approach that Solr Cloud /
> ElasticSearch implement today, and it may also mean that the
> transaction log can remain external to / above the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
