TeeSinkCodec and FilteringCodec
-------------------------------
Key: LUCENE-2632
URL: https://issues.apache.org/jira/browse/LUCENE-2632
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Affects Versions: 4.0
Reporter: Andrzej Bialecki
This issue adds two new Codec implementations:
* TeeSinkCodec: there have been past attempts to implement parallel writing to
multiple indexes so that they all stay synchronized. These were, however,
complicated by the complexity of the IndexWriter/SegmentMerger logic. The
solution presented here offers similar functionality but works at a
different level: as the name suggests, the TeeSinkCodec duplicates term data
into multiple output Directories, and provides a multi-directory abstraction to
perform other operations that are not yet handled by the Codec API (e.g. stored
fields handling).
* FilteringCodec is loosely related to the ideas of index pruning
presented in LUCENE-1812 and to the concept of tiered search. Since we can use
TeeSinkCodec to write to multiple output Directories in a synchronized way, we
can also filter out or modify some of the data as it is being written. The
FilteringCodec provides this functionality, so that you can use it like this:
{code}
IndexWriter --> TeeSinkCodec
                 |  |
                 |  +--> StandardCodec --> Directory1
                 +--> FilteringCodec --> StandardCodec --> Directory2
{code}
The end result of this chain is two indexes that are kept in sync - one is the
full regular index, and the other one is a filtered index.
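
The chain above can be illustrated with a minimal, self-contained sketch. It
does not use the Lucene Codec API or the classes attached to this issue: the
names TermSink, CollectingSink, TeeSink and FilteringSink are hypothetical
stand-ins, and filtering by field name is only an assumed example of a
filtering rule. It shows only the composition idea, a tee that forwards every
term to all outputs, with a filter wrapped around one of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these types are Lucene classes. */
class TeeFilterSketch {

    /** Stand-in for the part of a codec that consumes term data. */
    interface TermSink {
        void addTerm(String field, String term);
    }

    /** Stand-in for "StandardCodec --> Directory": records what it receives. */
    static final class CollectingSink implements TermSink {
        final List<String> written = new ArrayList<>();

        @Override
        public void addTerm(String field, String term) {
            written.add(field + ":" + term);
        }
    }

    /** Tee: forwards every term to all downstream sinks, in order. */
    static final class TeeSink implements TermSink {
        private final List<TermSink> outputs;

        TeeSink(List<TermSink> outputs) {
            this.outputs = List.copyOf(outputs);
        }

        @Override
        public void addTerm(String field, String term) {
            for (TermSink out : outputs) {
                out.addTerm(field, term);
            }
        }
    }

    /** Filter: forwards only terms whose field name passes the predicate. */
    static final class FilteringSink implements TermSink {
        private final Predicate<String> keepField;
        private final TermSink delegate;

        FilteringSink(Predicate<String> keepField, TermSink delegate) {
            this.keepField = keepField;
            this.delegate = delegate;
        }

        @Override
        public void addTerm(String field, String term) {
            if (keepField.test(field)) {
                delegate.addTerm(field, term);
            }
        }
    }

    public static void main(String[] args) {
        CollectingSink directory1 = new CollectingSink(); // full index
        CollectingSink directory2 = new CollectingSink(); // filtered index

        // Assumed filtering rule for illustration: drop the "body" field.
        TermSink tee = new TeeSink(List.<TermSink>of(
                directory1,
                new FilteringSink(field -> !field.equals("body"), directory2)));

        tee.addTerm("title", "lucene");
        tee.addTerm("body", "codec");

        System.out.println(directory1.written); // [title:lucene, body:codec]
        System.out.println(directory2.written); // [title:lucene]
    }
}
```

Because the filter wraps only one branch of the tee, both outputs see the same
sequence of writes and stay in sync, while the second one simply receives a
subset of the data.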