[jira] [Comment Edited] (LUCENE-2632) FilteringCodec, TeeCodec, TeeDirectory

Shai Erera (JIRA) Fri, 28 Sep 2012 05:20:14 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465131#comment-13465131
 ]


Shai Erera edited comment on LUCENE-2632 at 9/28/12 11:18 PM:
--------------------------------------------------------------

The patch includes only the FilteringCodec files. I've also fixed some minor 
issues, such as the license docs.

About the *.impl package, I think that if all classes were under *.filtering, 
we could make all but FilteringCodec, WriteFilter and Noop* classes 
package-private, as everything seems to be controlled by WriteFilter. What do 
you think?

Anyway, this isolated patch is cleaner, so perhaps we can now think of a 
different design, such as moving the WriteFilter functionality into the 
different Formats/Consumers and letting users override it by wrapping 
FilteringCodec in a FilterCodec and providing their own Consumers/Formats. 
After all, WriteFilter by default doesn't filter anything ...
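
To make the idea concrete, here is a minimal, self-contained sketch of a 
consumer that passes everything through by default and filters only when a 
subclass overrides one method. It does not use the Lucene API or the classes 
in the patch: TermSink, CollectingSink, FilteringSink and DropFieldSink are 
all hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical consumer of (field, term) pairs; a stand-in for a Lucene consumer.
interface TermSink {
    void add(String field, String term);
}

// Records everything it receives, playing the role of the wrapped codec's consumer.
final class CollectingSink implements TermSink {
    final List<String> seen = new ArrayList<>();

    @Override
    public void add(String field, String term) {
        seen.add(field + ":" + term);
    }
}

// Pass-through wrapper: accept() keeps everything unless a subclass overrides it,
// so an un-subclassed instance filters nothing.
class FilteringSink implements TermSink {
    private final TermSink delegate;

    FilteringSink(TermSink delegate) {
        this.delegate = delegate;
    }

    protected boolean accept(String field, String term) {
        return true;
    }

    @Override
    public final void add(String field, String term) {
        if (accept(field, term)) {
            delegate.add(field, term);
        }
    }
}

// Example override: drop every term that belongs to one field.
final class DropFieldSink extends FilteringSink {
    private final String dropped;

    DropFieldSink(TermSink delegate, String dropped) {
        super(delegate);
        this.dropped = dropped;
    }

    @Override
    protected boolean accept(String field, String term) {
        return !field.equals(dropped);
    }
}
```

With this shape the filtering decision lives next to the consumer it affects, 
and a user who wants no filtering does not have to supply anything.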

And now that we have FilterCodec, perhaps we should rename FilteringCodec to 
something else, like IndexFilteringCodec or DataFilteringCodec, to make it 
easier to distinguish from FilterCodec.

Comments are welcome.
                
> FilteringCodec, TeeCodec, TeeDirectory
> --------------------------------------
>
>                 Key: LUCENE-2632
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2632
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/index
>    Affects Versions: 4.0-ALPHA
>            Reporter: Andrzej Bialecki 
>         Attachments: LUCENE-2632-filtering.patch, LUCENE-2632.patch, 
> LUCENE-2632.patch, LUCENE-2632.patch, LUCENE-2632.patch, LUCENE-2632.patch, 
> LUCENE-2632-trunk.patch
>
>
> This issue adds two new Codec implementations and a supporting Directory 
> abstraction:
> * TeeCodec: there have been attempts in the past to implement parallel 
> writing to multiple indexes so that they are all synchronized. This was, 
> however, complicated by the complexity of the IndexWriter/SegmentMerger logic. 
> The solution presented here offers similar functionality but works at a 
> different level: as the name suggests, TeeCodec duplicates index data 
> into multiple output Directories.
> * TeeDirectory (used also in TeeCodec) is a simple abstraction to perform 
> Directory operations on several directories in parallel (effectively 
> mirroring their data). Optionally it's possible to specify a set of suffixes 
> of files that should be mirrored so that non-matching files are skipped.
> * FilteringCodec is loosely related to the ideas of index pruning 
> presented in LUCENE-1812 and to the concept of tiered search. Since we can use 
> TeeCodec to write to multiple output Directories in a synchronized way, we 
> could also filter out or modify some of the data that is being written. 
> FilteringCodec provides this functionality, so that you can use it like this:
> {code}
> IndexWriter --> TeeCodec
>                  |  |
>                  |  +--> StandardCodec --> Directory1
>                  +--> FilteringCodec --> StandardCodec --> Directory2
> {code}
> The end result of this chain is two indexes that are kept in sync - one is 
> the full regular index, and the other one is a filtered index.
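
The mirroring behaviour described for TeeDirectory, including the optional 
set of file suffixes, can be sketched roughly as below. This is a 
self-contained illustration only and is not the code from the patch: 
MemoryStore and TeeStore are hypothetical names, an in-memory map stands in 
for a real Directory, the split into one primary store plus mirrors is an 
assumption, and the file names used with it are just examples.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a Directory: file name -> file bytes.
final class MemoryStore {
    final Map<String, byte[]> files = new LinkedHashMap<>();

    void write(String name, byte[] data) {
        files.put(name, data.clone());
    }
}

// Hypothetical tee: every write goes to the primary store and is mirrored to
// each secondary store when the file name passes the suffix check.
final class TeeStore {
    private final MemoryStore primary;
    private final List<MemoryStore> mirrors;
    private final Set<String> suffixes; // an empty set means "mirror everything"

    TeeStore(MemoryStore primary, List<MemoryStore> mirrors, Set<String> suffixes) {
        this.primary = primary;
        this.mirrors = new ArrayList<>(mirrors);
        this.suffixes = new HashSet<>(suffixes);
    }

    // True when the file should be copied to the mirrors.
    boolean shouldMirror(String name) {
        if (suffixes.isEmpty()) {
            return true;
        }
        for (String suffix : suffixes) {
            if (name.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    void write(String name, byte[] data) {
        primary.write(name, data);
        if (shouldMirror(name)) {
            for (MemoryStore mirror : mirrors) {
                mirror.write(name, data);
            }
        }
    }
}
```

For example, a TeeStore built with the single suffix ".tim" would copy a file 
named "_0.tim" to every mirror but keep "_0.fdt" in the primary store only.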

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

