[jira] [Commented] (LUCENE-2309) Fully decouple IndexWriter from analyzers

Chris Male (JIRA) Sun, 17 Jul 2011 09:37:23 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066688#comment-13066688
 ]


Chris Male commented on LUCENE-2309:
------------------------------------

{code}
I think if we want this to take AttributesConsumer or whatever, then that's 
cool: Analyzer returns this instead of TokenStream, and we fix all these 
consumers to consume the more general API.
{code}

I just don't see how this would work.  As it is in the patch, AttributeConsumer 
is a callback mechanism where the consumer provides its own logic.  It has 
nothing to do with Analyzers really, and it will be implemented differently 
depending on what the consumer wants to do in that instance.
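
As a rough illustration of the callback idea (this is not the code from the 
patch), the sketch below shows two consumers plugging different logic into the 
same analysis step.  Everything in it is a hypothetical stand-in: the shape of 
the AttributeConsumer interface, the analyze helper that plays the role of an 
Analyzer, and the two example consumers.  It deliberately avoids Lucene classes 
so that it runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AttributeConsumerSketch {

    /** Hypothetical callback shape; the interface in the patch may differ. */
    interface AttributeConsumer {
        void consume(String term, int position);
    }

    /** Stand-in for an Analyzer: lowercases and splits on whitespace. */
    static void analyze(String text, AttributeConsumer consumer) {
        int position = 0;
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input yields one empty string; skip it
            }
            consumer.consume(term.toLowerCase(Locale.ROOT), position++);
        }
    }

    /** One consumer: collects terms, roughly what an indexer might do. */
    static List<String> collectTerms(String text) {
        List<String> terms = new ArrayList<>();
        analyze(text, (term, position) -> terms.add(term));
        return terms;
    }

    /** Another consumer: same analysis, different logic (it only counts). */
    static int countTerms(String text) {
        int[] count = {0};
        analyze(text, (term, position) -> count[0]++);
        return count[0];
    }

    public static void main(String[] args) {
        String text = "Fully decouple IndexWriter from analyzers";
        System.out.println(collectTerms(text));
        System.out.println(countTerms(text));
    }
}
```

The analysis step stays the same for both callers; only the callback changes, 
which is the sense in which the consumer is independent of the Analyzer.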

{code}
I just want to make sure that all consumers, not just IndexWriter, use the 
same consistent API. This way, like today, someone declares FooAnalyzer, uses 
it everywhere, and stuff is consistent everywhere.
{code}

Absolutely desirable.  AttributeConsumer isn't changing the Analyzer concept; 
it's just changing how we consume from an Analyzer.  With that in mind, I very 
much agree with your assertion that this shouldn't change the Analyzer used in 
search and indexing.  What's prompted that concern here is the shift to a 
per-Field Analyzer.  I'll reassess that change while waiting for other feedback. 

> Fully decouple IndexWriter from analyzers
> -----------------------------------------
>
>                 Key: LUCENE-2309
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2309
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: LUCENE-2309.patch
>
>
> IndexWriter only needs an AttributeSource to do indexing.
> Yet, today, it interacts with Field instances, holds a private
> analyzer, invokes analyzer.reusableTokenStream, and has to deal with a
> wide variety of cases (the field is not analyzed; it is analyzed but it's
> a Reader or a String; it's pre-analyzed).
> I'd like to have IW only interact with attr sources that already
> arrived with the fields.  This would be a powerful decoupling -- it
> means others are free to make their own attr sources.
> They need not even use any of Lucene's analysis impls; eg they can
> integrate to other things like [OpenPipeline|http://www.openpipeline.org].
> Or make something completely custom.
> LUCENE-2302 is already a big step towards this: it makes IW agnostic
> about which attr is "the term", and only requires that it provide a
> BytesRef (for flex).
> Then I think LUCENE-2308 would get us most of the remaining way -- ie, if the
> FieldType knows the analyzer to use, then we could simply create a
> getAttrSource() method (say) on it and move all the logic IW has today
> onto there.  (We'd still need existing IW code for back-compat).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
