[jira] Commented: (LUCENE-2309) Fully decouple IndexWriter from analyzers

Simon Willnauer (JIRA) Fri, 12 Mar 2010 06:45:54 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844523#action_12844523
 ]


Simon Willnauer commented on LUCENE-2309:
-----------------------------------------

bq. Then people could freely use Lucene to index off a foreign analysis chain...
That is what I was talking about!

{quote}
I'd like to donate my two cents here - we've just recently changed the 
TokenStream API, but we still kept its concept - i.e. IW consumes tokens, only 
now the API has changed slightly. The proposal here, w/ the 
AttConsumer/Acceptor, in which IW delegates itself to a Field so that the Field 
calls back to IW, seems much too complicated to me. Users that write 
Analyzers/TokenStreams/AttributeSources should not care how they are 
indexed/stored etc. Forcing them to implement this push logic to IW seems to me 
like real, unnecessary overhead and complexity.
{quote}

We can surely hide this implementation completely from Field. I consider this 
similar to Collector, which you pass explicitly to the search method if you 
want different behavior. Maybe something like an AttributeProducer. I don't 
think adding this to Field makes much sense, and it is not worth the 
complexity.
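
To make the Collector analogy concrete, here is a minimal sketch, not a proposed 
patch. AttributeProducer, WhitespaceProducer and ToyIndexer are hypothetical 
names invented for illustration; none of them exist in Lucene. The point is 
only that the producer is passed explicitly to the indexing call when different 
behavior is wanted, while a convenience overload keeps the default path, so 
nothing has to be added to Field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ProducerSketch {

    /** Hypothetical: hands terms to the indexer one at a time. Not a Lucene API. */
    interface AttributeProducer {
        /** Advances to the next term; returns false when exhausted. */
        boolean next();

        /** The current term; valid only after next() returned true. */
        String term();
    }

    /** Default producer: naive whitespace splitting, standing in for an Analyzer. */
    static final class WhitespaceProducer implements AttributeProducer {
        private final Iterator<String> it;
        private String current;

        WhitespaceProducer(String text) {
            String trimmed = text.trim();
            List<String> parts = trimmed.isEmpty()
                    ? new ArrayList<String>()
                    : Arrays.asList(trimmed.split("\\s+"));
            this.it = parts.iterator();
        }

        public boolean next() {
            if (!it.hasNext()) {
                return false;
            }
            current = it.next();
            return true;
        }

        public String term() {
            return current;
        }
    }

    /** Toy stand-in for IndexWriter; it only records the terms it consumed. */
    static final class ToyIndexer {
        final List<String> indexed = new ArrayList<>();

        /** Convenience overload: default behavior, no producer visible to the caller. */
        void addField(String name, String text) {
            addField(name, new WhitespaceProducer(text));
        }

        /** Explicit overload: the caller supplies the producer, much like a Collector. */
        void addField(String name, AttributeProducer producer) {
            while (producer.next()) {
                indexed.add(name + ":" + producer.term());
            }
        }
    }

    public static void main(String[] args) {
        ToyIndexer iw = new ToyIndexer();
        iw.addField("body", "hello  world");
        System.out.println(iw.indexed); // prints [body:hello, body:world]
    }
}
```

In this sketch, the field itself never learns that a producer exists; only 
callers who want a different analysis chain use the second overload.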

bq. Will the Field also control how stored fields are added? Or only 
AttributeSourced ones?
IMO this is only about inverted fields.

bq. We (IW) control the indexing flow, and not the user.
The user only gets the ability to exchange the analysis chain, not the 
control flow. The user can already mess around with stuff in incrementToken(); 
the only thing we change / invert is that the indexer no longer knows about 
TokenStreams. It does not change the control flow, though.
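
A rough illustration of that point, under stated assumptions: TermSource, 
fromForeignTokens and invert are hypothetical names, not Lucene APIs. The toy 
indexer still owns the loop and assigns positions; the user-supplied source 
only answers when it is pulled, just as a TokenStream does in incrementToken().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

public class ControlFlowSketch {

    /** Hypothetical pull-style source; the indexer sees only this, never a TokenStream. */
    interface TermSource {
        boolean next();

        String term();
    }

    /** Adapts tokens produced elsewhere (e.g. by an external pipeline) to TermSource. */
    static TermSource fromForeignTokens(List<String> tokens) {
        final Iterator<String> it = tokens.iterator();
        return new TermSource() {
            private String current;

            public boolean next() {
                if (!it.hasNext()) {
                    return false;
                }
                // User-side logic lives here, as it would in incrementToken().
                current = it.next().toLowerCase(Locale.ROOT);
                return true;
            }

            public String term() {
                return current;
            }
        };
    }

    /** Toy indexer: it drives the loop and assigns positions; the source only responds. */
    static List<String> invert(String field, TermSource source) {
        List<String> postings = new ArrayList<>();
        int position = 0;
        while (source.next()) { // the indexer decides when to pull
            postings.add(field + ":" + source.term() + "@" + position++);
        }
        return postings;
    }

    public static void main(String[] args) {
        List<String> foreign = Arrays.asList("Fully", "Decouple");
        System.out.println(invert("title", fromForeignTokens(foreign)));
        // prints [title:fully@0, title:decouple@1]
    }
}
```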



> Fully decouple IndexWriter from analyzers
> -----------------------------------------
>
>                 Key: LUCENE-2309
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2309
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>
> IndexWriter only needs an AttributeSource to do indexing.
> Yet, today, it interacts with Field instances, holds a private
> analyzer, invokes analyzer.reusableTokenStream, and has to deal with a
> wide variety of cases (it's not analyzed; it is analyzed but it's a Reader
> or String; it's pre-analyzed).
> I'd like to have IW only interact with attr sources that already
> arrived with the fields.  This would be a powerful decoupling -- it
> means others are free to make their own attr sources.
> They need not even use any of Lucene's analysis impls; eg they can
> integrate to other things like [OpenPipeline|http://www.openpipeline.org].
> Or make something completely custom.
> LUCENE-2302 is already a big step towards this: it makes IW agnostic
> about which attr is "the term", and only requires that it provide a
> BytesRef (for flex).
> Then I think LUCENE-2308 would get us most of the remaining way -- ie, if the
> FieldType knows the analyzer to use, then we could simply create a
> getAttrSource() method (say) on it and move all the logic IW has today
> onto there.  (We'd still need existing IW code for back-compat).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


