[ 
https://issues.apache.org/jira/browse/LUCENE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844528#action_12844528
 ] 

Uwe Schindler commented on LUCENE-2309:
---------------------------------------

There is one problem that cannot be easily solved (for any of the proposals 
here) if we want to provide an old-style API that does not require reuse of 
tokens. If AttributeProvider is to support something (like rmuir proposed 
before) that looks like the old "Token next()", we need an AttributeProvider 
that passes the AttributeSource to the indexer for each Token! That would lead 
to lots of getAttribute() calls, which would slow down indexing! So with the 
current APIs we cannot get around the requirement to reuse the same Attribute 
instances during the whole indexing run without a major speed impact. This can 
only be solved with my nice BCEL proxy Attributes, where you can exchange the 
inner attribute impl. Or we do it like TokenWrapper in 2.9 (yes, we can 
reactivate that API somehow as an ease-of-use addendum).
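
To make the cost difference concrete, here is a minimal sketch of the two 
consumption styles. All classes in it (TermAttr, SimpleAttributeSource, 
ReusingStream, PerTokenStream, SketchIndexer) are hypothetical, simplified 
stand-ins invented for illustration; they are not Lucene's real 
AttributeSource/TokenStream classes and they ignore everything except the 
attribute lookup count.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a term attribute (NOT a Lucene class).
final class TermAttr {
    String term;
}

// Hypothetical stand-in for an attribute source: a map from type to instance.
class SimpleAttributeSource {
    private final Map<Class<?>, Object> attrs = new HashMap<>();
    int lookups = 0; // counts getAttribute() calls to make the cost visible

    <T> void put(Class<T> type, T instance) {
        attrs.put(type, instance);
    }

    <T> T getAttribute(Class<T> type) {
        lookups++;
        return type.cast(attrs.get(type));
    }
}

// Reuse style: one source and one TermAttr instance, mutated for every token.
final class ReusingStream extends SimpleAttributeSource {
    private final Iterator<String> words;
    private final TermAttr term = new TermAttr();

    ReusingStream(List<String> words) {
        this.words = words.iterator();
        put(TermAttr.class, term);
    }

    boolean incrementToken() {
        if (!words.hasNext()) return false;
        term.term = words.next(); // overwrite the shared instance in place
        return true;
    }
}

// Old "Token next()" style: every token arrives with its own fresh source.
final class PerTokenStream {
    private final Iterator<String> words;

    PerTokenStream(List<String> words) {
        this.words = words.iterator();
    }

    SimpleAttributeSource next() {
        if (!words.hasNext()) return null;
        TermAttr t = new TermAttr();
        t.term = words.next();
        SimpleAttributeSource source = new SimpleAttributeSource();
        source.put(TermAttr.class, t);
        return source;
    }
}

// Hypothetical consumer playing the role of the indexer.
final class SketchIndexer {
    // Looks the attribute up once, then only reads the shared instance.
    static int indexReusing(ReusingStream ts, StringBuilder out) {
        TermAttr term = ts.getAttribute(TermAttr.class);
        while (ts.incrementToken()) {
            out.append(term.term).append(' ');
        }
        return ts.lookups;
    }

    // Has to look the attribute up again for every token's source.
    static int indexPerToken(PerTokenStream ts, StringBuilder out) {
        int lookups = 0;
        for (SimpleAttributeSource s = ts.next(); s != null; s = ts.next()) {
            TermAttr term = s.getAttribute(TermAttr.class);
            lookups += s.lookups;
            out.append(term.term).append(' ');
        }
        return lookups;
    }
}
```

In this sketch the reusing consumer does one lookup for the whole stream, 
while the per-token consumer does one lookup (plus the allocations) per token. 
A proxy attribute or a TokenWrapper-like adapter would keep the first shape for 
the indexer while letting user code hand over a new token each time.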

> Fully decouple IndexWriter from analyzers
> -----------------------------------------
>
>                 Key: LUCENE-2309
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2309
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>
> IndexWriter only needs an AttributeSource to do indexing.
> Yet, today, it interacts with Field instances, holds a private
> analyzer, invokes analyzer.reusableTokenStream, and has to deal with a
> wide variety of cases (it's not analyzed; it is analyzed but it's a Reader
> or String; it's pre-analyzed).
> I'd like to have IW only interact with attr sources that already
> arrived with the fields.  This would be a powerful decoupling -- it
> means others are free to make their own attr sources.
> They need not even use any of Lucene's analysis impls; eg they can
> integrate to other things like [OpenPipeline|http://www.openpipeline.org].
> Or make something completely custom.
> LUCENE-2302 is already a big step towards this: it makes IW agnostic
> about which attr is "the term", and only requires that it provide a
> BytesRef (for flex).
> Then I think LUCENE-2308 would get us most of the remaining way -- ie, if the
> FieldType knows the analyzer to use, then we could simply create a
> getAttrSource() method (say) on it and move all the logic IW has today
> onto there.  (We'd still need existing IW code for back-compat).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

