[jira] Commented: (LUCENE-2309) Fully decouple IndexWriter from analyzers

Michael McCandless (JIRA) Thu, 11 Mar 2010 11:20:52 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844177#action_12844177
 ]


Michael McCandless commented on LUCENE-2309:
--------------------------------------------

bq. Would this mean that after that we can move all of core Analyzers to 
contrib/analyzers

Yes, though I think that's orthogonal (it can and should be done
separately, anyway).

bq. making one step towards getting them completely out of Lucene and into 
their own Apache project?

We may simply "standardize" on contrib/analyzers as the one place,
instead of a new [sub-]project.  To be discussed... but we really do
need one place.

bq. That way, we can keep in core only the AttributeSource and accompanying 
classes, and really allow people to pass an AttributeSource that is not even an 
Analyzer (like you said).  We can move the specific Analyzer tests to 
contrib/analyzers as well. The other tests in core, which don't care about 
analysis, can use a src/test-specific AttributeSource, like 
TestAttributeSourceImpl ...

Right.
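A minimal sketch of such a test-only source, in plain Java with no Lucene classes. `TermSource` and `SyntheticTermSource` are hypothetical names: `TermSource` only stands in for the AttributeSource contract a core test would need (advance to the next token, then read its term), and it is not Lucene's API.

```java
// Hypothetical stand-in for the contract the indexer consumes:
// advance to the next token, then read its term text.
interface TermSource {
    boolean incrementToken();
    String term();
}

// Hypothetical test-only source in the spirit of TestAttributeSourceImpl:
// emits "term0", "term1", ... with no analysis at all, for core tests
// that only need *some* tokens to index.
final class SyntheticTermSource implements TermSource {
    private final int count;
    private int next = 0;
    private String current;

    SyntheticTermSource(int count) {
        this.count = count;
    }

    @Override
    public boolean incrementToken() {
        if (next >= count) {
            current = null; // exhausted: no current term
            return false;
        }
        current = "term" + next++;
        return true;
    }

    @Override
    public String term() {
        return current;
    }
}
```

Because the source never touches analysis code, a test using it keeps compiling and passing no matter where the analyzers end up living.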

bq. I'm thinking - it's ok for contrib to depend on core but not the other way 
around.

I agree.

bq. It will, however, take out of core a useful feature that lets new users 
bootstrap quickly.

Well... I suspect that with this change users would not typically use
lucene-core alone.  I.e., they'd get analyzers and queryparser (if we
also move it out as its own module).

bq. That won't be the case when analyzers move out of Lucene entirely, but 
while they are in Lucene, we'll force everyone to download contrib/analyzers as 
well.

I think a single source for all analyzers will be a great step
forward for users.

bq. So maybe we keep in core only Standard, or maybe even something simpler, 
again, for easy bootstrapping (like Whitespace + lowercase).

Or remove them entirely (but, then, core tests will need to use
contrib analyzers for their testing)...
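For a sense of how small the "Whitespace + lowercase" bootstrap option is, here is a plain-Java sketch of that behavior. `SimpleBootstrapAnalysis` is a hypothetical name; this is not Lucene's WhitespaceAnalyzer or LowerCaseFilter, only the same idea expressed without any Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical minimal "bootstrap" analysis: split on whitespace,
// then lowercase each token.
final class SimpleBootstrapAnalysis {
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip any run of whitespace before the next token.
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            // Consume the token itself.
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                // Locale.ROOT keeps lowercasing independent of the JVM's default locale.
                tokens.add(text.substring(start, i).toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

Whichever module ends up hosting it, something this small keeps a "hello world" indexing example dependency-light, which is the bootstrapping concern raised above.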


> Fully decouple IndexWriter from analyzers
> -----------------------------------------
>
>                 Key: LUCENE-2309
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2309
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>
> IndexWriter only needs an AttributeSource to do indexing.
>
> Yet, today, it interacts with Field instances, holds a private
> analyzer, invokes analyzer.reusableTokenStream, and has to deal with a
> wide variety of cases (it's not analyzed; it is analyzed but given as a
> Reader or a String; it's pre-analyzed).
>
> I'd like to have IW only interact with attr sources that already
> arrived with the fields.  This would be a powerful decoupling -- it
> means others are free to make their own attr sources.
>
> They need not even use any of Lucene's analysis impls; e.g., they can
> integrate with other things like [OpenPipeline|http://www.openpipeline.org],
> or make something completely custom.
>
> LUCENE-2302 is already a big step towards this: it makes IW agnostic
> about which attr is "the term", and only requires that it provide a
> BytesRef (for flex).
>
> Then I think LUCENE-2308 would get us most of the remaining way -- i.e., if the
> FieldType knows the analyzer to use, then we could simply create a
> getAttrSource() method (say) on it and move all the logic IW has today
> onto there.  (We'd still need existing IW code for back-compat.)
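A rough plain-Java sketch of the proposed split, under loud assumptions: `TermSource`, `IterSource`, `SketchFieldType` and `DecoupledWriter` are all hypothetical names, `TermSource` only stands in for AttributeSource, and a `Function<String, List<String>>` stands in for an analyzer. The point it illustrates is the one in the description: the field type owns the analyzer and the "not analyzed / Reader / String / pre-analyzed" case analysis behind a `getAttrSource()` method, while the writer sees nothing but field names and token sources.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical stand-in for the AttributeSource contract the writer consumes.
interface TermSource {
    boolean incrementToken();
    String term();
}

// Replays a precomputed token list (e.g. from an external pipeline).
final class IterSource implements TermSource {
    private final Iterator<String> it;
    private String current;
    IterSource(List<String> terms) { it = terms.iterator(); }
    public boolean incrementToken() {
        if (!it.hasNext()) return false;
        current = it.next();
        return true;
    }
    public String term() { return current; }
}

// Hypothetical FieldType: owns the analyzer plus the case analysis
// that the writer performs today.
final class SketchFieldType {
    private final boolean tokenized;
    private final Function<String, List<String>> analyzer; // analyzer stand-in

    SketchFieldType(boolean tokenized, Function<String, List<String>> analyzer) {
        this.tokenized = tokenized;
        this.analyzer = analyzer;
    }

    TermSource getAttrSource(Object value) {
        if (value instanceof TermSource) return (TermSource) value; // pre-analyzed
        String text = value instanceof Reader ? drain((Reader) value) : (String) value;
        if (!tokenized) return new IterSource(Collections.singletonList(text)); // not analyzed
        return new IterSource(analyzer.apply(text)); // analyzed String or Reader
    }

    // Simplification: reads the whole Reader into memory.
    private static String drain(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = r.read(buf)) != -1) sb.append(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}

// Hypothetical writer: holds no analyzer; it only drains each field's
// source into an in-memory "field:term" -> doc ids map.
final class DecoupledWriter {
    final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    int addDocument(Map<String, TermSource> fields) {
        int docId = nextDocId++;
        for (Map.Entry<String, TermSource> field : fields.entrySet()) {
            TermSource src = field.getValue();
            while (src.incrementToken()) {
                List<Integer> docs = postings.computeIfAbsent(
                        field.getKey() + ":" + src.term(), k -> new ArrayList<>());
                // Record each doc once per term, even if the term repeats.
                if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) docs.add(docId);
            }
        }
        return docId;
    }
}
```

Design note: because `addDocument` accepts only names and sources, any producer of tokens (a Lucene analyzer, an external pipeline, a test stub) can feed the writer. Draining the Reader into a String is only a shortcut for the sketch; a real implementation would stream.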
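To illustrate the LUCENE-2302 point that the indexer should not care which attribute is "the term", only that it yields bytes, here is a plain-Java sketch. `TermBytes`, `TextTerm`, `LongTerm` and `BytesTermDictionary` are hypothetical names, and a plain `byte[]` stands in for Lucene's BytesRef; none of this is Lucene's actual flex API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical contract: whatever plays "the term" only has to expose its bytes.
interface TermBytes {
    byte[] termBytes();
}

// A text term, encoded as UTF-8.
final class TextTerm implements TermBytes {
    private final String text;
    TextTerm(String text) { this.text = text; }
    public byte[] termBytes() { return text.getBytes(StandardCharsets.UTF_8); }
}

// A non-text term: a long written as 8 big-endian bytes.
final class LongTerm implements TermBytes {
    private final long value;
    LongTerm(long value) { this.value = value; }
    public byte[] termBytes() {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }
}

// Hypothetical consumer: it only ever calls termBytes() and never asks
// what kind of term it was handed.
final class BytesTermDictionary {
    private final Set<String> keys = new TreeSet<>();

    void add(TermBytes term) { keys.add(hex(term.termBytes())); }

    boolean contains(TermBytes term) { return keys.contains(hex(term.termBytes())); }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }
}
```

Adding a new kind of term then means adding a new `TermBytes` implementation; the consumer does not change.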
