[jira] [Comment Edited] (LUCENE-7318) Graduate StandardAnalyzer out of analyzers module into core

Uwe Schindler (JIRA) Sun, 11 Sep 2016 08:18:55 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15481905#comment-15481905 ]


Uwe Schindler edited comment on LUCENE-7318 at 9/11/16 3:17 PM:
----------------------------------------------------------------

bq. Hmm, why not leave StopFilter, etc., in core, and put (deprecated) 
subclasses in the old package names?

I plan to do this for 6.x and 6.2.1, but I won't deprecate the duplicates for 
now. So I will just subclass in analyzers/common, although this is still a lot 
of code duplication (most classes only have ctors that need to be cloned). This 
would also make it consistent with the Factory classes. Those factory classes 
should use/instantiate the common variants.
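
To illustrate the subclassing idea, here is a minimal sketch. The class names 
(CompatShimSketch, CoreStopFilter, CommonStopFilter) are hypothetical stand-ins 
and not the real Lucene classes; the nested classes only mimic the two 
packages. The point is that the class kept under the old name carries no logic 
of its own and merely repeats the constructor:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CompatShimSketch {

    /** Hypothetical stand-in for a filter that now lives in core. */
    static class CoreStopFilter {
        private final Set<String> stopWords;

        CoreStopFilter(Set<String> stopWords) {
            this.stopWords = stopWords;
        }

        /** Returns the tokens that are not stop words, preserving order. */
        List<String> filter(List<String> tokens) {
            List<String> kept = new ArrayList<>();
            for (String token : tokens) {
                if (!stopWords.contains(token)) {
                    kept.add(token);
                }
            }
            return kept;
        }
    }

    /**
     * Hypothetical stand-in for the class kept under the old
     * analyzers/common name: only the constructor is repeated,
     * all behavior is inherited from the core class.
     */
    static class CommonStopFilter extends CoreStopFilter {
        CommonStopFilter(Set<String> stopWords) {
            super(stopWords);
        }
    }

    public static void main(String[] args) {
        // Code written against the old name keeps compiling and behaves
        // exactly like the core class.
        CoreStopFilter filter = new CommonStopFilter(Set.of("the", "a"));
        System.out.println(filter.filter(List.of("the", "quick", "fox")));
    }
}
```

Code that expects the core type also accepts the subclass, which is why no 
separate backwards layer would be needed under this assumption.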

All other discussion should take place in LUCENE-7444. Once this is discussed 
and finalized, we can decide in 6.3 which classes to deprecate (if we do this 
at all). My personal opinion is:
- Move StandardTokenizer to core (no package name change, so no backwards layer 
needed)
- Move the no-op StandardFilter to core, too, but deprecate it from the 
beginning (no package name change, so no backwards layer needed)
- Add all "original" classes back in analyzers/common by subclassing, but don't 
deprecate

Later on (LUCENE-7444):
- Remove StopFilter. For first-time users, the decision of stop words or not 
should be simple, and our recommendation is: no stop words please for something 
that's called "Standard"
- StopFilter and all its superclasses and utility classes move back into 
analysis/common. I'd also suggest this for LowerCaseFilter and just clone it in 
core as a package-private class inside oal/analysis/standard.
- The CharacterUtils can stay in core (it's @lucene.internal anyway), but should 
be moved completely to the utils package (I have no strong opinion there)

People who want stop words can always define their own Analyzer using 
CustomAnalyzer.


> Graduate StandardAnalyzer out of analyzers module into core
> -----------------------------------------------------------
>
>                 Key: LUCENE-7318
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7318
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Blocker
>             Fix For: master (7.0), 6.2, 6.2.1
>
>         Attachments: LUCENE-7318.patch
>
>
> Spinoff from LUCENE-7314:
> {{StandardAnalyzer}} has progressed substantially since we broke out the 
> analyzers module ... it now follows a real Unicode standard (UAX #29 Unicode 
> Text Segmentation).  It's also much faster than it used to be, since it 
> switched to JFlex a while back.  Many bug fixes, etc.
> I think it would make a good default for most Lucene users, and we should 
> graduate it from the analyzers module into core, and make it the default for 
> {{IndexWriter}}.
> It's really quite crazy that users must go digging in the analyzers module to 
> get started with Lucene ... we don't make them dig through the codecs module 
> to find a good default codec ...



