[ 
https://issues.apache.org/jira/browse/LUCENE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082530#comment-17082530
 ] 

Uwe Schindler commented on LUCENE-9317:
---------------------------------------

Hi,

The points discussed here were one of the reasons I gave up on cleaning this 
up: it needs feedback from the community. On my side, I help many users 
integrate Lucene/Solr with a lot of additional, handwritten analyzer 
components used in Solr. Some of those components rely on paid libraries for 
text analysis. Some are written just for backwards compatibility (those users 
still have indexes inherited from Lucene 3.x or even 2, and they can't 
reindex). I wrote code for them to migrate those indexes, automatically adding 
numeric fields and, recently, also point fields, based on indexed string 
terms! So there is a huge community that doesn't use Analyzers out of the box 
but has its own (especially people working on patent search and e-mail search).

{quote}
While breaking changes in public APIs are certainly okay for a major version 
upgrade and this is a good clean-up, it might have a major impact on 
applications (though I don't use this no-arg default constructor) ?
 As a possible direction, instead of removing o.a.l.a.standard from "core" we 
could take the inverse strategy: move o.a.l.a.standard.StandardTokenizerFactory 
to "core" from "analyzers-common" and rename the "o.a.l.a.standard" package in 
"analyzers-common" (for example, to "o.a.l.a.classic"). It should be possible 
if we also move the analysis factory bases and the SPILoader stuff to "core". 
From my personal point of view, it would be a good idea to have a default, 
concrete Analyzer/Tokenizer in "core", both for convenience and as a reference 
implementation. But I may be wrong and would like to hear thoughts / opinions 
from others.
{quote}

That's what I had figured out already and what I suggested in my original 
mail. I tend towards moving the factory classes to core. But this opens a can 
of worms: when we ported the factories over from Solr to Lucene, we just made 
them "optional" (and people still don't use them in combination with 
CustomAnalyzer), so the base class went to the totally stupid package name 
"oal.analysis.utils". But since CustomAnalyzer is now the "recommended way" to 
build your custom analysis and we no longer want people to hack Analyzer 
subclasses together, the factories are getting more and more important, also 
for non-Solr users.

If we move the analysis factories to core we have another problem: a split 
package on the utils! Most of the stuff there is really not required for core; 
it's only used by the analysis-common package. IMHO, the factories should be 
next to Tokenizer, TokenFilter and CharFilter in "oal.analysis"! But this 
package move has an immense effect on users:
- All META-INF/services files have to be renamed. People providing analysis 
factories in their own packages also need to refactor their source tree 
structure and change their build systems.
- As basically everybody using custom analysis has subclassed one of the 
factory classes, they also need to change their code. With Lucene 9, they have 
to do this anyway (caused by the NAME field and the new, required default 
constructor that just throws UOE).
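
To make the first bullet concrete, here is a minimal sketch of the services 
file renames a third-party factory JAR would face. It assumes the factory base 
classes move from their Lucene 8 package org.apache.lucene.analysis.util to 
org.apache.lucene.analysis, as proposed above; nothing is decided, and the 
class ServicesFileRename and its constants are hypothetical names used only 
for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ServicesFileRename {
    // Package of the factory base classes in Lucene 8.
    static final String OLD_PACKAGE = "org.apache.lucene.analysis.util";
    // ASSUMPTION: target package as proposed in this comment, not a decided layout.
    static final String NEW_PACKAGE = "org.apache.lucene.analysis";
    // The three SPI base types that analysis factory JARs register under.
    static final String[] SPI_TYPES = {
        "TokenizerFactory", "TokenFilterFactory", "CharFilterFactory"
    };

    /** Maps each old META-INF/services path to the path it would need after the move. */
    static Map<String, String> plannedRenames() {
        Map<String, String> renames = new LinkedHashMap<>();
        for (String type : SPI_TYPES) {
            renames.put("META-INF/services/" + OLD_PACKAGE + "." + type,
                        "META-INF/services/" + NEW_PACKAGE + "." + type);
        }
        return renames;
    }

    public static void main(String[] args) {
        plannedRenames().forEach((from, to) -> System.out.println(from + " -> " + to));
    }
}
```

The contents of each file (the list of provider class names) would stay the 
same; only the file name changes, which is exactly why build systems and 
source trees have to be touched.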

So this, before 9.0, is our last chance to do it; we can't do this in a minor 
release. And we need to extend the MIGRATE.md file with detailed instructions 
on how to refactor your code. Remember: analysis is one of the most often 
extended and customized parts of Lucene, so there are factory JAR files using 
SPI everywhere! The whole thing needs to be well thought out! This is NOT 
about modules, it's just showing the problem with package structure!

One reason why the factories should move to core is that, once we have done 
this, one no longer needs to depend on analyzers-common. Someone who has their 
own set of factories and tokenizers/filters and otherwise only requires the 
default ones can completely remove the huge common.jar file! Also, public and 
commonly used abstract base classes should not be part of an optional module!

bq. and rename the "o.a.l.a.standard" package in "analyzers-common" (for 
example, "o.a.l.a.classic"). 

There's also the UAX analyzer for domain names. We may need to move it to a 
different subpackage, too.
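
The constraint behind all of these package moves (no package may live in more 
than one JAR) can be checked mechanically. The following is only a sketch: the 
class SplitPackageCheck is a hypothetical helper, and the package lists in 
main are small illustrative samples, not a complete inventory of either JAR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class SplitPackageCheck {
    /** Returns every package that appears in two or more JARs, with the JARs that contain it. */
    static Map<String, List<String>> findSplitPackages(Map<String, Set<String>> packagesByJar) {
        Map<String, List<String>> jarsByPackage = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : packagesByJar.entrySet()) {
            for (String pkg : entry.getValue()) {
                jarsByPackage.computeIfAbsent(pkg, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Keep only packages shared by at least two JARs.
        jarsByPackage.values().removeIf(jars -> jars.size() < 2);
        return jarsByPackage;
    }

    public static void main(String[] args) {
        // Illustrative sample data only (ASSUMPTION: abbreviated package lists).
        Map<String, Set<String>> packagesByJar = new LinkedHashMap<>();
        packagesByJar.put("lucene-core", Set.of(
            "org.apache.lucene.analysis",
            "org.apache.lucene.analysis.standard"));
        packagesByJar.put("lucene-analyzers-common", Set.of(
            "org.apache.lucene.analysis.core",
            "org.apache.lucene.analysis.standard"));
        findSplitPackages(packagesByJar).forEach((pkg, jars) ->
            System.out.println(pkg + " is split across " + jars));
    }
}
```

Running such a check after each proposed move (including one for the UAX 
classes) would show whether the new layout still has a conflict.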

> Resolve package name conflicts for StandardAnalyzer to allow Java module 
> system support
> ---------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9317
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9317
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>    Affects Versions: master (9.0)
>            Reporter: David Ryan
>            Priority: Major
>              Labels: build, features
>
>  
> To allow Lucene to be modularised there are a few preparatory tasks to be 
> completed prior to this being possible.  The Java module system requires that 
> jars do not use the same package name in different jars.  The lucene-core and 
> lucene-analyzers-common both share the package 
> org.apache.lucene.analysis.standard.
> Possible resolutions to this issue are discussed by Uwe on the mailing list 
> here:
>  
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/202004.mbox/%3CCAM21Rt8FHOq_JeUSELhsQJH0uN0eKBgduBQX4fQKxbs49TLqzA%40mail.gmail.com%3E]
> {quote}About StandardAnalyzer: Unfortunately I aggressively complained a 
> while back when Mike McCandless wanted to move standard analyzer out of the 
> analysis package into core (“for convenience”). This was a bad step, and IMHO 
> we should revert that or completely rename the packages and everything. The 
> problem here is: As the analysis services are only part of lucene-analyzers, 
> we had to leave the factory classes there, but move the implementation 
> classes in core. The package has to be the same. The only way around that is 
> to move the analysis factory framework also to core (I would not be against 
> that). This would include all factory base classes and the service loading 
> stuff. Then we can move standard analyzer and some of the filters/tokenizers 
> including their factories to core and that problem would be solved.
> {quote}
> There are two options here, either move factory framework into core or revert 
> StandardAnalyzer back to lucene-analyzers.  In the email, the solution lands 
> on reverting back as per the task list:
> {quote}Add some preparatory issues to cleanup class hierarchy: Move Analysis 
> SPI to core / remove StandardAnalyzer and related classes out of core back to 
> analysis
> {quote}


