[ https://issues.apache.org/jira/browse/LUCENE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082530#comment-17082530 ]
Uwe Schindler commented on LUCENE-9317:
---------------------------------------

Hi,

The items discussed here were one of the reasons why I gave up on cleaning this up: it needs feedback from the community. On my side, I help many users integrate Lucene/Solr with a lot of additional (handwritten) analyzer components used in Solr. Some of those partially use paid libraries for text analysis. Some are written just for backwards compatibility (they still have indexes inherited from Lucene 3.x or even 2.x and can't reindex). I wrote code for them to migrate those indexes, automatically adding numeric fields and, recently, also point fields, based on indexed string terms! So there is a huge community that doesn't use the out-of-the-box Analyzers but has its own (especially people working on patent search and e-mail search).

{quote}
While breaking changes in public APIs are certainly okay for a major version upgrade and this is a good clean-up, it might have a major impact on applications (though I don't use this no-arg default constructor)?

As a possible direction, instead of removing o.a.l.a.standard from "core" we could take the inverse strategy: move o.a.l.a.standard.StandardTokenizerFactory to "core" from "analyzers-common" and rename the "o.a.l.a.standard" package in "analyzers-common" (for example, "o.a.l.a.classic"). It should be possible if we also move the analysis factory base classes and the SPILoader stuff to "core".

From my personal point of view, it would be a good idea to have a default, concrete Analyzer/Tokenizer in "core", both for convenience and as a reference implementation. But I may be wrong and would like to hear thoughts/opinions from others.
{quote}

That's what I figured out already and what I suggested in my original mail. I tend to move the factory classes to core.
But this opens a can of worms: when we ported the factories over from Solr to Lucene, we just made them "optional" (and people still don't use them in combination with CustomAnalyzer), so the base classes went into the totally stupid package name "oal.analysis.util". But since CustomAnalyzer is now the "recommended way" to build custom analysis, and we no longer want people to hack Analyzer subclasses together, the factories are getting more and more important, also for non-Solr users.

If we move the analysis factories to core, we have another problem: a split package on the utils! Most of the stuff there is really not required for core; it's only used by the analyzers-common module. IMHO, the factories should be next to Tokenizer, TokenFilter and CharFilter in "oal.analysis"! But this package move has an immense effect on users:

- All META-INF/services files have to be renamed; people providing analysis factories in their own packages also need to refactor their source tree structure and change their build systems.
- As basically everybody using custom analysis subclasses one of the factory classes, they also need to change their code.

With Lucene 9 they have to do this anyway (because of the NAME field and the new default constructor that just throws UnsupportedOperationException (UOE), which is required). So before 9.0 this is the last chance to do this; we can't do it in a minor release. And we need to extend the MIGRATE.md file with detailed instructions on how to refactor your code.

Remember: analysis is one of the most often extended and customized parts of Lucene, so there are factory JAR files using SPI everywhere! The whole thing needs to be well thought out! This is NOT about modules; it just shows the problem with the package structure!

One reason why the factories should move to core is that, once we did this, one would no longer need to depend on analyzers-common.
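The Lucene 9 factory conventions mentioned above (a NAME constant plus a required default constructor that just throws UnsupportedOperationException) can be illustrated with a self-contained mock. Everything below (DemoFilterFactory, LowerCaseDemoFactory, spiName, newInstance, noArgCtorRejected) is hypothetical and deliberately avoids Lucene's real classes. It is only a sketch of the idea: a loader can read the name without instantiating the class and build instances through the Map-based constructor, while the no-arg constructor exists only so that Java's service loading accepts the class as a provider.

```java
import java.lang.reflect.InvocationTargetException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Self-contained mock of the factory conventions discussed above.
// All class and method names are hypothetical; this is NOT Lucene's API.
public class FactoryNameDemo {

  /** Hypothetical stand-in for an analysis factory base class. */
  public abstract static class DemoFilterFactory {
    protected final Map<String, String> args;

    /** The constructor that is actually meant to be used: configured from a parameter map. */
    protected DemoFilterFactory(Map<String, String> args) {
      this.args = new HashMap<>(args);
    }

    /** No-arg constructor that exists only to satisfy service loading; it always refuses. */
    protected DemoFilterFactory() {
      throw new UnsupportedOperationException("Use the Map-based constructor");
    }

    public abstract String create(String token);
  }

  /** A hypothetical user-provided factory following the conventions. */
  public static final class LowerCaseDemoFactory extends DemoFilterFactory {
    /** Name used to look the factory up, declared as a constant. */
    public static final String NAME = "lowercase";

    public LowerCaseDemoFactory(Map<String, String> args) {
      super(args);
    }

    /** Required public no-arg constructor; delegates to the throwing base constructor. */
    public LowerCaseDemoFactory() {
      super();
    }

    @Override
    public String create(String token) {
      return token.toLowerCase(Locale.ROOT);
    }
  }

  /** Reads the static NAME constant reflectively, without instantiating the class. */
  public static String spiName(Class<? extends DemoFilterFactory> clazz) {
    try {
      return (String) clazz.getField("NAME").get(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException(clazz.getName() + " has no public static NAME field", e);
    }
  }

  /** Instantiates a factory through its Map-based constructor. */
  public static DemoFilterFactory newInstance(
      Class<? extends DemoFilterFactory> clazz, Map<String, String> args) {
    try {
      return clazz.getConstructor(Map.class).newInstance(args);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + clazz.getName(), e);
    }
  }

  /** Returns true if the public no-arg constructor fails with UnsupportedOperationException. */
  public static boolean noArgCtorRejected(Class<? extends DemoFilterFactory> clazz) {
    try {
      clazz.getConstructor().newInstance();
      return false;
    } catch (InvocationTargetException e) {
      return e.getCause() instanceof UnsupportedOperationException;
    } catch (ReflectiveOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    Class<LowerCaseDemoFactory> clazz = LowerCaseDemoFactory.class;
    System.out.println(spiName(clazz));                                 // lowercase
    System.out.println(newInstance(clazz, Map.of()).create("Lucene")); // lucene
    System.out.println(noArgCtorRejected(clazz));                       // true
  }
}
```

The point of the mock is that every user-written factory carries these conventions in its own source, which is why moving the base classes to another package forces all such subclasses (and their META-INF/services file names) to change at the same time.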
If someone has their own set of factories and tokenizers/filters and otherwise only requires the default ones, they can completely remove the huge analyzers-common JAR file! Also, public and commonly used abstract base classes should not be part of an optional module!

bq. and rename the "o.a.l.a.standard" package in "analyzers-common" (for example, "o.a.l.a.classic").

There's also the UAX analyzer for domain names. We may need to move it to a different subpackage, too.

> Resolve package name conflicts for StandardAnalyzer to allow Java module system support
> ---------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9317
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9317
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>    Affects Versions: master (9.0)
>            Reporter: David Ryan
>            Priority: Major
>              Labels: build, features
>
> To allow Lucene to be modularised, there are a few preparatory tasks to be completed first. The Java module system requires that JARs do not use the same package name in different JARs. lucene-core and lucene-analyzers-common both share the package org.apache.lucene.analysis.standard.
> Possible resolutions to this issue are discussed by Uwe on the mailing list here:
>
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/202004.mbox/%3CCAM21Rt8FHOq_JeUSELhsQJH0uN0eKBgduBQX4fQKxbs49TLqzA%40mail.gmail.com%3E]
> {quote}About StandardAnalyzer: Unfortunately I aggressively complained a while back when Mike McCandless wanted to move standard analyzer out of the analysis package into core (“for convenience”). This was a bad step, and IMHO we should revert that or completely rename the packages and everything. The problem here is: as the analysis services are only part of lucene-analyzers, we had to leave the factory classes there, but move the implementation classes to core. The package has to be the same.
> The only way around that is to move the analysis factory framework to core as well (I would not be against that). This would include all factory base classes and the service loading stuff. Then we can move standard analyzer and some of the filters/tokenizers including their factories to core and that problem would be solved.
> {quote}
> There are two options here: either move the factory framework into core or revert StandardAnalyzer back to lucene-analyzers. In the email, the solution lands on reverting back, as per the task list:
> {quote}Add some preparatory issues to cleanup class hierarchy: Move Analysis SPI to core / remove StandardAnalyzer and related classes out of core back to analysis
> {quote}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
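The module-system constraint in the issue description (a package may not be split across JARs) can be checked mechanically before attempting modularisation. The sketch below is a hypothetical helper, not part of Lucene or the JDK: it takes a map from JAR name to the packages that JAR contains and reports every package found in more than one JAR. The package lists in main are illustrative subsets chosen to show the conflict named in the issue, not the complete contents of either JAR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical split-package detector; not a Lucene or JDK API.
public class SplitPackageCheck {

  /** Returns package -> JARs containing it, keeping only packages present in 2+ JARs. */
  public static Map<String, List<String>> findSplitPackages(Map<String, Set<String>> packagesByJar) {
    // TreeMap gives a stable, sorted report; JAR order follows the input map's iteration order.
    Map<String, List<String>> owners = new TreeMap<>();
    for (Map.Entry<String, Set<String>> jar : packagesByJar.entrySet()) {
      for (String pkg : jar.getValue()) {
        owners.computeIfAbsent(pkg, k -> new ArrayList<>()).add(jar.getKey());
      }
    }
    // A package owned by a single JAR is fine for the module system; drop it.
    owners.values().removeIf(jars -> jars.size() < 2);
    return owners;
  }

  public static void main(String[] args) {
    // Illustrative subsets only; real JARs contain many more packages.
    Map<String, Set<String>> jars = new LinkedHashMap<>();
    jars.put("lucene-core", Set.of(
        "org.apache.lucene.analysis",
        "org.apache.lucene.analysis.standard",
        "org.apache.lucene.index"));
    jars.put("lucene-analyzers-common", Set.of(
        "org.apache.lucene.analysis.standard",
        "org.apache.lucene.analysis.util"));
    // prints {org.apache.lucene.analysis.standard=[lucene-core, lucene-analyzers-common]}
    System.out.println(findSplitPackages(jars));
  }
}
```

Either option discussed in the issue (moving the factory framework into core, or moving StandardAnalyzer back out of it) would make such a check come back empty for this pair of JARs, as long as no other package ends up shared.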