[GitHub] jena issue #246: Generic text analyzers

osma Wed, 28 Jun 2017 03:53:47 -0700

Github user osma commented on the issue:

    https://github.com/apache/jena/pull/246
  
    The new features introduced by this PR seem useful and worthwhile, and the 
implementation looks solid. This clearly doesn't conflict with existing code. 
    
    However, there is some overlap in functionality with earlier attempts at 
making the analyzer configuration more generic. For example, I think that the 
equivalent of LowerCaseKeywordAnalyzer can now be specified in three different 
ways: using LowerCaseKeywordAnalyzer directly; using ConfigurableAnalyzer to 
build it from KeywordTokenizer and two filters; or with the new 
GenericAnalyzer. If we had had functionality like this earlier, then there 
wouldn't have been any need to create ConfigurableAnalyzer or indeed the 
pre-packaged LowerCaseKeywordAnalyzer. 
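
    To make the equivalence above concrete, here is a minimal sketch in plain Java of what a keyword-style tokenizer followed by a lower-casing filter produces. It deliberately does not use the Lucene or jena-text APIs: the class name `KeywordAnalysisSketch`, the `analyze` method and the `LOWER_CASE` constant are hypothetical names chosen for illustration, and only one filter is shown.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Illustrative only: hypothetical names, not the Lucene or jena-text API.
public class KeywordAnalysisSketch {

    // A lower-casing token filter (locale-independent).
    static final UnaryOperator<String> LOWER_CASE = s -> s.toLowerCase(Locale.ROOT);

    // Keyword-style tokenization: the entire input becomes a single token.
    static List<String> keywordTokenize(String input) {
        return List.of(input);
    }

    // Tokenize, then apply each filter in order to every token.
    static List<String> analyze(String input, List<UnaryOperator<String>> filters) {
        List<String> tokens = keywordTokenize(input);
        for (UnaryOperator<String> filter : filters) {
            tokens = tokens.stream().map(filter).collect(Collectors.toList());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The whole string stays one token and is lower-cased.
        System.out.println(analyze("Apache Jena", List.of(LOWER_CASE))); // prints [apache jena]
    }
}
```

    Whichever of the three configuration styles is used, the indexed value should come out the same: one lower-cased token per field value.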
    
    But now I think we must simply support all the different configuration 
styles. If we want to drop one or more of those options, they should at least 
be clearly deprecated well in advance. This is not the fault of this PR, just 
something that has happened as the feature set has grown organically over time.
    
    +1 for merging after the minor issues identified by @afs have been sorted 
out.
    
    I must say that without the excellent documentation updates that come along 
with this PR, I would be a lot more hesitant to merge this because of the added 
complexity in index configuration. But with the very clear examples and 
thorough explanations (also for older jena-text features!), I think this adds a 
lot of value, even for jena-text users who don't need the full power of this 
meta-configuration language.



