jackluo923 opened a new pull request, #12027: URL: https://github.com/apache/pinot/pull/12027
Currently, Pinot hard-codes the Lucene analyzer (`StandardAnalyzer`) used to tokenize strings for text indexing and search. In various scenarios, it is extremely useful to customize the analyzer, and at least two other users have requested this feature in https://github.com/apache/pinot/issues/9154.

This PR introduces the ability to specify, on a per-column basis, a custom Lucene analyzer that the text index uses for both indexing and search. Specifically, it allows users to specify the FQCN (fully qualified class name) of the Lucene analyzer in the column's field config:

```json
"fieldConfigList": [
  {
    "name": "columnName",
    "indexType": "TEXT",
    "indexTypes": [
      "TEXT"
    ],
    "properties": {
      "luceneAnalyzerFQCN": "org.apache.lucene.analysis.core.KeywordAnalyzer"
    }
  }
]
```

**Default Behavior**

If the user does not specify the `luceneAnalyzerFQCN` property, the behavior is exactly the same as before: the `StandardAnalyzer` is used with a couple of configuration properties.

**User Specified Behavior**

When the user specifies `luceneAnalyzerFQCN`, the default (no-argument) constructor of the specified Lucene analyzer class is invoked via reflection to create the analyzer. If the specified class does not exist, the resulting `ReflectiveOperationException` is caught, and a runtime exception with a more meaningful message is thrown.

**Testing**

This configurable Lucene analyzer feature is currently used in production to index and search large amounts of text data on multi-value text columns using `KeywordAnalyzer`. All existing unit tests covering the default behavior pass.

tags: `release-notes`, `enhancement`
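The default and user-specified behaviors described above can be sketched as a small selection helper. This is a minimal illustration, not Pinot's actual code. The class name `AnalyzerSelector`, the method `select`, and the `Supplier`-based default are hypothetical. The sketch is written generically so it runs without Lucene on the classpath; in Pinot, `T` would presumably be `org.apache.lucene.analysis.Analyzer`. Only the property key `luceneAnalyzerFQCN` comes from the PR.

```java
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical helper illustrating the analyzer-selection logic described in the PR.
 * Not Pinot's actual implementation.
 */
final class AnalyzerSelector {
  // Property key taken from the PR description.
  static final String ANALYZER_FQCN_KEY = "luceneAnalyzerFQCN";

  private AnalyzerSelector() {
  }

  /**
   * Returns the default instance when the property is absent or empty. Otherwise, it
   * instantiates the named class through its no-argument constructor via reflection.
   */
  static <T> T select(Map<String, String> properties, Class<T> expectedType,
      Supplier<T> defaultFactory) {
    String fqcn = properties == null ? null : properties.get(ANALYZER_FQCN_KEY);
    if (fqcn == null || fqcn.isEmpty()) {
      // Default behavior: fall back to the previously hard-coded choice.
      return defaultFactory.get();
    }
    try {
      Class<?> clazz = Class.forName(fqcn);
      // Invoke the default (no-arg) constructor reflectively.
      Object instance = clazz.getDeclaredConstructor().newInstance();
      // Throws ClassCastException if the class is not of the expected type.
      return expectedType.cast(instance);
    } catch (ReflectiveOperationException e) {
      // Covers ClassNotFoundException, NoSuchMethodException, IllegalAccessException,
      // InstantiationException and InvocationTargetException; rethrow with the class name.
      throw new RuntimeException("Failed to instantiate Lucene analyzer class: " + fqcn, e);
    }
  }
}
```

With Lucene available, a call might look roughly like `AnalyzerSelector.select(props, Analyzer.class, StandardAnalyzer::new)`. Pinot's real default, however, configures `StandardAnalyzer` with extra properties, so the default supplier here is a simplification. Wrapping the checked `ReflectiveOperationException` keeps the call site simple while preserving the original cause.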
