jackluo923 opened a new pull request, #12027:
URL: https://github.com/apache/pinot/pull/12027

   Currently, Pinot hard-codes the Lucene analyzer (`StandardAnalyzer`) used to 
tokenize strings for indexing and search. In various scenarios, it is extremely 
useful to customize the analyzer. At least two other users have requested this 
feature in https://github.com/apache/pinot/issues/9154. 
   
   This PR introduces the capability to specify, on a per-column basis, a 
custom Lucene analyzer that the text index uses for indexing and search. 
Specifically, this PR allows users to specify the FQCN (fully qualified class 
name) of the Lucene analyzer to use in the text index:
   ```
   "fieldConfigList": [
     {
       "name": "columnName",
       "indexType": "TEXT",
       "indexTypes": [
         "TEXT"
       ],
       "properties": {
         "luceneAnalyzerFQCN": "org.apache.lucene.analysis.core.KeywordAnalyzer"
       }
     }
   ]
   ```
   
   **Default Behavior**
   If the user does not specify the `luceneAnalyzerFQCN` property, the behavior 
is exactly the same as before: the `StandardAnalyzer` is used with a couple of 
configuration properties.
   
   **User Specified Behavior**
   When the user specifies `luceneAnalyzerFQCN`, the default constructor of the 
specified Lucene analyzer class is invoked via reflection to create a Lucene 
analyzer. If the user-specified analyzer class does not exist, the 
`ReflectiveOperationException` is caught and a runtime exception with a more 
meaningful exception message is thrown.
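
   For readers unfamiliar with the pattern, the reflection step described above 
can be sketched roughly as follows. This is an illustration only, not the code 
in this PR: the `ReflectiveFactory` class and its `newInstance` helper are 
hypothetical names, and the helper is kept generic so it compiles without 
Lucene on the classpath. In Pinot the expected type would presumably be 
`org.apache.lucene.analysis.Analyzer`.

```java
// Hypothetical helper, not taken from the PR: builds an object from a fully
// qualified class name using the class's public no-arg constructor.
final class ReflectiveFactory {
  private ReflectiveFactory() {
  }

  static <T> T newInstance(String fqcn, Class<T> expectedType) {
    try {
      // Load the class by name and check that it is a subtype of expectedType.
      Class<? extends T> clazz = Class.forName(fqcn).asSubclass(expectedType);
      // Invoke the public default (no-arg) constructor.
      return clazz.getConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      // Covers ClassNotFoundException, NoSuchMethodException,
      // InstantiationException, IllegalAccessException and
      // InvocationTargetException; rethrow with a clearer message.
      throw new RuntimeException(
          "Failed to instantiate " + expectedType.getSimpleName() + " from class name '" + fqcn
              + "'; check that the class exists and has a public no-arg constructor", e);
    } catch (ClassCastException e) {
      // The class exists but is not of the expected type.
      throw new RuntimeException(
          "Class '" + fqcn + "' is not a subtype of " + expectedType.getName(), e);
    }
  }
}
```

   With Lucene available, a call might look like 
`ReflectiveFactory.newInstance(fqcn, Analyzer.class)`, where `fqcn` is the 
value of `luceneAnalyzerFQCN`. An analyzer that only offers constructors with 
arguments would fail here with the wrapped `NoSuchMethodException`.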
   
   **Testing**
   This configurable Lucene analyzer feature is currently used in production to 
index and search large amounts of text data on multi-variable text columns using 
`KeywordAnalyzer`. All existing unit tests covering the default behavior are 
passing.
   
   tags: `release-notes`, `enhancement`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]