Jackie-Jiang commented on code in PR #12027:
URL: https://github.com/apache/pinot/pull/12027#discussion_r1408514071


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/TextIndexConfig.java:
##########
@@ -57,7 +59,9 @@ public TextIndexConfig(
       @JsonProperty("stopWordsInclude") List<String> stopWordsInclude,
       @JsonProperty("stopWordsExclude") List<String> stopWordsExclude,
       @JsonProperty("luceneUseCompoundFile") Boolean luceneUseCompoundFile,
-      @JsonProperty("luceneMaxBufferSizeMB") Integer luceneMaxBufferSizeMB) {
+      @JsonProperty("luceneMaxBufferSizeMB") Integer luceneMaxBufferSizeMB,
+      @JsonProperty("luceneAnalyzerFQCN") String luceneAnalyzerFQCN

Review Comment:
   Suggest renaming it to `luceneAnalyzerClass` to match the other similar configs.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/RealtimeLuceneTextIndex.java:
##########
@@ -62,9 +60,12 @@ public class RealtimeLuceneTextIndex implements MutableTextIndex {
    * @param segmentName realtime segment name
    * @param stopWordsInclude the words to include in addition to the default stop word list
    * @param stopWordsExclude stop words to exclude from default stop words
+   * @param maxBufferSizeMB maximum size of the Lucene index buffer
+   * @param luceneAnalyzerFQCN fully qualified class name of the Lucene analyzer used for tokenization
    */
   public RealtimeLuceneTextIndex(String column, File segmentIndexDir, String segmentName,
-      List<String> stopWordsInclude, List<String> stopWordsExclude, boolean useCompoundFile, int maxBufferSizeMB) {
+      List<String> stopWordsInclude, List<String> stopWordsExclude, boolean useCompoundFile, int maxBufferSizeMB,
+      String luceneAnalyzerFQCN) {

Review Comment:
   Shall we pass the `TextIndexConfig` in directly here instead of continuing to 
add extra parameters?
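
   A minimal sketch of the idea, using hypothetical stand-in classes 
(`TextConfigSketch`, `RealtimeIndexSketch`) rather than the real Pinot types. 
The accessor names are assumptions for illustration, not the actual 
`TextIndexConfig` API:

   ```java
   import java.util.List;

   // Hypothetical stand-in for TextIndexConfig; names are illustrative only.
   final class TextConfigSketch {
     private final List<String> _stopWordsInclude;
     private final List<String> _stopWordsExclude;
     private final boolean _useCompoundFile;
     private final int _maxBufferSizeMB;
     private final String _analyzerClass;

     TextConfigSketch(List<String> stopWordsInclude, List<String> stopWordsExclude,
         boolean useCompoundFile, int maxBufferSizeMB, String analyzerClass) {
       _stopWordsInclude = stopWordsInclude;
       _stopWordsExclude = stopWordsExclude;
       _useCompoundFile = useCompoundFile;
       _maxBufferSizeMB = maxBufferSizeMB;
       _analyzerClass = analyzerClass;
     }

     List<String> getStopWordsInclude() { return _stopWordsInclude; }
     List<String> getStopWordsExclude() { return _stopWordsExclude; }
     boolean isUseCompoundFile() { return _useCompoundFile; }
     int getMaxBufferSizeMB() { return _maxBufferSizeMB; }
     String getAnalyzerClass() { return _analyzerClass; }
   }

   // Hypothetical stand-in for the index class: it takes the config object,
   // so a new text index option does not change the constructor signature.
   final class RealtimeIndexSketch {
     private final String _column;
     private final TextConfigSketch _config;

     RealtimeIndexSketch(String column, TextConfigSketch config) {
       _column = column;
       _config = config;
     }

     // Reads settings from the config at the point of use.
     String describe() {
       return _column + " uses " + _config.getAnalyzerClass()
           + " with buffer " + _config.getMaxBufferSizeMB() + "MB";
     }
   }
   ```

   With this shape, future options only touch the config class and the code 
that reads them, not every caller of the constructor.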



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/RealtimeLuceneTextIndex.java:
##########
@@ -120,7 +122,8 @@ public MutableRoaringBitmap getDocIds(String searchQuery) {
     Callable<MutableRoaringBitmap> searchCallable = () -> {
       IndexSearcher indexSearcher = null;
       try {
-        Query query = new QueryParser(_column, _analyzer).parse(searchQuery);
+        Query query =
+            new QueryParser(_column, _indexCreator.getIndexWriter().getConfig().getAnalyzer()).parse(searchQuery);

Review Comment:
   (minor) Shall we cache the `_analyzer` after creating the `IndexWriter`?
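
   Roughly what I have in mind, sketched with hypothetical stand-in types 
(`AnalyzerSketch`, `WriterSketch`, `CachedAnalyzerIndexSketch`) so it compiles 
without Lucene. None of these are real Lucene or Pinot classes, and the lookup 
counter exists only to show the effect:

   ```java
   // Hypothetical stand-in for the analyzer type.
   interface AnalyzerSketch {
   }

   // Hypothetical stand-in for the writer that owns the analyzer.
   final class WriterSketch {
     private final AnalyzerSketch _analyzer = new AnalyzerSketch() { };
     private int _lookups = 0;

     AnalyzerSketch getAnalyzer() {
       _lookups++;
       return _analyzer;
     }

     int getLookups() {
       return _lookups;
     }
   }

   // The analyzer is resolved once in the constructor and reused per query.
   final class CachedAnalyzerIndexSketch {
     private final WriterSketch _writer;
     private final AnalyzerSketch _analyzer;

     CachedAnalyzerIndexSketch(WriterSketch writer) {
       _writer = writer;
       _analyzer = writer.getAnalyzer();
     }

     // Called on the query path; no longer walks through the writer.
     AnalyzerSketch analyzerForQuery() {
       return _analyzer;
     }
   }
   ```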



##########
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java:
##########
@@ -51,6 +51,9 @@ public class FieldConfig extends BaseJsonConfig {
   public static final String TEXT_INDEX_STOP_WORD_EXCLUDE_KEY = "stopWordExclude";
   public static final String TEXT_INDEX_LUCENE_USE_COMPOUND_FILE = "luceneUseCompoundFile";
   public static final String TEXT_INDEX_LUCENE_MAX_BUFFER_SIZE_MB = "luceneMaxBufferSizeMB";
+  public static final String TEXT_INDEX_LUCENE_ANALYZER_FQCN = "luceneAnalyzerFQCN";
+  public static final String TEXT_INDEX_DEFAULT_LUCENE_ANALYZER_FQCN
+          = "org.apache.lucene.analysis.standard.StandardAnalyzer";

Review Comment:
   ```suggestion
     public static final String TEXT_INDEX_LUCENE_ANALYZER_CLASS = "luceneAnalyzerClass";
     public static final String DEFAULT_TEXT_INDEX_LUCENE_ANALYZER_CLASS
             = "org.apache.lucene.analysis.standard.StandardAnalyzer";
   ```




