siddharthteotia commented on a change in pull request #6869:
URL: https://github.com/apache/incubator-pinot/pull/6869#discussion_r630472423



##########
File path: 
pinot-controller/src/main/java/org/apache/pinot/controller/recommender/io/InputManager.java
##########
@@ -149,24 +146,33 @@ public void init()
     reorderDimsAndBuildMap();
     registerColNameFieldType();
     validateQueries();
-    if (_useCardinalityNormalization) {
-      regulateCardinalityForAll();
-    }
   }
 
-  private void regulateCardinalityForAll() {
-    double sampleSize;
-    if (getTableType().equalsIgnoreCase(REALTIME)) {
-      sampleSize = getSegmentFlushTime() * getNumMessagesPerSecInKafkaTopic();
-    } else {
-      sampleSize = getNumRecordsPerPush();
-    }
-
+  /**
+   * Cardinalities provided by users are relative to number of records per 
push, but we might end up creating multiple
+   * segments for each push. Using this methods, cardinalities will be capped 
by the provided number of rows in segment.
+   */
+  public void capCardinalities(int numRecordsInSegment) {
     _metaDataMap.keySet().forEach(colName -> {
-      int cardinality = _metaDataMap.get(colName).getCardinality();
-      double regulatedCardinality = 
regulateCardinalityInfinitePopulation(cardinality, sampleSize);
-      _metaDataMap.get(colName).setCardinality((int) 
Math.round(regulatedCardinality));
+      int cardinality = Math.min(numRecordsInSegment, 
_metaDataMap.get(colName).getCardinality());
+      _metaDataMap.get(colName).setCardinality(cardinality);
     });
+    if (_schemaWithMetaData.getDimensionFieldSpecs() != null) {
+      _schemaWithMetaData.getDimensionFieldSpecs()

Review comment:
       (nit) we don't need to do Math.min() again right ? Already have the 
cardinality at line 157
   
   Also the code from line 160 to 175 is probably redundant ? `_metadataMap` 
where you are already setting the updated cardinality is essentially setting it 
in FieldMetadata for each of the dimension, metric, timeCol and DateTimeCols. 
Lines 160 to 175 are also setting it in FieldMetadata. So I don't think it is 
needed. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to