siddharthteotia commented on a change in pull request #6869: URL: https://github.com/apache/incubator-pinot/pull/6869#discussion_r630472423
########## File path: pinot-controller/src/main/java/org/apache/pinot/controller/recommender/io/InputManager.java ########## @@ -149,24 +146,33 @@ public void init() reorderDimsAndBuildMap(); registerColNameFieldType(); validateQueries(); - if (_useCardinalityNormalization) { - regulateCardinalityForAll(); - } } - private void regulateCardinalityForAll() { - double sampleSize; - if (getTableType().equalsIgnoreCase(REALTIME)) { - sampleSize = getSegmentFlushTime() * getNumMessagesPerSecInKafkaTopic(); - } else { - sampleSize = getNumRecordsPerPush(); - } - + /** + * Cardinalities provided by users are relative to number of records per push, but we might end up creating multiple + * segments for each push. Using this methods, cardinalities will be capped by the provided number of rows in segment. + */ + public void capCardinalities(int numRecordsInSegment) { _metaDataMap.keySet().forEach(colName -> { - int cardinality = _metaDataMap.get(colName).getCardinality(); - double regulatedCardinality = regulateCardinalityInfinitePopulation(cardinality, sampleSize); - _metaDataMap.get(colName).setCardinality((int) Math.round(regulatedCardinality)); + int cardinality = Math.min(numRecordsInSegment, _metaDataMap.get(colName).getCardinality()); + _metaDataMap.get(colName).setCardinality(cardinality); }); + if (_schemaWithMetaData.getDimensionFieldSpecs() != null) { + _schemaWithMetaData.getDimensionFieldSpecs() Review comment: (nit) we don't need to do Math.min() again right ? Already have the cardinality at line 157 Also the code from line 160 to 175 is probably redundant ? `_metadataMap` where you are already setting the updated cardinality is essentially setting it in FieldMetadata for each of the dimension, metric, timeCol and DateTimeCols. Lines 160 to 175 are also setting it in FieldMetadata. So I don't think it is needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org