codope commented on code in PR #5224:
URL: https://github.com/apache/hudi/pull/5224#discussion_r842353165
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala:
##########
@@ -224,7 +228,8 @@ object ColumnStatsIndexSupport {
private val COLUMN_STATS_INDEX_FILE_COLUMN_NAME = "fileName"
private val COLUMN_STATS_INDEX_MIN_VALUE_STAT_NAME = "minValue"
private val COLUMN_STATS_INDEX_MAX_VALUE_STAT_NAME = "maxValue"
- private val COLUMN_STATS_INDEX_NUM_NULLS_STAT_NAME = "num_nulls"
+ private val COLUMN_STATS_INDEX_NULL_COUNT_STAT_NAME = "nullCount"
Review Comment:
Can we not reuse HoodieMetadataPayload.* constants here as well?
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/DataSkippingUtils.scala:
##########
@@ -211,10 +211,10 @@ object DataSkippingUtils extends Logging {
.map(colName => GreaterThan(genColNumNullsExpr(colName), Literal(0)))
// Filter "colA is not null"
- // Translates to "colA_nullCount = 0" for index lookup
+ // Translates to "colA_nullCount < colA_valueCount" for index lookup
case IsNotNull(attribute: AttributeReference) =>
getTargetIndexedColumnName(attribute, indexSchema)
- .map(colName => EqualTo(genColNumNullsExpr(colName), Literal(0)))
+ .map(colName => LessThan(genColNumNullsExpr(colName), genColValueCountExpr))
Review Comment:
The filter `colA is not null` is the complement of `colA is null`, so why do the two have different translations (one depends on the valueCount while the other depends on `Literal(0)`)?
If `colA is null` is translated to `GreaterThan(genColNumNullsExpr(colName), Literal(0))`, shouldn't `colA is not null` be translated to `LessThanOrEqual(genColNumNullsExpr(colName), Literal(0))`?
Or, if `colA is not null` should be translated to `LessThan(genColNumNullsExpr(colName), genColValueCountExpr)`, shouldn't `colA is null` be translated to `GreaterThanOrEqual(genColNumNullsExpr(colName), genColValueCountExpr)`?
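To make the question concrete, here is a minimal sketch in plain Scala (no Spark) that evaluates the translations discussed above against three made-up files. It assumes that valueCount counts every row in a file, nulls included; `ColStats`, `NullPruningSketch`, and the predicate names are hypothetical and not part of Hudi's API.

```scala
// Hypothetical per-file column stats; names are illustrative, not Hudi's API.
// Assumption: valueCount counts all rows in the file, nulls included.
final case class ColStats(fileName: String, nullCount: Long, valueCount: Long)

object NullPruningSketch {
  // "colA is null": keep files that may contain at least one null.
  def mayContainNull(s: ColStats): Boolean = s.nullCount > 0

  // "colA is not null", previous translation: nullCount = 0.
  def notNullOld(s: ColStats): Boolean = s.nullCount == 0

  // "colA is not null", translation in this diff: nullCount < valueCount.
  def notNullNew(s: ColStats): Boolean = s.nullCount < s.valueCount

  // Made-up sample files covering the three interesting cases.
  val files: Seq[ColStats] = Seq(
    ColStats("all-null.parquet", nullCount = 10, valueCount = 10),
    ColStats("mixed.parquet", nullCount = 3, valueCount = 10),
    ColStats("no-null.parquet", nullCount = 0, valueCount = 10)
  )

  def main(args: Array[String]): Unit =
    files.foreach { f =>
      println(
        s"${f.fileName}: isNull=${mayContainNull(f)} " +
          s"notNullOld=${notNullOld(f)} notNullNew=${notNullNew(f)}"
      )
    }
}
```

Under that assumption, the file with both null and non-null values satisfies the `is null` lookup and the new `is not null` lookup at the same time, while the previous `is not null` lookup rejects it. So at file granularity the two lookups would not be strict complements of each other.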