okumin commented on code in PR #6389:
URL: https://github.com/apache/hive/pull/6389#discussion_r3044907736
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/udf/GenericUDFIcebergBucket.java:
##########
@@ -209,4 +214,32 @@ public Object evaluate(DeferredObject[] arguments) throws HiveException {
   public String getDisplayString(String[] children) {
     return getStandardDisplayString("iceberg_bucket", children);
   }
+
+  @Override
+  public StatEstimator getStatEstimator() {
+    return new BucketStatEstimator();
Review Comment:
I would simply use `numBuckets` as `return new BucketStatEstimator()`. That
said, the current approach also works correctly, so I don't mind if you'd
prefer to keep it.
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/udf/GenericUDFIcebergBucket.java:
##########
@@ -209,4 +214,32 @@ public Object evaluate(DeferredObject[] arguments) throws HiveException {
   public String getDisplayString(String[] children) {
     return getStandardDisplayString("iceberg_bucket", children);
   }
+
+  @Override
+  public StatEstimator getStatEstimator() {
+    return new BucketStatEstimator();
+  }
+
+  private static class BucketStatEstimator implements StatEstimator {
+    @Override
+    public Optional<ColStatistics> estimate(List<ColStatistics> argStats) {
+      if (argStats.size() != 2) {
+        return Optional.empty();
+      }
+      ColStatistics inputStats = argStats.get(0);
+      ColStatistics bucketCountStats = argStats.get(1);
+      ColStatistics.Range bucketRange = bucketCountStats.getRange();
+      if (bucketRange == null || bucketRange.minValue == null) {
+        return Optional.empty();
+      }
+      long numBuckets = bucketRange.minValue.longValue();
+      if (numBuckets <= 0) {
+        return Optional.empty();
+      }
+      ColStatistics result = inputStats.clone();
Review Comment:
I guess we shouldn't inherit all the stats from the first argument. Instead,
we can create an empty `ColStatistics` and set only `countDistinct`,
`numNulls`, `range`, and `isEstimated`.
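
To illustrate the suggestion, here is a minimal, self-contained sketch of building a fresh result instead of cloning the input. It does not use the real Hive classes: `Stats` is a hypothetical stand-in for `ColStatistics`, and `unrelated` is an invented field standing for any input statistic that should not carry over. The values chosen (distinct count capped at `numBuckets`, range `[0, numBuckets - 1]`, null count copied from the input) are assumptions about what a bucket transform implies, not what the PR actually sets.

```java
import java.util.Optional;

class BucketStatsSketch {

  /** Hypothetical stand-in for Hive's ColStatistics; NOT the real class. */
  static final class Stats {
    long countDistinct;
    long numNulls;
    long rangeMin;
    long rangeMax;
    boolean isEstimated;
    // Invented field: represents any input statistic that should not be inherited.
    long unrelated;
  }

  /** Builds a fresh Stats for the bucket result; only four fields are set explicitly. */
  static Optional<Stats> estimate(Stats input, long numBuckets) {
    if (numBuckets <= 0) {
      return Optional.empty();
    }
    Stats result = new Stats(); // start empty rather than cloning the input
    // Assumption: the output cannot have more distinct values than buckets.
    result.countDistinct = Math.min(input.countDistinct, numBuckets);
    // Assumption: a null input yields a null bucket, so the null count carries over.
    result.numNulls = input.numNulls;
    // Assumption: bucket ids fall in [0, numBuckets - 1].
    result.rangeMin = 0;
    result.rangeMax = numBuckets - 1;
    result.isEstimated = true;
    return Optional.of(result);
  }

  public static void main(String[] args) {
    Stats input = new Stats();
    input.countDistinct = 1000;
    input.numNulls = 5;
    input.unrelated = 42;

    Stats out = estimate(input, 16).orElseThrow(IllegalStateException::new);
    System.out.println("ndv=" + out.countDistinct
        + " nulls=" + out.numNulls
        + " range=[" + out.rangeMin + ", " + out.rangeMax + "]"
        + " unrelated=" + out.unrelated);
  }
}
```

Because the result starts empty, anything not set explicitly (here `unrelated`) keeps its default value rather than silently inheriting the first argument's value.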
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]