kasakrisz commented on code in PR #5091:
URL: https://github.com/apache/hive/pull/5091#discussion_r1598392070
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateReduceFunctionsRule.java:
##########
@@ -596,6 +595,42 @@ private RelDataType getFieldType(RelNode relNode, int i) {
return inputField.getType();
}
+ private RexNode getAvgInput(RexNode input, RexBuilder rexBuilder,
RelDataTypeFactory typeFactory) {
+ switch (input.getType().getSqlTypeName()) {
+ case TINYINT:
+ case SMALLINT:
+ case INTEGER:
+ case BIGINT:
+ // These integer types are summed up as long. So, any value is valid
as long
+ return input;
+ case TIMESTAMP:
+ case FLOAT:
+ case DOUBLE:
+ // These float types are summed up as double. FLOAT is compatible with
DOUBLE. Any non-null TIMESTAMP value can
+ // be converted to a non-null DOUBLE value
+ return input;
+ case DECIMAL:
+ // DECIMAL values are evaluated as it is
+ return input;
+ case VARCHAR:
+ case CHAR:
+ // GenericUDAFSum implicitly casts texts into DOUBLE. That conversion
could generate NULL when the text is not a
+ // valid numeric expression.
+ // We have to explicitly cast those types. Otherwise, The COUNT UDF
can evaluate values with inconsistent
+ // semantics.
+ // For example, if a set of values are `"10"`, `"invalid"` and `"20"`,
the average should be (10 + 20) / 2 = 15.
+ // Without excluding invalid numeric texts, the result becomes (10 +
20) / 3 = 10.
+ // Additionally, SUM and COUNT should refer to the same cast
expression so that they can reuse the same column
+ // as their input. Otherwise, we may need double columns of `x` for
SUM and `CAST(x AS DOUBLE)` for COUNT.
+ final RelDataType targetType =
typeFactory.createSqlType(SqlTypeName.DOUBLE);
+ final RelDataType nullableTargetType =
typeFactory.createTypeWithNullability(targetType, true);
+ return rexBuilder.makeCast(nullableTargetType, input);
+ default:
+ // Unsupported types will be validated later. We keep the original
expression here
+ return input;
Review Comment:
How about
```
if (input.getType().getSqlTypeName().getFamily() == SqlTypeFamily.CHARACTER)
{
...
}
```
instead if the switch? Because only character types are casted.
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateReduceFunctionsRule.java:
##########
@@ -612,7 +647,9 @@ private RelDataType getSumReturnType(RelDataTypeFactory
typeFactory,
return TypeConverter.convert(TypeInfoFactory.doubleTypeInfo,
typeFactory);
case DECIMAL:
return typeFactory.getTypeSystem().deriveSumType(typeFactory,
inputType);
+ default:
+ // Unsupported types will be validated later. We keep the original
expression here
Review Comment:
Could you please add where is it validated?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]