gortiz commented on code in PR #10380:
URL: https://github.com/apache/pinot/pull/10380#discussion_r1143222905
##########
pinot-common/src/main/java/org/apache/pinot/common/request/context/LiteralContext.java:
##########
@@ -37,68 +45,116 @@ public class LiteralContext {
private FieldSpec.DataType _type;
private Object _value;
- // TODO: Support all data types.
- private static FieldSpec.DataType convertThriftTypeToDataType(Literal._Fields fields) {
- switch (fields) {
- case LONG_VALUE:
- return FieldSpec.DataType.LONG;
- case BOOL_VALUE:
- return FieldSpec.DataType.BOOLEAN;
- case DOUBLE_VALUE:
- return FieldSpec.DataType.DOUBLE;
- case STRING_VALUE:
- return FieldSpec.DataType.STRING;
+ private BigDecimal _bigDecimalValue;
+
+ private static BigDecimal getBigDecimalValue(FieldSpec.DataType type, Object value) {
+ switch (type){
+ case BIG_DECIMAL:
+ return (BigDecimal) value;
+ case BOOLEAN:
+ return PinotDataType.BOOLEAN.toBigDecimal(value);
+ case TIMESTAMP:
+ return PinotDataType.TIMESTAMP.toBigDecimal(Timestamp.valueOf(value.toString()));
default:
- throw new UnsupportedOperationException("Unsupported literal type:" + fields);
+ if(type.isNumeric()){
+ return new BigDecimal(value.toString());
+ }
+ return BigDecimal.ZERO;
}
}
- private static Class<?> convertDataTypeToJavaType(FieldSpec.DataType dataType) {
- switch (dataType) {
- case INT:
- return Integer.class;
- case LONG:
- return Long.class;
- case BOOLEAN:
- return Boolean.class;
- case FLOAT:
- return Float.class;
- case DOUBLE:
- return Double.class;
- case STRING:
- return String.class;
- default:
- throw new UnsupportedOperationException("Unsupported dataType:" + dataType);
+ @VisibleForTesting
+ static Pair<FieldSpec.DataType, Object> inferLiteralDataTypeAndValue(String literal) {
+ // Try to interpret the literal as number
+ try {
+ Number number = NumberUtils.createNumber(literal);
+ if (number instanceof BigDecimal || number instanceof BigInteger) {
+ return ImmutablePair.of(FieldSpec.DataType.BIG_DECIMAL, new BigDecimal(literal));
+ } else {
+ return ImmutablePair.of(FieldSpec.DataType.STRING, literal);
+ }
+ } catch (Exception e) {
+ // Ignored
+ }
+
+ // Try to interpret the literal as TIMESTAMP
+ try {
+ Timestamp timestamp = Timestamp.valueOf(literal);
+ return ImmutablePair.of(FieldSpec.DataType.TIMESTAMP, timestamp);
+ } catch (Exception e) {
+ // Ignored
}
+ return ImmutablePair.of(FieldSpec.DataType.STRING, literal);
}
public LiteralContext(Literal literal) {
- _type = convertThriftTypeToDataType(literal.getSetField());
- _value = literal.getFieldValue();
+ Preconditions.checkState(literal.getFieldValue() != null,
+ "Field value cannot be null for field:" + literal.getSetField());
+ switch (literal.getSetField()){
+ case BOOL_VALUE:
+ _type = FieldSpec.DataType.BOOLEAN;
+ _value = literal.getFieldValue();
+ break;
+ case DOUBLE_VALUE:
+ _type = FieldSpec.DataType.DOUBLE;
+ _value = literal.getFieldValue();
+ break;
+ case LONG_VALUE:
+ _type = FieldSpec.DataType.LONG;
+ _value = literal.getFieldValue();
+ break;
+ case NULL_VALUE:
+ _type = FieldSpec.DataType.UNKNOWN;
+ _value = null;
+ break;
+ case STRING_VALUE:
+ Pair<FieldSpec.DataType, Object> typeAndValue = inferLiteralDataTypeAndValue(literal.getFieldValue().toString());
+ _type = typeAndValue.getLeft();
+ _value = typeAndValue.getRight();
+ break;
+ default:
+ throw new UnsupportedOperationException("Unsupported data type:" + literal.getSetField());
+ }
+ _bigDecimalValue = getBigDecimalValue(_type, _value);
Review Comment:
It is not clear to me what the difference is between `_value` and
`_bigDecimalValue` in each case.
* If `literal.getSetField` is `DOUBLE_VALUE` or `LONG_VALUE`, `_value` is
either a `Double` or a `Long`.
* If `literal.getSetField` is `STRING_VALUE`:
  * And the literal is an actual number (like `"123"` or `"1.3"`), then
  `_value` is the most specific `Number` implementation possible.
  * And the literal is not an actual number:
    * If it is a boolean, `_value` is a String like `"true"` or
    `"false"` and `_bigDecimalValue` is `1` or `0`.
    * If it is a timestamp, `_value` is a String (why?) like
    `"21312312"` and `_bigDecimalValue` is its representation as a BigDecimal.
Then `_bigDecimalValue` is only used in `getXValue` where `X` is `Int`,
`Double` or `BigDecimal`.
For context, `BigDecimal` instances are quite large. They contain
several attributes and therefore consume a significant amount of memory. I
guess normal queries should not have tons of instances of `LiteralContext`, but
we will receive degenerate queries like `where X in [list with
hundreds/thousands of elements]` or `where X = L1 or X = L2 or ... X = L1000`. I
had to deal with these cases in other databases, where inefficiencies
that looked innocent suddenly caused OOMs.
It also seems that the `getXValue` methods are always called during the
initialization of the transformation functions. Therefore they are not in the
hot path, and caching the BigDecimal doesn't seem to be that useful. This is
why I would recommend the following:
- In `getIntValue` and `getDoubleValue`:
  - When `_value` is an instance of `Number`, cast it and call the
  appropriate method.
  - Otherwise, calculate the value lazily each time the method is called.
- In the `getBigDecimalValue` case, I would just calculate the value lazily
each time the method is called.
By doing that we may reduce the amount of memory used in degenerate queries
like the ones I listed above.
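To make the suggestion concrete, here is a minimal sketch of what I have in mind. `LazyLiteral` is a hypothetical stand-in for `LiteralContext`, not the actual class. I am assuming that booleans map to `1`/`0` and that anything else non-numeric falls back to zero, as the `default` branch in this diff does. Timestamp handling is left out for brevity.

```java
import java.math.BigDecimal;

// Hypothetical stand-in for LiteralContext, used only to illustrate the idea.
// It keeps a single _value field and no cached _bigDecimalValue.
final class LazyLiteral {
  private final Object _value;

  LazyLiteral(Object value) {
    _value = value;
  }

  int getIntValue() {
    // Fast path: no BigDecimal is allocated when the value is already a Number.
    if (_value instanceof Number) {
      return ((Number) _value).intValue();
    }
    return toBigDecimal().intValue();
  }

  double getDoubleValue() {
    if (_value instanceof Number) {
      return ((Number) _value).doubleValue();
    }
    return toBigDecimal().doubleValue();
  }

  BigDecimal getBigDecimalValue() {
    // Computed on every call instead of being stored per instance.
    return toBigDecimal();
  }

  private BigDecimal toBigDecimal() {
    if (_value instanceof BigDecimal) {
      return (BigDecimal) _value;
    }
    if (_value instanceof Number) {
      // Same conversion as the diff; note it throws for NaN or infinite doubles.
      return new BigDecimal(_value.toString());
    }
    if (_value instanceof Boolean) {
      // Assumption: true maps to 1 and false maps to 0.
      return ((Boolean) _value) ? BigDecimal.ONE : BigDecimal.ZERO;
    }
    // Assumption: mirror the diff's fallback for non-numeric types.
    return BigDecimal.ZERO;
  }
}
```

With this shape, each literal holds only its `_value` reference, and a `BigDecimal` is allocated only when a caller actually asks for one and the value is not already numeric.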
PS: I edited the comment because in the original text I was assuming
`getXValue` was called in the hotpath. After reading the rest of the PR I would
say it is not the case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]