clintropolis commented on code in PR #16620:
URL: https://github.com/apache/druid/pull/16620#discussion_r1663839764
##########
processing/src/main/java/org/apache/druid/query/datasourcemetadata/DataSourceQueryQueryToolChest.java:
##########
@@ -119,4 +121,10 @@ public CacheStrategy getCacheStrategy(DataSourceMetadataQuery query)
{
return null;
}
+
+ @Override
+ public CacheStrategy getCacheStrategy(DataSourceMetadataQuery query, @Nullable ObjectMapper mapper)
+ {
+ return null;
Review Comment:
nit: if this just returns null, is the override really needed?
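To illustrate the reviewer's point: if the base class supplies a default implementation that already returns null (meaning "not cacheable"), subclasses with no cache strategy can simply omit the override. This is a minimal sketch with hypothetical names, not Druid's actual `QueryToolChest` hierarchy:

```java
// Hypothetical sketch: a base class default that returns null makes the
// null-returning override in the subclass redundant.
abstract class ToolChestSketch<Q> {
    // Default: no caching. A null return means "this query type is not cacheable".
    public Object getCacheStrategy(Q query, Object mapper) {
        return null;
    }
}

class MetadataToolChestSketch extends ToolChestSketch<String> {
    // No override needed; the inherited default already returns null.
}
```

Callers see the same null either way, so deleting the override changes nothing observable.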
##########
processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryQueryToolChest.java:
##########
@@ -861,4 +856,27 @@ private static BitSet extractionsToRewrite(GroupByQuery query)
return retVal;
}
+
+ private static Class<?>[] createDimensionClasses(final GroupByQuery query)
+ {
+ final List<DimensionSpec> queryDimensions = query.getDimensions();
+ final Class<?>[] classes = new Class[queryDimensions.size()];
+ for (int i = 0; i < queryDimensions.size(); ++i) {
+ final ColumnType dimensionOutputType = queryDimensions.get(i).getOutputType();
+ if (dimensionOutputType.is(ValueType.COMPLEX)) {
+ NullableTypeStrategy nullableTypeStrategy = dimensionOutputType.getNullableStrategy();
+ if (!nullableTypeStrategy.groupable()) {
+ throw DruidException.defensive(
+ "Ungroupable dimension [%s] with type [%s] found in the query.",
+ queryDimensions.get(i).getDimension(),
+ dimensionOutputType
+ );
+ }
+ classes[i] = nullableTypeStrategy.getClazz();
+ } else {
+ classes[i] = Object.class;
Review Comment:
any reason not to use the real class here?
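The reviewer's suggestion, sketched with hypothetical names (not Druid's actual `ValueType`/`ColumnType` API): resolve each dimension's concrete Java class from its value type instead of falling back to `Object.class`, so downstream deserialization gets the "real" class:

```java
// Illustrative only: a simplified value-type enum and a mapping to the
// concrete class each primitive type deserializes to. Object.class is kept
// only as a last resort.
enum ValueTypeSketch { STRING, LONG, DOUBLE, COMPLEX }

class DimensionClassesSketch {
    static Class<?> classFor(ValueTypeSketch type) {
        switch (type) {
            case STRING: return String.class;
            case LONG:   return Long.class;
            case DOUBLE: return Double.class;
            default:     return Object.class; // complex types would resolve via their type strategy
        }
    }
}
```

With the real class available, a JSON codec can target e.g. `Long.class` directly rather than producing a generic `Object` that needs a second conversion pass.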
##########
processing/src/main/java/org/apache/druid/query/QueryToolChest.java:
##########
@@ -251,19 +251,30 @@ public Function<ResultType, ResultType> makePostComputeManipulatorFn(QueryType q
*/
public abstract TypeReference<ResultType> getResultTypeReference();
+ /**
+ *
Review Comment:
nit: please add javadocs or remove (but preferably add, and document what null means if this can return null, like the other overload does).
Also, should this be deprecated?
##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java:
##########
@@ -1371,6 +1361,77 @@ public Grouper.BufferComparator bufferComparatorWithAggregators(
);
}
+ @Override
+ public ObjectMapper decorateObjectMapper(ObjectMapper spillMapper)
+ {
+
+ final JsonDeserializer<RowBasedKey> deserializer = new JsonDeserializer<RowBasedKey>()
+ {
+ @Override
+ public RowBasedKey deserialize(
+ JsonParser jp,
+ DeserializationContext deserializationContext
+ ) throws IOException
+ {
+ if (!jp.isExpectedStartArrayToken()) {
+ throw DruidException.defensive("Expected array start token, received [%s]", jp.getCurrentToken());
+ }
+ jp.nextToken();
+
+ final ObjectCodec codec = jp.getCodec();
+ final int timestampAdjustment = includeTimestamp ? 1 : 0;
+ final int dimsToRead = timestampAdjustment + serdeHelpers.length;
+ int dimsReadSoFar = 0;
+ final Object[] objects = new Object[dimsToRead];
+
+ while (jp.currentToken() != JsonToken.END_ARRAY) {
+ if (dimsReadSoFar >= dimsToRead) {
+ throw DruidException.defensive("More dimensions encountered than expected [%d]", dimsToRead);
+ }
+
+ if (includeTimestamp && dimsReadSoFar == 0) {
+ // Read the timestamp
+ objects[dimsReadSoFar] = codec.readValue(jp, Long.class);
+ } else {
Review Comment:
i guess my thought was that we would be removing one conditional check from the loop for every column of every row, which could add up, but i haven't really measured one way or another. This is a pretty hot loop though, so it's worth thinking about making it as efficient as possible.
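The hoisting the reviewer is describing can be sketched as follows. Names and the token-array stand-in are illustrative, not Druid's deserializer: the point is that reading the timestamp once before the loop removes the `includeTimestamp && dimsReadSoFar == 0` branch from the per-column hot path.

```java
// Illustrative sketch: handle the timestamp outside the loop so the loop
// body contains only the dimension read, with no per-iteration branch.
class HoistedReadSketch {
    // Simulates deserializing one row of already-parsed tokens.
    static Object[] readRow(Object[] tokens, boolean includeTimestamp) {
        final Object[] out = new Object[tokens.length];
        int i = 0;
        if (includeTimestamp) {
            out[0] = tokens[0]; // timestamp read once, before the loop
            i = 1;
        }
        for (; i < tokens.length; i++) {
            out[i] = tokens[i]; // hot loop: dimension reads only
        }
        return out;
    }
}
```

Whether the saved branch matters in practice would need measurement, as the comment itself notes; modern branch predictors make this kind of always-false check nearly free.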
##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java:
##########
@@ -1613,6 +1667,11 @@ public GenericRowBasedKeySerdeHelper(
dictionary.get(lhsBuffer.getInt(lhsPosition + keyBufferPosition)),
dictionary.get(rhsBuffer.getInt(rhsPosition + keyBufferPosition))
);
+ if (columnType.is(ValueType.COMPLEX)) {
+ clazz = columnType.getNullableStrategy().getClazz();
+ } else {
+ clazz = Object.class;
Review Comment:
why not always use the column type?
##########
processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryQueryToolChest.java:
##########
@@ -727,13 +715,20 @@ public ResultRow apply(Object input)
int dimPos = 0;
while (dimsIter.hasNext() && results.hasNext()) {
final DimensionSpec dimensionSpec = dimsIter.next();
+ final Object dimensionObject = results.next();
+ final Object dimensionObjectCasted;
// Must convert generic Jackson-deserialized type into the proper type.
- resultRow.set(
- dimensionStart + dimPos,
- DimensionHandlerUtils.convertObjectToType(results.next(), dimensionSpec.getOutputType())
- );
-
+ if (dimensionSpec.getOutputType().is(ValueType.COMPLEX)) {
Review Comment:
i kind of wonder if we should special-handle json (`COMPLEX<json>`) here, since it is kind of a special complex type (that really should probably be a built-in standard type instead of complex...), because `convertValue` looks pretty expensive.
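The special-casing idea can be sketched like this. Everything here is a hypothetical stand-in (the type-name string, the counter, the conversion): the point is a cheap structural check that lets already-correct `COMPLEX<json>` values bypass the expensive generic conversion path.

```java
import java.util.Map;

// Illustrative sketch: pass JSON values through when they already arrive as
// the expected structure (e.g. a Map), and only fall back to the costly
// generic conversion otherwise.
class ConvertSketch {
    static int expensiveConversions = 0; // instrumentation for the sketch only

    static Object convertDimension(Object value, String typeName) {
        if ("COMPLEX<json>".equals(typeName) && value instanceof Map) {
            return value; // already structurally correct; skip conversion
        }
        expensiveConversions++;
        return String.valueOf(value); // stand-in for the expensive generic path
    }
}
```

The fast path costs one string compare and one `instanceof`, which is far cheaper than a full Jackson `convertValue` round trip.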
##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java:
##########
@@ -1371,6 +1361,69 @@ public Grouper.BufferComparator bufferComparatorWithAggregators(
);
}
+ @Override
+ public ObjectMapper decorateObjectMapper(ObjectMapper spillMapper)
+ {
+
+ final JsonDeserializer<RowBasedKey> deserializer = new JsonDeserializer<RowBasedKey>()
+ {
+ @Override
+ public RowBasedKey deserialize(
+ JsonParser jp,
+ DeserializationContext deserializationContext
+ ) throws IOException
+ {
+ if (!jp.isExpectedStartArrayToken()) {
+ throw DruidException.defensive("Expected array start token, received [%s]", jp.getCurrentToken());
+ }
+ jp.nextToken();
+
+ final ObjectCodec codec = jp.getCodec();
+ final int timestampAdjustment = includeTimestamp ? 1 : 0;
+ final int dimsToRead = timestampAdjustment + serdeHelpers.length;
+ int dimsReadSoFar = 0;
+ final Object[] objects = new Object[dimsToRead];
+
+ while (jp.currentToken() != JsonToken.END_ARRAY) {
+ if (dimsReadSoFar >= dimsToRead) {
+ throw DruidException.defensive("More dimensions encountered than expected [%d]", dimsToRead);
+ }
+
+ if (includeTimestamp && dimsReadSoFar == 0) {
+ // Read the timestamp
+ objects[dimsReadSoFar] = codec.readValue(jp, Long.class);
+ } else {
+ DruidException.conditionalDefensive(
+ dimsReadSoFar - timestampAdjustment < serdeHelpers.length,
+ "Insufficient serde helpers present"
+ );
+ // Read the dimension
+ objects[dimsReadSoFar] = codec.readValue(jp, serdeHelpers[dimsReadSoFar - timestampAdjustment].getClazz());
+ }
+
+ ++dimsReadSoFar;
+ jp.nextToken();
+ }
+
+ return new RowBasedKey(objects);
+ }
+ };
+
+ class SpillModule extends SimpleModule
+ {
+ public SpillModule()
+ {
+ addDeserializer(RowBasedKey.class, deserializer);
+ }
+ }
+
+ final ObjectMapper newObjectMapper = spillMapper.copy();
+ newObjectMapper.registerModule(new SpillModule());
Review Comment:
should this be a static somewhere so we don't have to make a new one all the
time?
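The static-sharing suggestion can be sketched with the standard static-holder idiom. The `Object` stand-in below is hypothetical (the real field would hold the `SpillModule`, which carries no per-query state, so a single shared instance is safe):

```java
// Illustrative sketch: build the stateless module once via class
// initialization (thread-safe per the JLS) instead of per decorateObjectMapper call.
class SpillModuleHolderSketch {
    static int buildCount = 0; // sketch instrumentation; declared before SHARED_MODULE so its initializer runs first
    static final Object SHARED_MODULE = buildModule();

    private static Object buildModule() {
        buildCount++;
        return new Object(); // stand-in for `new SpillModule()`
    }

    static Object getModule() {
        return SHARED_MODULE;
    }
}
```

Note that the mapper `copy()` itself likely still has to happen per call if callers can pass different base mappers; only the module (and its deserializer) is a candidate for static reuse.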
##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java:
##########
@@ -1726,6 +1791,12 @@ public Object2IntMap<Object[]> getReverseDictionary()
{
return reverseDictionary;
}
+
+ @Override
+ public Class<?> getClazz()
+ {
+ return Object.class;
Review Comment:
shouldn't this be `Object[].class`?
##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java:
##########
@@ -1819,6 +1896,12 @@ public BufferComparator getBufferComparator()
{
return bufferComparator;
}
+
+ @Override
+ public Class<?> getClazz()
+ {
+ return Object.class;
Review Comment:
`String.class`?
##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java:
##########
@@ -1770,6 +1841,12 @@ public Object2IntMap<Object[]> getReverseDictionary()
{
return reverseStringArrayDictionary;
}
+
+ @Override
+ public Class<?> getClazz()
+ {
+ return Object.class;
Review Comment:
same thing `Object[].class`
##########
processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryQueryToolChestTest.java:
##########
@@ -97,6 +98,8 @@ public class GroupByQueryQueryToolChestTest extends InitializedNullHandlingTest
public static void setUpClass()
{
NullHandling.initializeForTests();
+ //noinspection ResultOfObjectAllocationIgnored
+ new AggregatorsModule();
Review Comment:
hmm, we should move the complex serde registrations of that module to a static method it calls instead of doing this, so it's more obvious what is going on here
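The refactor being suggested can be sketched as follows. The registry, method name, and serde names are hypothetical, not `AggregatorsModule`'s actual contents: the point is replacing a side-effecting constructor call with an explicitly named, idempotent static method that both the module and tests can call.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: registration lives in a named static method, so a test
// writes `SerdeRegistrySketch.registerComplexSerdes()` instead of the opaque
// `new AggregatorsModule()`.
class SerdeRegistrySketch {
    private static final Set<String> REGISTERED = new HashSet<>();

    // Idempotent: safe to call from the module constructor and from any test setup.
    static synchronized void registerComplexSerdes() {
        REGISTERED.add("hyperUnique");          // hypothetical serde name
        REGISTERED.add("approximateHistogram"); // hypothetical serde name
    }

    static boolean isRegistered(String name) {
        return REGISTERED.contains(name);
    }
}
```

Besides readability, this also removes the need for the `//noinspection ResultOfObjectAllocationIgnored` suppression in the test.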
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]