Re: [PR] Deserialize complex dimensions in group by queries to their respective types when reading from spilled files and cached results (druid)

via GitHub Wed, 10 Jul 2024 05:47:36 -0700


cryptoe commented on code in PR #16620:
URL: https://github.com/apache/druid/pull/16620#discussion_r1672205682



##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java:
##########
@@ -1371,6 +1361,79 @@ public Grouper.BufferComparator bufferComparatorWithAggregators(
       );
     }
 
+    @Override
+    public ObjectMapper decorateObjectMapper(ObjectMapper spillMapper)
+    {
+
+      final JsonDeserializer<RowBasedKey> deserializer = new JsonDeserializer<RowBasedKey>()
+      {
+        @Override
+        public RowBasedKey deserialize(
+            JsonParser jp,
+            DeserializationContext deserializationContext
+        ) throws IOException
+        {
+          if (!jp.isExpectedStartArrayToken()) {
+            throw DruidException.defensive("Expected array start token, received [%s]", jp.getCurrentToken());
+          }
+          jp.nextToken();
+
+          final ObjectCodec codec = jp.getCodec();
+          final int timestampAdjustment = includeTimestamp ? 1 : 0;
+          final int dimsToRead = timestampAdjustment + serdeHelpers.length;
+          int dimsReadSoFar = 0;
+          final Object[] objects = new Object[dimsToRead];
+
+          if (includeTimestamp) {
+            DruidException.conditionalDefensive(
+                jp.currentToken() != JsonToken.END_ARRAY,
+                "Unexpected end of array when deserializing timestamp from the spilled files"
+            );
+            objects[dimsReadSoFar] = codec.readValue(jp, Long.class);
+
+            ++dimsReadSoFar;
+            jp.nextToken();
+          }
+
+          while (jp.currentToken() != JsonToken.END_ARRAY) {
+
+            DruidException.conditionalDefensive(

Review Comment:
   Can we remove these checks, since they run for every row? Is this overhead
   worth having in the hot path?
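
   One possible shape for this, as a sketch only: let the loop condition act as
   the bounds guard and validate the count once per key after the loop, rather
   than asserting on every element. The class `KeyReaderSketch`, the method
   `readKey`, and the plain `Iterator` input are hypothetical stand-ins for the
   `JsonParser` loop in the diff, not Druid or Jackson APIs, and whether this is
   measurably cheaper would need a benchmark.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the deserializer loop; not Druid code.
final class KeyReaderSketch
{
  // Reads exactly `expected` values and validates the count once after the
  // loop instead of running a defensive check for every element.
  static Object[] readKey(Iterator<?> values, int expected)
  {
    final Object[] objects = new Object[expected];
    int read = 0;

    // The loop condition keeps `read` within bounds, so no separate
    // per-element assertion is needed inside the body.
    while (read < expected && values.hasNext()) {
      objects[read++] = values.next();
    }

    // Single validation per key: fails on too few values or on leftovers.
    if (read != expected || values.hasNext()) {
      throw new IllegalStateException(
          "Expected exactly " + expected + " values, read " + read
          + (values.hasNext() ? " with more remaining" : "")
      );
    }
    return objects;
  }

  public static void main(String[] args)
  {
    // A timestamp followed by two dimension values.
    System.out.println(Arrays.toString(readKey(List.of(1L, "a", "b").iterator(), 3)));
  }
}
```

   This keeps the malformed-input detection (short or overlong arrays still
   fail) while leaving only an integer comparison in the loop itself.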



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

