Re: [PR] Support maxSubqueryBytes for window functions (druid)

via GitHub Thu, 15 Aug 2024 22:23:56 -0700


LakshSingla commented on code in PR #16800:
URL: https://github.com/apache/druid/pull/16800#discussion_r1719328472



##########
processing/src/main/java/org/apache/druid/query/operator/WindowOperatorQueryQueryToolChest.java:
##########
@@ -116,6 +128,36 @@ public Sequence<Object[]> resultsAsArrays(
     return (Sequence) resultSequence;
   }
 
+  @Override
+  public Optional<Sequence<FrameSignaturePair>> resultsAsFrames(
+      WindowOperatorQuery query,
+      Sequence<RowsAndColumns> resultSequence,
+      MemoryAllocatorFactory memoryAllocatorFactory,
+      boolean useNestedForUnknownTypes
+  )
+  {

Review Comment:
   > the context flag about num bytes versus num rows is what determines which 
thing does what, so there's a thing that already describes how to do the 
transition
   
   I thought about this. However, a cluster-level config can also determine 
the limit, so the window tool chest would have to look at that as well, and it 
seems wrong for the window tool chest to be the one deciding what to do.
   
   >  If we just blindly try one, fail and then do the other, that will show up 
to users as a performance hit because they have no clue that there's this rando 
intermediate logic that is failing for a reason
   
   The fallback is mostly for when the types aren't known. I agree that it is 
a performance hit, but at the time this feature was added, the signature 
reported by the tool chest didn't need to carry types. Scan queries only knew 
the column names (not the types), and the group by, time series, etc. 
toolchests could return `null` for the aggregator's dimensions. The fallback 
was there for these cases, where the failure is easy to detect relatively early 
in the whole subquery processing flow. It also made the transition from a 
row-based to a byte-based limit simple. There's an undocumented parameter that 
treated these null types as JSON types, but that had logical flaws of its own, 
iirc. 
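   
   To make the fallback concrete, here is a rough sketch of the idea only, not 
the actual Druid code. `SubqueryLimitSketch`, `Column`, `tryAsFrames` and 
`chooseLimit` are all made-up names for illustration; the only thing borrowed 
from the real API is the shape of `resultsAsFrames` returning an `Optional`. 

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: none of these names are Druid APIs.
class SubqueryLimitSketch {

    // Stand-in for one column of a result signature.
    static final class Column {
        final String name;
        final String type; // null means the type is unknown

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Produces a frame representation only when every column type is known;
    // an empty Optional signals that the caller should fall back.
    static Optional<List<String>> tryAsFrames(List<Column> signature) {
        for (Column column : signature) {
            if (column.type == null) {
                return Optional.empty();
            }
        }
        return Optional.of(Arrays.asList("frame-0")); // placeholder for real frames
    }

    // Byte-based limiting needs frames; otherwise fall back to the row-based limit.
    static String chooseLimit(List<Column> signature) {
        return tryAsFrames(signature).isPresent() ? "bytes" : "rows";
    }

    public static void main(String[] args) {
        List<Column> typed = Arrays.asList(
            new Column("dim", "STRING"), new Column("cnt", "LONG"));
        List<Column> untyped = Arrays.asList(
            new Column("dim", "STRING"), new Column("agg", null));

        System.out.println(chooseLimit(typed));   // prints "bytes"
        System.out.println(chooseLimit(untyped)); // prints "rows"
    }
}
```

   The point of the sketch is that the failure is detected from the signature 
alone, before any rows are materialized, which is why the fallback is cheap to 
detect early. 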
   
   Removing the fallback would make the change much easier, and I have a lot 
more confidence that the query doesn't need to fall back (and we know the 
affected cases beforehand). However, I'd still like to keep it for a while, 
just in case. I have an idea, and it depends on the fact that RACs can convert 
themselves to frames properly, and window toolchests would never fall back. 
   
   Thanks for the help!! I can work with the `"serialization": "frame"` 
parameter as a workaround for the current design choices.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

