Re: [PR] Support maxSubqueryBytes for window functions (druid)

via GitHub Wed, 14 Aug 2024 05:49:51 -0700


imply-cheddar commented on code in PR #16800:
URL: https://github.com/apache/druid/pull/16800#discussion_r1716865900



##########
processing/src/main/java/org/apache/druid/query/operator/WindowOperatorQueryQueryToolChest.java:
##########
@@ -116,6 +128,36 @@ public Sequence<Object[]> resultsAsArrays(
     return (Sequence) resultSequence;
   }
 
+  @Override
+  public Optional<Sequence<FrameSignaturePair>> resultsAsFrames(
+      WindowOperatorQuery query,
+      Sequence<RowsAndColumns> resultSequence,
+      MemoryAllocatorFactory memoryAllocatorFactory,
+      boolean useNestedForUnknownTypes
+  )
+  {

Review Comment:
   You sparked another thought, it could be "serialization": "frame" on the 
context in order to request that things are done as a frame.  That would also 
work.
   
   For the transition, the context flag about num bytes versus num rows is what 
determines which thing does what, so there's a thing that already describes how 
to do the transition.  If we just blindly try one, fail and then do the other, 
that will show up to users as a performance hit because they have no clue that 
there's this rando intermediate logic that is failing for a reason.  They will 
never inform us of the problem and they will just believe that the query is 
slow when one of the causes could be that we are just doing the same work 
twice.  So, we should pick one strategy or the other and work with it.
   
   In terms of allowing Toolchests to "opt-in", if the QueryRunner method 
returned `null` then it would mean that the toolchest doesn't support it, so 
the only option would be to use the array-oriented method, which would be a 
fine fallback.  For the `subquery` context flag, what you need is the 
`resultsAsArrays` method to be able to handle both the Frame case and the 
non-frame case.  
   
   In general, the Query-processing interfaces (Toolchest and RunnerFactory) 
should really be defined in terms of QueryRunners, the problem is that the 
methods are defined in terms of Sequences, which exist *after* the query is 
executed.  This means that the methods have no way of communicating with the 
query processing in order to optimize/push down their logic, which is the 
fundamental problem that you are running into: you are needing to communicate a 
decision to the query processing logic and have no method of doing that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Support maxSubqueryBytes for window functions (druid)

Reply via email to