imply-cheddar commented on code in PR #16800:
URL: https://github.com/apache/druid/pull/16800#discussion_r1716865900
##########
processing/src/main/java/org/apache/druid/query/operator/WindowOperatorQueryQueryToolChest.java:
##########
@@ -116,6 +128,36 @@ public Sequence<Object[]> resultsAsArrays(
return (Sequence) resultSequence;
}
+ @Override
+ public Optional<Sequence<FrameSignaturePair>> resultsAsFrames(
+ WindowOperatorQuery query,
+ Sequence<RowsAndColumns> resultSequence,
+ MemoryAllocatorFactory memoryAllocatorFactory,
+ boolean useNestedForUnknownTypes
+ )
+ {
Review Comment:
You sparked another thought, it could be "serialization": "frame" on the
context in order to request that things are done as a frame. That would also
work.
For the transition, the context flag about num bytes versus num rows is what
determines which thing does what, so there's a thing that already describes how
to do the transition. If we just blindly try one, fail and then do the other,
that will show up to users as a performance hit because they have no clue that
there's this rando intermediate logic that is failing for a reason. They will
never inform us of the problem and they will just believe that the query is
slow when one of the causes could be that we are just doing the same work
twice. So, we should pick one strategy or the other and work with it.
In terms of allowing Toolchests to "opt-in", if the QueryRunner method
returned `null` then it would mean that the toolchest doesn't support it, so
the only option would be to use the array-oriented method, which would be a
fine fallback. For the `subquery` context flag, what you need is the
`resultsAsArrays` method to be able to handle both the Frame case and the
non-frame case.
In general, the Query-processing interfaces (Toolchest and RunnerFactory)
should really be defined in terms of QueryRunners, the problem is that the
methods are defined in terms of Sequences, which exist *after* the query is
executed. This means that the methods have no way of communicating with the
query processing in order to optimize/push down their logic, which is the
fundamental problem that you are running into: you are needing to communicate a
decision to the query processing logic and have no method of doing that.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]