Akshat-Jain commented on code in PR #17038:
URL: https://github.com/apache/druid/pull/17038#discussion_r1774553043
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/WindowOperatorQueryFrameProcessor.java:
##########
@@ -510,4 +322,9 @@ private void ensureMaxRowsInAWindowConstraint(int numRowsInWindow)
));
}
}
+
+ private boolean needToProcessBatch()
+ {
+ return numRowsInFrameRowsAndCols >= maxRowsMaterialized / 2; // Can this be improved further?
Review Comment:
We need some threshold at which to start pushing RACs into the operator pipeline.
We discussed pushing RACs of exactly `N` rows into the pipeline, but it's not trivial to create RACs of an exact size `N`.
So I felt it would be better to use a threshold instead: push RACs into the pipeline once they cross `N` rows. I chose `N = maxRowsMaterialized / 2`, but we can always discuss better values for this.
> that doesn't give any guarantee that it will be inside bounds
I think it does 🤔
The `convertRowFrameToRowsAndColumns()` method enforces the `maxRowsMaterialized` constraint by calling `ensureMaxRowsInAWindowConstraint(frameRowsAndCols.size() + ldrc.numRows())`, so it won't allow us to accumulate more than `maxRowsMaterialized` rows.
Thoughts? Let me know if I'm missing something. Thanks!
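
To make the argument concrete, here is a minimal standalone sketch of how the two checks interact. It is not the processor code: the class `BatchThresholdSketch`, the methods `addFrame` and `flush`, the use of plain row counts instead of real RACs, and the `IllegalStateException` are all hypothetical stand-ins, and the `>` comparison in the cap check is an assumption. Only `needToProcessBatch()` and the `maxRowsMaterialized / 2` threshold come from the diff above.

```java
// Hypothetical sketch: tracks row counts only, not real frames or RACs.
public class BatchThresholdSketch
{
  private final int maxRowsMaterialized;
  private int bufferedRows = 0;

  public BatchThresholdSketch(int maxRowsMaterialized)
  {
    this.maxRowsMaterialized = maxRowsMaterialized;
  }

  // Stand-in for ensureMaxRowsInAWindowConstraint(...): a hard cap.
  // Assumption: the real check rejects counts strictly above the limit.
  private void ensureMaxRows(int numRows)
  {
    if (numRows > maxRowsMaterialized) {
      throw new IllegalStateException(
          "Too many rows: " + numRows + " > " + maxRowsMaterialized
      );
    }
  }

  // Stand-in for the accumulation step: the cap is checked against the
  // buffered rows plus the incoming rows before anything is added.
  public void addFrame(int frameRows)
  {
    ensureMaxRows(bufferedRows + frameRows);
    bufferedRows += frameRows;
  }

  // Soft trigger from the diff: start processing once half the cap is buffered.
  public boolean needToProcessBatch()
  {
    return bufferedRows >= maxRowsMaterialized / 2;
  }

  // Hands the buffered rows to the (imaginary) pipeline and resets the buffer.
  public int flush()
  {
    int flushed = bufferedRows;
    bufferedRows = 0;
    return flushed;
  }
}
```

In this sketch the threshold only decides when to flush, while the cap check in `addFrame` is what keeps the buffer from ever exceeding `maxRowsMaterialized`, since an add that would cross the limit throws before the buffer grows.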