Copilot commented on code in PR #17489:
URL: https://github.com/apache/pinot/pull/17489#discussion_r2681519392
##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java:
##########
@@ -1009,7 +1009,7 @@ private void handleSegmentBuildFailure() {
private boolean startSegmentCommit() {
SegmentCompletionProtocol.Request.Params params = new
SegmentCompletionProtocol.Request.Params();
params.withSegmentName(_segmentNameStr).withStreamPartitionMsgOffset(_currentOffset.toString())
-
.withNumRows(_numRowsConsumed).withInstanceId(_instanceId).withReason(_stopReason);
+
.withNumRows(_numRowsIndexed).withInstanceId(_instanceId).withReason(_stopReason);
Review Comment:
The change from `_numRowsConsumed` to `_numRowsIndexed` in the segment
commit flow lacks test coverage. This is a critical bug fix affecting memory
allocation and segment sizing. Add integration tests that verify correct
behavior when filtering results in a significant difference between consumed
and indexed rows (e.g., 60x ratio).
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/segment/SizeBasedSegmentFlushThresholdComputer.java:
##########
@@ -171,9 +171,9 @@ synchronized int computeThreshold(StreamConfig
streamConfig, String segmentName)
// .getSizeThresholdToFlushSegment(),
// we might end up using a lot more memory than required for the segment
Using a minor bump strategy, until
// we add feature to adjust time We will only slightly bump the threshold
based on numRowsConsumed
- if (_rowsConsumedForLastSegment < _rowsThresholdForLastSegment &&
_sizeForLastSegment < desiredSegmentSizeBytes) {
+ if (_rowsIndexedForLastSegment < _rowsThresholdForLastSegment &&
_sizeForLastSegment < desiredSegmentSizeBytes) {
Review Comment:
The threshold computation logic using `_rowsIndexedForLastSegment` needs
test coverage demonstrating the fix prevents memory spikes. Add tests comparing
memory allocation with high filter ratios (e.g., 200K indexed from 13M
consumed) to verify the corrected calculation prevents over-allocation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]