mcvsubbu opened a new issue #4663: Add jitter for segment completion threshold
URL: https://github.com/apache/incubator-pinot/issues/4663
 
 
   As of now, we have all partitions of a stream topic completing segments at 
roughly the same time on the consuming servers. This can cause GC issues on the 
server, since a whole lot of old-gen memory may be released in a short window, 
not to mention the additional memory used while generating the segment.
   
   It would be nice if we could add some jitter to the completion thresholds, but 
we need to make sure that the jitter is the same across multiple replicas.
   
   For time-jitter, we can compute a segment end time as (for example)
      endTime = configuredEndTime - someRandomValue
   
   The random value can be at most (say) 10% of the configured end time, and 
should be computed deterministically from the partition number, so that all 
replicas of a partition arrive at the same value.
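   A minimal sketch of the time-jitter idea follows. It assumes the configured 
threshold is a duration in milliseconds and uses the 10% cap from the example 
above. The helper names (TimeJitter, jitteredThresholdMs, unitFraction) are 
hypothetical and are not Pinot APIs. The "random" value is derived from the 
partition number through a fixed bit-mixing function (the SplitMix64 finalizer), 
so every replica computes the same jitter without any coordination. Seeding 
java.util.Random with small consecutive partition numbers would also be 
deterministic, but its first outputs for nearby seeds can be highly correlated. 
That would defeat the purpose of the jitter.

```java
/** Sketch: deterministic per-partition time jitter (hypothetical helper, not a Pinot API). */
class TimeJitter {
  // Assumed cap from the issue's example: jitter is at most 10% of the configured threshold.
  static final double MAX_JITTER_FRACTION = 0.10;

  // SplitMix64 finalizer: scrambles the bits of its input so nearby inputs map to unrelated outputs.
  static long mix64(long z) {
    z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
    z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
    return z ^ (z >>> 31);
  }

  // Maps a partition number to a fraction in [0, 1) that depends only on that number.
  static double unitFraction(int partitionId) {
    long h = mix64(partitionId + 0x9e3779b97f4a7c15L);
    return (h >>> 11) * 0x1.0p-53; // top 53 bits -> uniform double in [0, 1)
  }

  // endTime = configuredEndTime - someRandomValue, where someRandomValue < 10% of the configured value.
  static long jitteredThresholdMs(long configuredThresholdMs, int partitionId) {
    long maxJitterMs = (long) (configuredThresholdMs * MAX_JITTER_FRACTION);
    long jitterMs = (long) (unitFraction(partitionId) * maxJitterMs);
    return configuredThresholdMs - jitterMs;
  }

  public static void main(String[] args) {
    long oneHourMs = 60L * 60 * 1000;
    for (int partition = 0; partition < 4; partition++) {
      System.out.println("partition " + partition + " -> "
          + jitteredThresholdMs(oneHourMs, partition) + " ms");
    }
  }
}
```

   With a one-hour threshold, each partition completes somewhere between 54 and 
60 minutes. Because the computation is a pure function of the partition number, 
it could run on the controller or on each server and give identical results. A 
stable key such as the table name could also be mixed in if different tables 
should not line up with each other.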
   
   The time jitter may not help in cases where auto-tuning is used, because we 
aim to hit the row-limit (computed by the auto-tuning algorithm) rather than 
the time limit in the optimal case.
   
   For num rows jitter, it gets a bit more complex. The auto-tuning algorithms 
can introduce a small variation in the number of rows across partitions 
(randomness computed on the basis of the partition number). But in a stream 
that ingests data at a very high rate, a difference in the number of rows may 
not spread out completion times in any meaningful way (e.g. the server may 
take only a few more seconds to consume the extra rows). We need to bring the 
ingestion rate into this equation as well.
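   One way to factor in the ingestion rate is to size the row jitter by time 
rather than as a fixed row count. The sketch below picks a target spread in 
seconds and converts it to rows using the partition's ingestion rate. It caps 
the result at a fraction of the auto-tuned row threshold, so that segments do 
not get too small. It then applies a per-partition fraction of that spread. The 
constants (a 600-second target spread and a 10% cap) and all names here are 
hypothetical assumptions for illustration. The deterministic per-partition 
fraction reuses the SplitMix64 mixing idea.

```java
/** Sketch: row-threshold jitter scaled by ingestion rate (hypothetical, not a Pinot API). */
class RowJitter {
  static final double MAX_ROW_FRACTION = 0.10;    // assumed: never trim more than 10% of the rows
  static final double TARGET_SPREAD_SECONDS = 600; // assumed: spread completions over ~10 minutes

  // SplitMix64 finalizer, used to turn a partition number into well-scrambled bits.
  static long mix64(long z) {
    z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
    z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
    return z ^ (z >>> 31);
  }

  // Deterministic fraction in [0, 1): identical on every replica of the partition.
  static double unitFraction(int partitionId) {
    return (mix64(partitionId + 0x9e3779b97f4a7c15L) >>> 11) * 0x1.0p-53;
  }

  static long jitteredRowThreshold(long autoTunedRows, double rowsPerSecond, int partitionId) {
    // Rows that correspond to the target time spread at this partition's ingestion rate,
    // capped so the threshold never drops below (1 - MAX_ROW_FRACTION) * autoTunedRows.
    double spreadRows = Math.min(TARGET_SPREAD_SECONDS * rowsPerSecond,
                                 autoTunedRows * MAX_ROW_FRACTION);
    // Apply the fraction after capping, so partitions stay spread out even when the cap binds.
    long jitterRows = (long) (unitFraction(partitionId) * spreadRows);
    return autoTunedRows - jitterRows;
  }

  public static void main(String[] args) {
    for (int partition = 0; partition < 3; partition++) {
      System.out.println("partition " + partition
          + " slow stream (10 rows/s): " + jitteredRowThreshold(1_000_000L, 10.0, partition)
          + ", fast stream (50k rows/s): " + jitteredRowThreshold(1_000_000L, 50_000.0, partition));
    }
  }
}
```

   The per-partition fraction is applied after the cap rather than before it. 
If it were applied first, a fast stream would clamp most partitions to the same 
threshold and reintroduce the problem. On a slow stream, the cap rarely binds, 
and the jitter shifts completion by up to the target spread.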
   
   It should be possible to indicate to the controller (via the segment completion 
protocol) the ingestion rate computed by the server, so this can be done, at 
least in theory.
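   On the server side, the rate could be tracked while consuming and then 
attached to the segment completion request. The sketch below uses an 
exponentially weighted moving average (EWMA). This is a technique chosen here 
for illustration, not something the issue prescribes. The EWMA smooths out 
bursty batches. The class name and smoothing factor are hypothetical, and the 
sketch does not claim any particular field exists in Pinot's segment completion 
protocol.

```java
/** Sketch: server-side ingestion-rate estimate (hypothetical, not a Pinot API). */
final class IngestionRateEstimator {
  private final double alpha;           // smoothing factor in (0, 1]; higher reacts faster
  private double ratePerSecond = Double.NaN; // NaN until the first batch is recorded

  IngestionRateEstimator(double alpha) {
    if (alpha <= 0 || alpha > 1) throw new IllegalArgumentException("alpha must be in (0, 1]");
    this.alpha = alpha;
  }

  /** Records one consumed batch of rows that took elapsedMillis to fetch and index. */
  void onBatch(long rows, long elapsedMillis) {
    if (elapsedMillis <= 0) return; // ignore batches with no measurable duration
    double sample = rows * 1000.0 / elapsedMillis;
    ratePerSecond = Double.isNaN(ratePerSecond)
        ? sample
        : alpha * sample + (1 - alpha) * ratePerSecond;
  }

  /** Current estimate in rows per second (0 before any batch is seen). */
  double ratePerSecond() {
    return Double.isNaN(ratePerSecond) ? 0.0 : ratePerSecond;
  }

  public static void main(String[] args) {
    IngestionRateEstimator estimator = new IngestionRateEstimator(0.5);
    estimator.onBatch(1000, 1000);
    estimator.onBatch(3000, 1000);
    System.out.println("estimated rate: " + estimator.ratePerSecond() + " rows/s");
  }
}
```

   The value returned by ratePerSecond() is what the server would report. The 
controller could then feed it into a rate-aware row jitter when it computes the 
next segment's threshold.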
   
   Adding jitter in the time component is definitely a good start.
