justinborromeo opened a new issue #7238: [PROPOSAL] Add segment limit for 
native and streaming index tasks
URL: https://github.com/apache/incubator-druid/issues/7238
 
 
   ### Motivation
   
   To prevent IndexTasks and Kafka/KinesisIndexTasks from consuming excessive 
memory, it would be safer if there were a configurable limit on the total number 
of segments that a single task can have open.  This situation can occur when an 
indexing task has a fine segmentGranularity (e.g. an hour) and reads a 
backfilled/late data stream with a small number of events per hour spread over 
a large number of hours.  Without any safeguards, the task tries to open a 
segment for every one of those hours and OOMs due to the overhead of opening 
files.
   
   ### Proposed changes
   
   I plan on adding a `maxTotalSegments` field to `AppenderatorConfig` allowing 
users to set it in the ingestion spec.
   
   | Property         | Description                                                                                  | Default | Required? |
   |------------------|----------------------------------------------------------------------------------------------|---------|-----------|
   | maxTotalSegments | The maximum number of mutable segments that an indexing task is allowed to have open at one time. | 1000    | No        |
   
   If the number of mutable segments exceeds `maxTotalSegments` during 
indexing, the segments will be pushed to deep storage 
(`BatchAppenderatorDriver#pushAllAndClear()`).  This behaviour is similar to 
what occurs if the number of rows exceeds `maxTotalRows`.  For this to work, 
`AppenderatorDriverAddResult#isPushRequired()` should be modified to also check 
whether the number of segments is above the limit.  This is feasible since the 
number of sinks (which have a 1:1 relation to mutable segments) is accessible 
from `AppenderatorImpl#add()` and can be returned as a new field in 
`AppenderatorDriverAddResult`.
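
   A minimal sketch of the check described above, assuming the sink count is 
carried in the add result.  This is not Druid's actual code: `AddResult` and 
`SketchAppenderator` are hypothetical stand-ins for 
`AppenderatorDriverAddResult` and `AppenderatorImpl`, and the `numSegments` 
field is the proposed addition.

```java
import java.util.HashSet;
import java.util.Set;

public class SegmentLimitSketch {

    /** Hypothetical stand-in for AppenderatorDriverAddResult. */
    static final class AddResult {
        private final int numSegments;   // proposed new field: number of open sinks
        private final long totalRows;

        AddResult(int numSegments, long totalRows) {
            this.numSegments = numSegments;
            this.totalRows = totalRows;
        }

        /** A push is required when either the row limit or the segment limit is exceeded. */
        boolean isPushRequired(int maxTotalSegments, long maxTotalRows) {
            return totalRows > maxTotalRows || numSegments > maxTotalSegments;
        }
    }

    /** Hypothetical stand-in for AppenderatorImpl: one sink per segment interval. */
    static final class SketchAppenderator {
        private final Set<String> openSinks = new HashSet<>();
        private long totalRows = 0;

        /** Adds one row to the sink for the given interval, opening the sink if needed. */
        AddResult add(String segmentInterval) {
            openSinks.add(segmentInterval);
            totalRows++;
            return new AddResult(openSinks.size(), totalRows);
        }

        /** Models pushing everything to deep storage and dropping the in-memory sinks. */
        void pushAllAndClear() {
            openSinks.clear();
            totalRows = 0;
        }
    }

    public static void main(String[] args) {
        int maxTotalSegments = 3;        // small value for illustration only
        long maxTotalRows = 1_000_000L;  // large enough that rows never trigger a push here
        SketchAppenderator appenderator = new SketchAppenderator();
        int pushes = 0;

        // One late event per hour over ten hours: each event lands in a different segment.
        for (int hour = 0; hour < 10; hour++) {
            AddResult result = appenderator.add("hour-" + hour);
            if (result.isPushRequired(maxTotalSegments, maxTotalRows)) {
                appenderator.pushAllAndClear();
                pushes++;
            }
        }
        System.out.println("pushes triggered by the segment limit: " + pushes);
    }
}
```

   In this sketch the row limit never fires, so every push comes from the 
segment check alone, which is the backfill scenario from the motivation.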
   
   ### Rationale
   
   I don't think there are any other options for capping the number of 
segments opened by an indexing task/supervisor.
   
   ### Operational impact
   
   Adding a field with a default value to the ingestion spec shouldn't have 
an operational impact.
