justinborromeo opened a new issue #7238: [PROPOSAL] Add segment limit for native and streaming index tasks
URL: https://github.com/apache/incubator-druid/issues/7238

### Motivation

To prevent IndexTasks and Kafka/KinesisIndexTasks from consuming excessive memory, it would be safer to have a configurable limit on the total number of segments that a single task can have open. One situation where this matters is an indexing task with a fine `segmentGranularity` (e.g. an hour) reading a backfilled or late data stream that has a small number of events per hour spread over a large number of hours. Without any safeguards, the task would try to open an excessive number of segments and run out of memory (OOM) due to the overhead of opening files.

### Proposed changes

I plan on adding a `maxTotalSegments` field to `AppenderatorConfig`, allowing users to set it in the ingestion spec.

| Property | Description | Default | Required? |
|------------------|----------------------------------------------------------------------------------------------|---------|-----------|
| maxTotalSegments | The maximum number of mutable segments that an indexing task is allowed to have open at one time. | 1000 | No |

If the number of mutable segments exceeds `maxTotalSegments` during indexing, the segments will be pushed to deep storage (`BatchAppenderatorDriver#pushAllAndClear()`). This behaviour is similar to what occurs if the number of rows exceeds `maxTotalRows`.

For this to work, `AppenderatorDriverAddResult#isPushRequired()` should be modified to also check whether the number of segments is above the limit. This is feasible since the number of sinks (which have a 1:1 relation to mutable segments) is accessible from `AppenderatorImpl#add()` and can be returned as a new field in `AppenderatorDriverAddResult`.

### Rationale

I don't think there are any other options for capping the number of segments opened by an indexing task/supervisor.

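To illustrate the check described under "Proposed changes", here is a minimal, self-contained sketch. It does not use Druid's actual classes: `AddResult` and `Appenderator` below are hypothetical stand-ins for `AppenderatorDriverAddResult` and `AppenderatorImpl`, the method signatures are assumptions, and the exact comparison used for the row limit is also an assumption. The only point is to show the sink count travelling back with each add result and triggering a push once it exceeds `maxTotalSegments`.

```java
import java.util.HashMap;
import java.util.Map;

public class SegmentLimitSketch {

    /** Hypothetical stand-in for AppenderatorDriverAddResult. */
    static final class AddResult {
        private final long totalRows;
        private final int openSegments; // proposed new field: number of sinks

        AddResult(long totalRows, int openSegments) {
            this.totalRows = totalRows;
            this.openSegments = openSegments;
        }

        /** Push when either the row limit or the segment limit is exceeded. */
        boolean isPushRequired(long maxTotalRows, int maxTotalSegments) {
            // Assumption: both limits use a strict "exceeds" comparison.
            return totalRows > maxTotalRows || openSegments > maxTotalSegments;
        }
    }

    /** Hypothetical stand-in for an appenderator: one sink per time bucket. */
    static final class Appenderator {
        private final Map<Long, Integer> rowsPerSink = new HashMap<>();
        private long totalRows = 0;

        AddResult add(long timestampMillis, long granularityMillis) {
            long bucket = Math.floorDiv(timestampMillis, granularityMillis);
            rowsPerSink.merge(bucket, 1, Integer::sum);
            totalRows++;
            // The sink count is known here, so it can be returned to the driver.
            return new AddResult(totalRows, rowsPerSink.size());
        }

        /** Models pushing every open segment and starting over. */
        void pushAllAndClear() {
            rowsPerSink.clear();
            totalRows = 0;
        }
    }

    /** Feeds one event per hour for the given number of hours; returns the push count. */
    static int simulate(int hours, int maxTotalSegments) {
        final long hourMillis = 3_600_000L;
        final long maxTotalRows = Long.MAX_VALUE; // row limit effectively disabled
        Appenderator appenderator = new Appenderator();
        int pushes = 0;
        for (int h = 0; h < hours; h++) {
            AddResult result = appenderator.add(h * hourMillis, hourMillis);
            if (result.isPushRequired(maxTotalRows, maxTotalSegments)) {
                appenderator.pushAllAndClear();
                pushes++;
            }
        }
        return pushes;
    }

    public static void main(String[] args) {
        System.out.println("pushes: " + simulate(10, 3));
    }
}
```

In this sketch the sparse, many-hour stream from the motivation never holds more than `maxTotalSegments + 1` sinks at once, because the add that crosses the limit immediately triggers a push.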
### Operational impact

Adding a field with a default value to the ingestion spec shouldn't have an operational impact.
