liran-funaro opened a new issue #10321: URL: https://github.com/apache/druid/issues/10321
### Motivation Alternative approaches to indexing events in-memory might have benefits in some workloads, or be irrelevant in other workloads. But the current Druid design allows only one (hard-coded) approach for that end; i.e., incremental-index, specifically `OnheapIncrementalIndex`. Other implementations such as the `OffheapIncrementalIndex` might also be beneficial in some [workloads](https://github.com/liran-funaro/druid/wiki/Evaluation) (albeit deprecated). Furthermore, future approaches such as suggested in PR #10001 (ISSUE #9967), doesn’t even require the notation of incremental-index and can achieve much better resource utilization than the exiting approaches ([source](https://github.com/liran-funaro/druid/wiki/Evaluation)). But for current and future approaches to be implemented, tested, and embedded into Druid, Druid must allow users to select different indexing approaches (as core or as extensions). The purpose of this issue is to present and discuss a design for a configurable index type. Afterwhich, we plan to implement the decided (and approved) approach and submit it as a PR. ### Proposed changes We propose the following modifications: 1. Add `IngestionIndex` interface 1. Add `IngestionIndexBuilder` and `IngestionIndexFactory` interfaces 1. Add an ingestion knob for tuningConfig 1. Instantiate an `IngestionIndexBuilder` using the given factory #### Add `IngestionIndex` interface `IngestionIndex` will include all the externally visible API of the current `IncrementalIndex`. For example: ```java public interface IngestionIndex { FactsHolder getFacts(); boolean canAppendRow(); String getOutOfRowsReason(); IngestionIndexAddResult add(InputRow row, boolean skipMaxRowsInMemoryCheck) throws IndexSizeExceededException; // etc... } ``` Note that `IncrementalIndexAddResult` was renamed to `IngestionIndexAddResult` in this example. The introduction of the `IngestionIndex` interface incurs many of these small line changes. To minimize diff size, we consider avoiding this new interface (see [rationale](#rationale) below). #### Add `IngestionIndexBuilder` and `IngestionIndexFactory` interfaces `IngestionIndexBuilder` will be an abstract class. It will be identical to the existing `IncrementalIndex.Builder` class, except for the `build()` method that will be abstract and will return an `IngestionIndex` instead of `IncrementalIndex`. Each implementation of `IngestionIndex` will have its own Builder. `IngestionIndexFactory` will be an extension point. It will have a single method: ```java @ExtensionPoint public interface IngestionIndexBuilderFactory { IngestionIndexBuilder builder(); } ``` Each implementation of `IngestionIndexFactory` will return the builder of its ingestion-index, and will register as `JsonSubType` via a dedicated `Module`. #### Add an ingestion knob for tuningConfig First, we add a method to `AppenderatorConfig` interface: `IngestionIndexFactory getIngestionIndexFactory()`. Then, we add to each of its implementations a constructor parameter: ```java @JsonProperty("indexType") @Nullable IngestionIndexFactory indexType ``` which will be returned by the `getIngestionIndexFactory()` method. This will affect all the implementations of `AppenderatorConfig`: `RealtimeTuningConfig`, `RealtimeAppenderatorTuningConfig`, `IndexTask.IndexTuningConfig`, `ParallelIndexTuningConfig`, `HadoopTuningConfig`, `SeekableStreamIndexTaskTuningConfig`, `KafkaIndexTaskTuningConfig`, `KafkaSupervisorTuningConfig`, `KinesisIndexTaskTuningConfig`, `KinesisSupervisorTuningConfig`. The above will introduce a new configuration knob: ```json "tuningConfig": { "indexType": "onheap" } ``` #### Instantiate an `IngestionIndexBuilder` using the given factory All places that instantiate `IncrementalIndex.Builder`, will be modified to instatiate it from ```java config.getSchema().getTuningConfig().getIngestionIndexFactory().builder() ``` That is: `Sink`, `InputSourceSampler`, `IndexGeneratorJob` (hadoop). ### Rationale We considered a few alternative choices for this design. 1. All future indexes will inherit from `IncrementalIndex` 1. Avoiding using factories: “indexType” parameter will be a `String` rather than a factory #### All future indexes will inherit from `IncrementalIndex` This approach will reduce the number of modified lines but will force all future implementations to an incremental-index design. As noted, for example, #10001 does not need incremental-index to function but it inherits from it to reduce the total diff size. We prefer to add an additional interface to avoid this constraint. #### Avoiding using factories: “indexType” parameter will be a `String` rather than a factory An alternative design could be adding another member/method to the existing `IncrementalIndex.Builder`, and adding a “generic” `build()` method: ```java // ... public Builder setIncrementalIndexType(final String incrementalIndexType) { this.incrementalIndexType = incrementalIndexType; return this; } // ... public IncrementalIndex build() { switch (incrementalIndexType) { case "onheap": return buildOnheap(); case "offheap": return buildOffheap(); case "novel-index": return buildNovel(); default: throw new IllegalArgumentException("Unsupported incremental index: " + incrementalIndexType); } } ``` Then, add the ingestion configuration as a string, rather than a factory class. This alternative is simpler and is the one that is currently implemented in #10001. But it does not allow extensions to implement their own ingestion index. This is especially important with the current effort to support Java-11. Currently, #10001 only supports Java-8, but it will support Java-11 it in the future. Since extensions allowed lagging behind this effort (e.g., data sketches), we prefer the design that supports index extensions. ### Operational impact This change will not affect any existing clusters. It will work seamlessly when the `indexType` parameter is omitted. ### Future work This configuration can also affect queries that use the index, such as in `GrouByQueryHelper`. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
