liran-funaro opened a new issue #10321:
URL: https://github.com/apache/druid/issues/10321


   ### Motivation
   
   Alternative approaches to indexing events in-memory might have benefits in 
some workloads, or be irrelevant in other workloads.
   But the current Druid design allows only one (hard-coded) approach for that 
end; i.e., incremental-index, specifically `OnheapIncrementalIndex`.
   Other implementations such as the `OffheapIncrementalIndex` might also be 
beneficial in some 
[workloads](https://github.com/liran-funaro/druid/wiki/Evaluation) (albeit 
deprecated).
   Furthermore, future approaches such as suggested in PR #10001 (ISSUE #9967), 
doesn’t even require the notation of incremental-index and can achieve much 
better resource utilization than the exiting approaches 
([source](https://github.com/liran-funaro/druid/wiki/Evaluation)).
   
   But for current and future approaches to be implemented, tested, and 
embedded into Druid, Druid must allow users to select different indexing 
approaches (as core or as extensions).
   The purpose of this issue is to present and discuss a design for a 
configurable index type.
   Afterwhich, we plan to implement the decided (and approved) approach and 
submit it as a PR.
   
   ### Proposed changes
   
   We propose the following modifications:
   1. Add `IngestionIndex` interface
   1. Add `IngestionIndexBuilder` and `IngestionIndexFactory` interfaces
   1. Add an ingestion knob for tuningConfig
   1. Instantiate an `IngestionIndexBuilder` using the given factory
   
   #### Add `IngestionIndex` interface
   `IngestionIndex` will include all the externally visible API of the current 
`IncrementalIndex`. For example:
   ```java
   public interface IngestionIndex
   {
     FactsHolder getFacts();
     boolean canAppendRow();
     String getOutOfRowsReason();
     IngestionIndexAddResult add(InputRow row, boolean 
skipMaxRowsInMemoryCheck) throws IndexSizeExceededException;
     // etc...
   }
   ```
   Note that `IncrementalIndexAddResult` was renamed to 
`IngestionIndexAddResult` in this example.
   The introduction of the `IngestionIndex` interface incurs many of these 
small line changes.
   To minimize diff size, we consider avoiding this new interface (see 
[rationale](#rationale) below).
   
   #### Add `IngestionIndexBuilder` and `IngestionIndexFactory` interfaces
   `IngestionIndexBuilder` will be an abstract class. It will be identical to 
the existing `IncrementalIndex.Builder` class, except for the `build()` method 
that will be abstract and will return an `IngestionIndex` instead of 
`IncrementalIndex`.
   
   Each implementation of `IngestionIndex` will have its own Builder.
   
   `IngestionIndexFactory` will be an extension point. It will have a single 
method:
   ```java
   @ExtensionPoint
   public interface IngestionIndexBuilderFactory
   {
     IngestionIndexBuilder builder();
   }
   ```
   
   Each implementation of `IngestionIndexFactory` will return the builder of 
its ingestion-index, and will register as `JsonSubType` via a dedicated 
`Module`.
   
   #### Add an ingestion knob for tuningConfig
   First, we add a method to `AppenderatorConfig` interface: 
`IngestionIndexFactory getIngestionIndexFactory()`.
   Then, we add to each of its implementations a constructor parameter:
   ```java
   @JsonProperty("indexType") @Nullable IngestionIndexFactory indexType
   ```
   which will be returned by the `getIngestionIndexFactory()` method.
   
   This will affect all the implementations of `AppenderatorConfig`: 
`RealtimeTuningConfig`, `RealtimeAppenderatorTuningConfig`, 
`IndexTask.IndexTuningConfig`, `ParallelIndexTuningConfig`, 
`HadoopTuningConfig`, `SeekableStreamIndexTaskTuningConfig`, 
`KafkaIndexTaskTuningConfig`, `KafkaSupervisorTuningConfig`, 
`KinesisIndexTaskTuningConfig`, `KinesisSupervisorTuningConfig`.
   
   The above will introduce a new configuration knob:
   ```json
   "tuningConfig": {
     "indexType": "onheap"
   }
   ```
   
   #### Instantiate an `IngestionIndexBuilder` using the given factory
   All places that instantiate `IncrementalIndex.Builder`, will be modified to 
instatiate it from
   ```java
   config.getSchema().getTuningConfig().getIngestionIndexFactory().builder()
   ```
   That is: `Sink`, `InputSourceSampler`, `IndexGeneratorJob` (hadoop).
   
   ### Rationale
   
   We considered a few alternative choices for this design.
   1. All future indexes will inherit from `IncrementalIndex`
   1. Avoiding using factories: “indexType” parameter will be a `String` rather 
than a factory
   
   
   #### All future indexes will inherit from `IncrementalIndex`
   This approach will reduce the number of modified lines but will force all 
future implementations to an incremental-index design.
   As noted, for example, #10001 does not need incremental-index to function 
but it inherits from it to reduce the total diff size.
   We prefer to add an additional interface to avoid this constraint.
   
   #### Avoiding using factories: “indexType” parameter will be a `String` 
rather than a factory
   An alternative design could be adding another member/method to the existing 
`IncrementalIndex.Builder`, and adding a “generic” `build()` method:
   ```java
   // ...
   
   public Builder setIncrementalIndexType(final String incrementalIndexType)
   {
     this.incrementalIndexType = incrementalIndexType;
     return this;
   }
   
   // ...
   
   public IncrementalIndex build()
   {
     switch (incrementalIndexType) {
       case "onheap":
         return buildOnheap();
       case "offheap":
         return buildOffheap();
       case "novel-index":
         return buildNovel();
       default:
         throw new IllegalArgumentException("Unsupported incremental index: " + 
incrementalIndexType);
     }
   }
   ```
   
   Then, add the ingestion configuration as a string, rather than a factory 
class.
   
   This alternative is simpler and is the one that is currently implemented in 
#10001.
   But it does not allow extensions to implement their own ingestion index.
   This is especially important with the current effort to support Java-11.
   Currently, #10001 only supports Java-8, but it will support Java-11 it in 
the future.
   Since extensions allowed lagging behind this effort (e.g., data sketches), 
we prefer the design that supports index extensions.
   
   ### Operational impact
   
   This change will not affect any existing clusters. It will work seamlessly 
when the `indexType` parameter is omitted.
   
   ### Future work
   
   This configuration can also affect queries that use the index, such as in 
`GrouByQueryHelper`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to