wjhypo commented on a change in pull request #11307:
URL: https://github.com/apache/druid/pull/11307#discussion_r752778734
##########
File path: docs/configuration/index.md
##########
@@ -1403,6 +1403,7 @@ Additional peon configs include:
|`druid.indexer.task.restoreTasksOnRestart`|If true, MiddleManagers will
attempt to stop tasks gracefully on shutdown and restore them on restart.|false|
|`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks
using the [Druid input source](../ingestion/native-batch.md#druid-input-source)
will ignore the provided timestampSpec, and will use the `__time` column of the
input datasource. This option is provided for compatibility with ingestion
specs written before Druid 0.22.0.|false|
|`druid.indexer.server.maxChatRequests`|Maximum number of concurrent requests
served by a task's chat handler. Set to 0 to disable limiting.|0|
+|`druid.indexer.task.enableInMemoryBitmap`| If true, stream ingestion will
enable in memory bitmap for applicable dimensions when data is still in memory
during real time writes before disk persistence triggers. Queries can leverage
the bitmaps to avoid a full scan to speed up for this stage of data. |false|
Review comment:
The in-memory bitmap index is applied only to string-typed dimensions. For
`__time` there is only a wrapper, without a bitmap index, so that the query
engine does not break. I'll document this.
##########
File path:
processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##########
@@ -303,16 +322,48 @@ protected IncrementalIndex(
DimensionHandler handler =
DimensionHandlerUtils.getHandlerFromCapabilities(
dimName,
capabilities,
- dimSchema.getMultiValueHandling()
+ dimSchema.getMultiValueHandling(),
+ enableInMemoryBitmap
);
- addNewDimension(dimName, handler);
+ DimensionDesc desc = addNewDimension(dimName, handler);
+
+ if (enableInMemoryBitmap && type.equals(ColumnType.STRING)) {
Review comment:
Good catch!
In https://github.com/apache/druid/blob/master/docs/ingestion/ingestion-spec.md,
the dimensionSpec option `createBitmapIndex` (defaults to true) is documented as:
```
For string typed dimensions, whether or not bitmap indexes should be created
for the column in generated segments. Creating a bitmap index requires more
storage, but speeds up certain kinds of filtering (especially equality and
prefix filtering). Only supported for string typed dimensions.
```
So currently the bitmap index in batch immutable segments supports only
string-typed columns. In this PR I follow the same design: only string-typed
dimensions are supported, but bitmap index support is extended from batch
immutable segments to the real-time incremental index. In-memory bitmaps are
enabled only if both `createBitmapIndex` and `enableInMemoryBitmap` are true.
I'm not sure the remaining combination makes sense: `createBitmapIndex` is
false (bitmap index disabled in batch immutable segments) and
`enableInMemoryBitmap` is true (bitmap index enabled in the incremental index).
To avoid confusion, I disable the in-memory bitmap in that case. Let me know if
you have other thoughts.
I'll also document this in detail.
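The rule described above can be sketched as a single predicate. This is only an
illustration, not the code in this PR: `InMemoryBitmapGate`, `ValueType`, and
`useInMemoryBitmap` are hypothetical names, and `ValueType` is a stand-in for
Druid's real column type class.

```java
public class InMemoryBitmapGate {
    // Hypothetical stand-in for Druid's column types; not the real ColumnType class.
    enum ValueType { STRING, LONG, FLOAT, DOUBLE }

    /**
     * Returns true only when the dimension is string typed and both
     * createBitmapIndex (per-dimension) and enableInMemoryBitmap (task config)
     * are true. Every other combination falls back to a scan without bitmaps.
     */
    static boolean useInMemoryBitmap(
            ValueType type,
            boolean createBitmapIndex,
            boolean enableInMemoryBitmap) {
        return type == ValueType.STRING && createBitmapIndex && enableInMemoryBitmap;
    }

    public static void main(String[] args) {
        // String dimension with both flags on: in-memory bitmap is used.
        System.out.println(useInMemoryBitmap(ValueType.STRING, true, true));   // true
        // createBitmapIndex is false: disabled even if enableInMemoryBitmap is true.
        System.out.println(useInMemoryBitmap(ValueType.STRING, false, true));  // false
        // Task-level flag is off: disabled.
        System.out.println(useInMemoryBitmap(ValueType.STRING, true, false));  // false
        // Non-string dimension: never gets an in-memory bitmap.
        System.out.println(useInMemoryBitmap(ValueType.LONG, true, true));     // false
    }
}
```

Treating `createBitmapIndex=false` as an override keeps the incremental index
consistent with what the persisted segment will contain for that dimension.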