wjhypo commented on a change in pull request #11307:
URL: https://github.com/apache/druid/pull/11307#discussion_r752778734



##########
File path: docs/configuration/index.md
##########
@@ -1403,6 +1403,7 @@ Additional peon configs include:
 |`druid.indexer.task.restoreTasksOnRestart`|If true, MiddleManagers will 
attempt to stop tasks gracefully on shutdown and restore them on restart.|false|
 |`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks 
using the [Druid input source](../ingestion/native-batch.md#druid-input-source) 
will ignore the provided timestampSpec, and will use the `__time` column of the 
input datasource. This option is provided for compatibility with ingestion 
specs written before Druid 0.22.0.|false|
 |`druid.indexer.server.maxChatRequests`|Maximum number of concurrent requests 
served by a task's chat handler. Set to 0 to disable limiting.|0|
+|`druid.indexer.task.enableInMemoryBitmap`| If true, stream ingestion will 
enable in memory bitmap for applicable dimensions when data is still in memory 
during real time writes before disk persistence triggers. Queries can leverage 
the bitmaps to avoid a full scan to speed up for this stage of data. |false|

Review comment:
       The in-memory bitmap index is applied only to string-typed dimensions. 
`__time` gets only a wrapper with no bitmap index, so that the query engine 
does not break. I'll document this.
   
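   To make the idea concrete, here is a minimal sketch of a per-value bitmap 
   over in-memory rows for one string dimension. It is an illustration only: 
   the class and method names are hypothetical and not taken from this PR, and 
   it uses `java.util.BitSet` rather than whatever bitmap implementation the 
   PR actually uses.

   ```java
   import java.util.BitSet;
   import java.util.HashMap;
   import java.util.Map;

   // Hypothetical illustration; not the classes added in the PR.
   final class InMemoryBitmapSketch {
       // One bitmap per distinct string value; bit i is set when row i holds that value.
       private final Map<String, BitSet> bitmaps = new HashMap<>();
       private int numRows = 0;

       // Append a row and record its position in the bitmap for its value.
       int addRow(String value) {
           int rowId = numRows++;
           bitmaps.computeIfAbsent(value, v -> new BitSet()).set(rowId);
           return rowId;
       }

       // Rows matching an equality filter, looked up without scanning every row.
       BitSet rowsMatching(String value) {
           BitSet hit = bitmaps.get(value);
           // Return a copy so callers cannot mutate the index.
           return hit == null ? new BitSet() : (BitSet) hit.clone();
       }
   }
   ```

   An equality filter on the dimension then becomes a map lookup plus a bitmap 
   read instead of a pass over all rows still held in memory.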

##########
File path: 
processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##########
@@ -303,16 +322,48 @@ protected IncrementalIndex(
       DimensionHandler handler = 
DimensionHandlerUtils.getHandlerFromCapabilities(
           dimName,
           capabilities,
-          dimSchema.getMultiValueHandling()
+          dimSchema.getMultiValueHandling(),
+          enableInMemoryBitmap
       );
-      addNewDimension(dimName, handler);
+      DimensionDesc desc = addNewDimension(dimName, handler);
+
+      if (enableInMemoryBitmap && type.equals(ColumnType.STRING)) {

Review comment:
       Good catch!
   
   https://github.com/apache/druid/blob/master/docs/ingestion/ingestion-spec.md
   describes `createBitmapIndex` (defaults to true) in dimensionSpec as:
   ```
   For string typed dimensions, whether or not bitmap indexes should be created 
for the column in generated segments. Creating a bitmap index requires more 
storage, but speeds up certain kinds of filtering (especially equality and 
prefix filtering). Only supported for string typed dimensions.   
   ```
   
   So bitmap indexes in batch immutable segments currently support only 
string-typed columns. In this PR I follow the same design: only string-typed 
dimensions are supported, but bitmap index support is extended from batch 
immutable segments to the real-time incremental index. In-memory bitmaps are 
enabled only if both `createBitmapIndex` and `enableInMemoryBitmap` are true. 
   
   I'm not sure it makes sense to support the case where `createBitmapIndex` is 
false (bitmap index disabled in batch immutable segments) and 
`enableInMemoryBitmap` is true (bitmap index enabled in the incremental index). 
To avoid confusion, I disable the in-memory bitmap in that case. Let me know if 
you have other thoughts.
   
   I'll also document this in detail.
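   
   As a rough summary of the rule described above, the decision could be 
   written as a single predicate. This is a sketch under assumptions: the 
   helper class, method, and parameter names are hypothetical and do not come 
   from the PR; only the three conditions themselves are taken from the 
   discussion.

   ```java
   // Hypothetical helper; names are illustrative, not from the PR.
   final class InMemoryBitmapGate {
       private InMemoryBitmapGate() {}

       // In-memory bitmaps apply only to string dimensions, and only when both
       // the per-dimension createBitmapIndex flag and the task-level
       // enableInMemoryBitmap flag are true.
       static boolean useInMemoryBitmap(
           boolean isStringDimension,
           boolean createBitmapIndex,
           boolean enableInMemoryBitmap
       ) {
           return isStringDimension && createBitmapIndex && enableInMemoryBitmap;
       }
   }
   ```

   With this shape, `createBitmapIndex = false` combined with 
   `enableInMemoryBitmap = true` yields no in-memory bitmap, which matches the 
   choice described above.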




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


