a2l007 opened a new issue #10594: URL: https://github.com/apache/druid/issues/10594
Druid's vectorized query engine does not currently support extraction dimension specs. I'm in the process of coming up with a design proposal for adding this support.

Reviewing the `VectorGroupByEngineIterator` flow, there doesn't appear to be a direct way to support this: after the vectorized aggregation within a cursor, the dimensions are lazily resolved for each result row based on the dimension ID.

To vectorize the dimension resolution, `DimensionDictionarySelector` implementations would need to handle a vector of dimension keys, which the extraction functions can then use to perform the resolution. The new lookup definition would be something like:

`Memory lookup(Memory keySpace, int keySize, int keyOffset, int startRow, int endRow)`

The resolved dimension values would have to be held in a writable memory space similar to the existing [keyspace](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/VectorGroupByEngine.java#L232). Unlike the [grouping key size](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/GroupByVectorColumnSelector.java#L36) for keys in the keyspace, the size of the resolved dimension values may vary, so each value's size would also have to be stored in the dimension valueSpace alongside the resolved value.
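To make the variable-size layout concrete, here is a minimal sketch of one possible encoding: each resolved value is stored as a 4-byte length followed by its UTF-8 bytes, and the record's start offset is kept so the value can be read back later. This is not Druid code. `ValueSpaceSketch`, `resolveVector`, and `readValue` are hypothetical names, a `java.nio.ByteBuffer` stands in for the `Memory` type in the proposed signature, the dictionary is modeled as a plain `String[]`, and the extraction function as a `java.util.function.Function`. Null results from the extraction function are not handled.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical sketch: a length-prefixed value space for resolved dimension values.
final class ValueSpaceSketch {

    // Resolves rows [startRow, endRow) in one pass. Each resolved value is appended to
    // valueSpace as [int length][UTF-8 bytes]; the returned array holds each record's offset.
    static int[] resolveVector(int[] dimIds, int startRow, int endRow, String[] dictionary,
                               Function<String, String> extractionFn, ByteBuffer valueSpace) {
        int[] offsets = new int[endRow - startRow];
        for (int row = startRow; row < endRow; row++) {
            offsets[row - startRow] = valueSpace.position();
            String resolved = extractionFn.apply(dictionary[dimIds[row]]);
            byte[] bytes = resolved.getBytes(StandardCharsets.UTF_8);
            valueSpace.putInt(bytes.length); // size prefix, since value sizes vary
            valueSpace.put(bytes);
        }
        return offsets;
    }

    // Reads the value whose record starts at the given offset, using absolute reads so the
    // buffer's write position is left untouched.
    static String readValue(ByteBuffer valueSpace, int offset) {
        int length = valueSpace.getInt(offset);
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = valueSpace.get(offset + Integer.BYTES + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] dictionary = {"apple", "banana", "kiwi"};
        int[] dimIds = {2, 0, 1, 0};
        ByteBuffer valueSpace = ByteBuffer.allocateDirect(1024); // off-heap, like the keyspace
        int[] offsets = resolveVector(dimIds, 0, dimIds.length, dictionary,
                                      String::toUpperCase, valueSpace);
        for (int offset : offsets) {
            System.out.println(offset + " -> " + readValue(valueSpace, offset));
        }
    }
}
```

In this sketch a value that appears in several rows is written once per row (here `APPLE` is stored twice), which is the kind of growth that makes the size of the value space a concern. Resolving once per distinct dictionary ID instead of once per row would be one way to bound it, at the cost of an extra ID-to-offset mapping.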
Finally, when writing the result row, the iterator used within the query engine (for example, [VectorGroupByEngineIterator](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/VectorGroupByEngine.java#L442)) can read the dimension value from the dimension valueSpace at the appropriate offset instead of calling [DictionarySelector.lookupName](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/column/StringDictionaryEncodedColumn.java#L368).

@gianm, @clintropolis: It would be really helpful to get your thoughts on this approach. My concern is with holding the dimension valueSpace in memory, since this can significantly increase off-heap memory usage depending on the number of query dimensions and their cardinality.
