[GitHub] [druid] a2l007 opened a new issue #10594: Vectorizing extraction dimension specs

GitBox Wed, 18 Nov 2020 11:44:43 -0800


a2l007 opened a new issue #10594:
URL: https://github.com/apache/druid/issues/10594



   The vectorized query engine in Druid does not currently support 
extraction dimension specs.
   I’m working on a design proposal for adding this support. From reviewing 
the `VectorGroupByEngineIterator` flow, there doesn’t appear to be a direct 
way to support it: after the vectorized aggregation within a cursor, the 
dimensions are lazily resolved for each result row based on the dimension ID. 
   To vectorize the dimension resolution, `DimensionDictionarySelector` 
implementations would need to handle a vector of dimension keys, which the 
extraction functions could then use to perform the resolution.
   The new lookup definition would be something like:
   
   `Memory lookup(Memory keySpace, int keySize, int keyOffset, int startRow, 
int endRow)`
   
   The resolved dimension values would have to be held in a writable memory 
space similar to the existing 
[keyspace](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/VectorGroupByEngine.java#L232).
  Unlike the [grouping key 
size](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/GroupByVectorColumnSelector.java#L36)
 for keys in the keyspace, the size of the resolved dimension values may vary 
and so each size would also have to be held in the dimension valueSpace along 
with the resolved dimension values.
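
   To make the layout concrete, here is a minimal sketch of a length-prefixed valueSpace. It is not Druid code and rests on several assumptions: `java.nio.ByteBuffer` stands in for `Memory`, each grouping key holds a 4-byte dictionary ID at `keyOffset`, and the extra `dictionary`, `extractionFn` and `valueSpace` parameters, the returned offsets array, and the class name are all hypothetical. Null results and capacity checks are left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.IntFunction;
import java.util.function.UnaryOperator;

// Hypothetical sketch; none of these names exist in Druid.
final class DimensionValueSpaceSketch {

    // Resolves the dictionary IDs for rows [startRow, endRow) and appends each
    // extracted value to valueSpace as [int length][UTF-8 bytes].
    // Returns the valueSpace offset of the entry written for each row.
    static int[] lookup(ByteBuffer keySpace, int keySize, int keyOffset,
                        int startRow, int endRow,
                        IntFunction<String> dictionary,
                        UnaryOperator<String> extractionFn,
                        ByteBuffer valueSpace) {
        int[] offsets = new int[endRow - startRow];
        for (int row = startRow; row < endRow; row++) {
            // Assumption: the dimension ID is a 4-byte int inside a fixed-size key.
            int id = keySpace.getInt(row * keySize + keyOffset);
            String resolved = extractionFn.apply(dictionary.apply(id));
            byte[] bytes = resolved.getBytes(StandardCharsets.UTF_8);
            offsets[row - startRow] = valueSpace.position();
            valueSpace.putInt(bytes.length); // size stored alongside the value
            valueSpace.put(bytes);
        }
        return offsets;
    }

    // Reads one resolved value back without moving valueSpace's own position.
    static String readValue(ByteBuffer valueSpace, int offset) {
        int length = valueSpace.getInt(offset);
        byte[] bytes = new byte[length];
        ByteBuffer view = valueSpace.duplicate();
        view.position(offset + Integer.BYTES);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

   Because every entry carries its own length, a reader only needs the entry’s offset to recover the value. The cost is that the valueSpace grows with the total byte length of the resolved values rather than with a fixed per-key size.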
   
   Finally, while writing into the result row, the iterator used within the 
query engine (for example, 
[VectorGroupByEngineIterator](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/VectorGroupByEngine.java#L442))
 can read the dimension value from the dimension valueSpace at the 
appropriate offset instead of calling 
[DictionarySelector.lookupName](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/column/StringDictionaryEncodedColumn.java#L368).
 
   
   @gianm, @clintropolis: It would be really helpful to get your thoughts on 
this approach. My concern is holding the dimension valueSpace in memory, as 
this can eat significantly into off-heap memory depending on the number of 
query dimensions and their cardinality. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



