clintropolis opened a new pull request #11853:
URL: https://github.com/apache/druid/pull/11853


   ### Description
   This PR adds support for Druid "complex" types to the native expression 
processing system, made possible by the type system enhancements done in 
#11713. As a result, _all_ Druid data can now be used within expressions, 
provided expressions are added to handle these types.
   
   `ObjectBinding`, the non-vectorized expression input data provider, now 
implements `ColumnInspector` so that it can retain type information when 
available, and a new constant, `ComplexExpr`, has been added, which accepts the 
`ExpressionType` alongside the value to represent complex values provided by 
the binding.
   
   Several generic nullable value binary serde methods for types have been 
moved out of `ExprEval` and into `Types`, so that they are more generally 
available for writing nullable values that follow the `| null (byte) | value 
(byte[]) |` pattern, which all of the `ExprEval` types now use. I've adjusted 
the binary formats slightly to be more consistent, so there are some minor 
changes to the expression buffer aggregator, but this should cause no 
compatibility issues because this format is not written to segments anywhere 
and is contained within the processing of a single node.
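   
   To illustrate the `| null (byte) | value (byte[]) |` layout, here is a 
minimal sketch. It is not the actual code in `Types`: the class and method 
names are hypothetical, the choice of `1` as the null marker is an assumption, 
and the `int` length prefix for variable-length values is also an assumption 
made so the example can read back what it wrote.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the "| null (byte) | value (byte[]) |" layout.
// None of these names come from Druid; they exist only for illustration.
final class NullableSerdeSketch
{
  static final byte IS_NULL = 1;   // assumed marker value for null
  static final byte NOT_NULL = 0;  // assumed marker value for non-null

  // Writes a nullable long at an absolute position; returns bytes written.
  static int writeNullableLong(ByteBuffer buffer, int position, Long value)
  {
    if (value == null) {
      buffer.put(position, IS_NULL);
      return 1;
    }
    buffer.put(position, NOT_NULL);
    buffer.putLong(position + 1, value);
    return 1 + Long.BYTES;
  }

  // Reads a nullable long written by writeNullableLong.
  static Long readNullableLong(ByteBuffer buffer, int position)
  {
    if (buffer.get(position) == IS_NULL) {
      return null;
    }
    return buffer.getLong(position + 1);
  }

  // Writes nullable bytes as | null byte | int length (assumed) | payload |.
  static int writeNullableBytes(ByteBuffer buffer, int position, byte[] value)
  {
    if (value == null) {
      buffer.put(position, IS_NULL);
      return 1;
    }
    buffer.put(position, NOT_NULL);
    buffer.putInt(position + 1, value.length);
    // Use a duplicate so the caller's position and limit are left untouched.
    ByteBuffer dup = buffer.duplicate();
    dup.position(position + 1 + Integer.BYTES);
    dup.put(value);
    return 1 + Integer.BYTES + value.length;
  }

  // Reads nullable bytes written by writeNullableBytes.
  static byte[] readNullableBytes(ByteBuffer buffer, int position)
  {
    if (buffer.get(position) == IS_NULL) {
      return null;
    }
    byte[] value = new byte[buffer.getInt(position + 1)];
    ByteBuffer dup = buffer.duplicate();
    dup.position(position + 1 + Integer.BYTES);
    dup.get(value);
    return value;
  }

  public static void main(String[] args)
  {
    ByteBuffer buffer = ByteBuffer.allocate(32);
    int written = writeNullableLong(buffer, 0, 42L);
    System.out.println(written + " bytes, value " + readNullableLong(buffer, 0));
  }
}
```

   Absolute positions are used throughout because a buffer aggregator 
typically writes into a slot of a shared buffer rather than owning the buffer's 
position.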
   
   A base interface has been extracted from `ObjectStrategy` in 
`druid-processing` to provide conversion between object and binary format for 
complex types. It is called `ObjectByteStrategy` (because naming is hard) and 
lives in `druid-core`. A registry mapping type names to `ObjectByteStrategy` 
has been added, and registering a `ComplexMetricsSerde` in `ComplexMetrics` 
will automatically register its `ObjectStrategy` in the lower level 
`ObjectByteStrategy` registry. This would be less messy if `druid-core` and 
`druid-processing` were merged, since the `ComplexMetrics` registry could then 
be used directly for binary serialization of expressions, but they are not 
merged yet.
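   
   The relationship between the two registries can be sketched roughly as 
follows. This is a simplified illustration only: every interface, class, and 
method name below is hypothetical and does not mirror the real 
`ObjectByteStrategy`, `ComplexMetrics`, or `ComplexMetricsSerde` signatures.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a low-level object <-> bytes conversion interface.
interface ByteStrategySketch<T>
{
  T fromBytes(byte[] bytes);

  byte[] toBytes(T value);
}

// Hypothetical low-level registry keyed by complex type name.
final class ByteStrategyRegistrySketch
{
  private static final ConcurrentHashMap<String, ByteStrategySketch<?>> STRATEGIES =
      new ConcurrentHashMap<>();

  static void register(String typeName, ByteStrategySketch<?> strategy)
  {
    ByteStrategySketch<?> existing = STRATEGIES.putIfAbsent(typeName, strategy);
    if (existing != null && existing != strategy) {
      throw new IllegalStateException("type [" + typeName + "] is already registered");
    }
  }

  @SuppressWarnings("unchecked")
  static <T> ByteStrategySketch<T> get(String typeName)
  {
    return (ByteStrategySketch<T>) STRATEGIES.get(typeName);
  }
}

// Hypothetical higher-level registry: registering here also registers the
// strategy in the lower-level registry, so callers register only once.
final class SerdeRegistrySketch
{
  private static final ConcurrentHashMap<String, ByteStrategySketch<?>> SERDES =
      new ConcurrentHashMap<>();

  static void registerSerde(String typeName, ByteStrategySketch<?> strategy)
  {
    SERDES.putIfAbsent(typeName, strategy);
    ByteStrategyRegistrySketch.register(typeName, strategy);
  }
}

// Example strategy for illustration: UTF-8 strings.
final class Utf8StrategySketch implements ByteStrategySketch<String>
{
  @Override
  public String fromBytes(byte[] bytes)
  {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  @Override
  public byte[] toBytes(String value)
  {
    return value.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args)
  {
    SerdeRegistrySketch.registerSerde("sketchString", new Utf8StrategySketch());
    ByteStrategySketch<String> strategy = ByteStrategyRegistrySketch.get("sketchString");
    System.out.println(strategy.fromBytes(strategy.toBytes("hello")));
  }
}
```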
   
   To showcase the new complex expressions, I have added 3 new bloom filter 
expressions to the `druid-bloom-filter` extension:
   * `bloom_filter(expr)` - creates a bloom filter with expected capacity `expr`
   * `bloom_filter_test(expr1, expr2)` - checks if `expr2` is contained in the 
bloom filter `expr1`
   * `bloom_filter_add(expr1, expr2)` - adds `expr2` to bloom filter `expr1`.
   
   I have not documented these yet, because I'm still considering how to 
position them, and several parts of the expression system, such as the native 
expression aggregator, are still missing documentation for the same reason. I 
have also not wired these up to SQL functions yet, for similar reasons.
   
   With these expressions it is even possible, for example, to re-create the 
native bloom filter aggregator using the expression aggregator instead:
   
   ```
       {
         "type": "expression",
         "name": "bloom_expression",
         "fields": ["user"],
         "initialValue": "bloom_filter(10000)",
         "fold": "bloom_filter_add(user, __acc)",
         "maxSizeBytes": 8096
       }
   ```
   
   but I think this is just scratching the surface of what this change will 
make possible.
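   
   Conceptually, the aggregator spec above starts from an empty filter 
(`initialValue`) and folds each `user` value into the accumulator (`fold`). The 
sketch below mimics that flow with a toy bloom filter. It is not the 
`druid-bloom-filter` implementation: the class names, the sizing rule of 10 
bits per expected entry, and the hashing scheme are all assumptions chosen to 
keep the example small.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy bloom filter for illustration only; sizing and hashing are assumptions.
final class ToyBloomFilter
{
  private static final int NUM_HASHES = 3;

  private final int numBits;
  private final BitSet bits;

  ToyBloomFilter(int expectedCapacity)
  {
    // Assumed sizing rule: 10 bits per expected entry, with a small minimum.
    this.numBits = (int) Math.min(Integer.MAX_VALUE, Math.max(64L, 10L * expectedCapacity));
    this.bits = new BitSet(numBits);
  }

  // Derives the i-th bit index from the value's hash code (double hashing).
  private int index(String value, int i)
  {
    int h1 = value.hashCode();
    int h2 = (h1 >>> 16) | 1;
    return Math.floorMod(h1 + i * h2, numBits);
  }

  // Adds a value and returns the filter, like an accumulator update.
  ToyBloomFilter add(String value)
  {
    for (int i = 0; i < NUM_HASHES; i++) {
      bits.set(index(value, i));
    }
    return this;
  }

  // True if the value may have been added; false means definitely not added.
  boolean mightContain(String value)
  {
    for (int i = 0; i < NUM_HASHES; i++) {
      if (!bits.get(index(value, i))) {
        return false;
      }
    }
    return true;
  }
}

final class BloomFoldSketch
{
  // Mirrors initialValue + fold: start empty, add every input value.
  static ToyBloomFilter fold(List<String> users, int expectedCapacity)
  {
    ToyBloomFilter acc = new ToyBloomFilter(expectedCapacity);
    for (String user : users) {
      acc = acc.add(user);
    }
    return acc;
  }

  public static void main(String[] args)
  {
    ToyBloomFilter filter = fold(Arrays.asList("alice", "bob"), 10000);
    System.out.println(filter.mightContain("alice"));
  }
}
```

   A bloom filter never reports a false negative, so every folded value tests 
as present afterwards, while values that were never added may occasionally 
test as present too.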
   
   <img width="1413" alt="Screen Shot 2021-09-25 at 4 52 31 PM" 
src="https://user-images.githubusercontent.com/1577461/139329494-ba3f3f28-06f2-498f-b42a-1b32054ec622.png">
   
   ### Future work
   Implementing additional expressions for other complex type extensions, such 
as data sketches.
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `ExprEval`
    * `Types`
    * `ObjectStrategy`
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


