clintropolis opened a new pull request #11713:
URL: https://github.com/apache/druid/pull/11713


   ### Description
   _work in progress, still quite a bit to cover_
   
   This PR enriches the native engine type system so that, instead of primarily using the `ValueType` and `ExprType` enumerations directly, it uses a more powerful set of classes built on top of them.
   
   To share the same structure and utilities between the native engine and expressions, a new interface, `TypeSignature`,
   
   ```java
   import javax.annotation.Nullable;

   public interface TypeSignature<Type extends TypeDescriptor>
   {
     Type getType();
     @Nullable
     String getComplexTypeName();
     @Nullable
     TypeSignature<Type> getElementType();
   }
   ```
   has been added to model basic type information, and is implemented by the 
new `ColumnType` and `ExpressionType` classes, as well as `ColumnCapabilities`. 
`TypeDescriptor` is an interface shared by both `ExprType` and `ValueType`,
   
   ```java
   public interface TypeDescriptor
   {
     boolean isNumeric();
     boolean isPrimitive();
     boolean isArray();
   }
   ```
   to expose common properties of both sets of enums: whether a type is numeric, primitive, or an array.
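   To make the relationship between the two interfaces concrete, here is a minimal, self-contained sketch. It copies `TypeDescriptor` and `TypeSignature` as shown above (dropping `@Nullable` so it compiles without extra dependencies) and implements them with a hypothetical `SketchValueType` enum and `SketchColumnType` class. These stand in for `ValueType` and `ColumnType` and are not the actual implementations in this PR; treating `STRING` as primitive is an assumption of the sketch.

   ```java
   import java.util.Objects;

   // Interfaces copied from the PR description (@Nullable omitted).
   interface TypeDescriptor
   {
     boolean isNumeric();
     boolean isPrimitive();
     boolean isArray();
   }

   interface TypeSignature<Type extends TypeDescriptor>
   {
     Type getType();
     String getComplexTypeName();          // null unless the type is COMPLEX with a known name
     TypeSignature<Type> getElementType(); // null unless the type is ARRAY
   }

   // Hypothetical simplified enum standing in for ValueType / ExprType.
   enum SketchValueType implements TypeDescriptor
   {
     STRING, LONG, FLOAT, DOUBLE, ARRAY, COMPLEX;

     @Override
     public boolean isNumeric()
     {
       return this == LONG || this == FLOAT || this == DOUBLE;
     }

     @Override
     public boolean isPrimitive()
     {
       // Assumption of this sketch: strings and numbers are primitive.
       return this == STRING || isNumeric();
     }

     @Override
     public boolean isArray()
     {
       return this == ARRAY;
     }
   }

   // Hypothetical immutable signature standing in for ColumnType.
   final class SketchColumnType implements TypeSignature<SketchValueType>
   {
     private final SketchValueType type;
     private final String complexTypeName;
     private final SketchColumnType elementType;

     SketchColumnType(SketchValueType type, String complexTypeName, SketchColumnType elementType)
     {
       this.type = Objects.requireNonNull(type, "type");
       this.complexTypeName = complexTypeName;
       this.elementType = elementType;
     }

     @Override
     public SketchValueType getType()
     {
       return type;
     }

     @Override
     public String getComplexTypeName()
     {
       return complexTypeName;
     }

     @Override
     public SketchColumnType getElementType()
     {
       return elementType;
     }
   }
   ```

   In this model an `ARRAY<LONG>` is simply an `ARRAY` signature whose element type is a `LONG` signature, so facts about the elements are reached through `getElementType()` rather than through a dedicated enum value.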
   
   To aid in the construction of these types:
   ```java
   import javax.annotation.Nullable;

   public interface TypeFactory<T extends TypeSignature<? extends TypeDescriptor>>
   {
     T ofString();
     T ofFloat();
     T ofDouble();
     T ofLong();
     T ofArray(T elementType);
     T ofComplex(@Nullable String complexTypeName);
   }
   ```
   ```
   has also been added. I'm not currently doing anything clever with it, but ideally it should reuse existing instances for identical types, for efficiency.
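   A minimal sketch of the kind of instance reuse described above, assuming a simple interning cache keyed on type equality. `Kind`, `SimpleType`, and `CachingTypeFactory` are hypothetical names for illustration; the factory mirrors the method names of `TypeFactory` but does not implement the real interface.

   ```java
   import java.util.Objects;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.ConcurrentMap;

   // Hypothetical type kinds standing in for ValueType / ExprType.
   enum Kind { STRING, LONG, FLOAT, DOUBLE, ARRAY, COMPLEX }

   // Hypothetical immutable type with value-based equality, so it can serve as a cache key.
   final class SimpleType
   {
     final Kind kind;
     final String complexTypeName; // null unless COMPLEX (and may still be null if unknown)
     final SimpleType elementType; // null unless ARRAY

     SimpleType(Kind kind, String complexTypeName, SimpleType elementType)
     {
       this.kind = kind;
       this.complexTypeName = complexTypeName;
       this.elementType = elementType;
     }

     @Override
     public boolean equals(Object o)
     {
       if (this == o) {
         return true;
       }
       if (!(o instanceof SimpleType)) {
         return false;
       }
       SimpleType that = (SimpleType) o;
       return kind == that.kind
              && Objects.equals(complexTypeName, that.complexTypeName)
              && Objects.equals(elementType, that.elementType);
     }

     @Override
     public int hashCode()
     {
       return Objects.hash(kind, complexTypeName, elementType);
     }
   }

   // Interns every type it builds, so equal types share a single instance.
   final class CachingTypeFactory
   {
     private final ConcurrentMap<SimpleType, SimpleType> cache = new ConcurrentHashMap<>();

     private SimpleType intern(SimpleType candidate)
     {
       // Returns the already-cached equal instance, or stores and returns the candidate.
       return cache.computeIfAbsent(candidate, k -> k);
     }

     SimpleType ofString() { return intern(new SimpleType(Kind.STRING, null, null)); }
     SimpleType ofFloat() { return intern(new SimpleType(Kind.FLOAT, null, null)); }
     SimpleType ofDouble() { return intern(new SimpleType(Kind.DOUBLE, null, null)); }
     SimpleType ofLong() { return intern(new SimpleType(Kind.LONG, null, null)); }

     SimpleType ofArray(SimpleType elementType)
     {
       return intern(new SimpleType(Kind.ARRAY, null, elementType));
     }

     SimpleType ofComplex(String complexTypeName)
     {
       return intern(new SimpleType(Kind.COMPLEX, complexTypeName, null));
     }
   }
   ```

   Because the cache is keyed on structural equality, nested types such as an array of arrays are also shared, and a `ConcurrentHashMap` keeps the factory safe to call from multiple threads.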
   
   Beyond these new structures, the vast majority of changes in this PR are 
around replacing all occurrences of `ValueType` with `ColumnType`, and 
`ExprType` with `ExpressionType`, and adjusting type checking code accordingly.
   
   ## JSON Serialization
   To be as compatible as possible, serialization currently uses a string-based format. `LONG`, `STRING`, `FLOAT`, and `DOUBLE` serialize exactly as they did when `ValueType` was king, and deserialize into the appropriate `ColumnType` (or `ExpressionType`).
   
   Array types have been significantly reworked to take advantage of the new structure: instead of dedicated typed array values such as `LONG_ARRAY`, `ValueType` and `ExprType` now have a single `ARRAY` type, and `elementType` holds a reference to the type of the array's elements. In the string serialized form, these now look like `ARRAY<LONG>`, `ARRAY<STRING>`, etc., but the legacy values are still translated if encountered.
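   As a sketch of how the legacy array names could be translated, the hypothetical helper below rewrites them into the new `ARRAY<...>` form before any further parsing. The set of legacy names it handles (`LONG_ARRAY`, `DOUBLE_ARRAY`, `STRING_ARRAY`) and the case-insensitive lookup are assumptions for illustration, not necessarily what this PR does.

   ```java
   import java.util.HashMap;
   import java.util.Locale;
   import java.util.Map;

   // Hypothetical helper; not a class from this PR.
   final class LegacyTypeNames
   {
     // Assumed set of legacy dedicated array names and their new string forms.
     private static final Map<String, String> LEGACY_ARRAYS = new HashMap<>();

     static {
       LEGACY_ARRAYS.put("LONG_ARRAY", "ARRAY<LONG>");
       LEGACY_ARRAYS.put("DOUBLE_ARRAY", "ARRAY<DOUBLE>");
       LEGACY_ARRAYS.put("STRING_ARRAY", "ARRAY<STRING>");
     }

     private LegacyTypeNames()
     {
     }

     static String normalize(String typeName)
     {
       // Look up case-insensitively, but return non-legacy input untouched so that
       // case-sensitive complex type names like COMPLEX<hyperUnique> survive.
       String upper = typeName.trim().toUpperCase(Locale.ROOT);
       return LEGACY_ARRAYS.getOrDefault(upper, typeName);
     }
   }
   ```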
   
   Finally, a bare `COMPLEX` deserializes into a complex type whose type name is unknown, while a complex type whose type information is present serializes as `COMPLEX<typeName>`.
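   The string serialization rules for primitive, array, and complex types can be sketched with a small recursive type model. `Kind` and `TypeNode` are hypothetical stand-ins rather than the classes in this PR, and `hyperUnique` is used only as an example complex type name.

   ```java
   // Hypothetical type kinds; not ValueType itself.
   enum Kind { STRING, LONG, FLOAT, DOUBLE, ARRAY, COMPLEX }

   // Hypothetical immutable type node illustrating the string form.
   final class TypeNode
   {
     final Kind kind;
     final String complexTypeName; // null when unknown, or when kind != COMPLEX
     final TypeNode elementType;   // non-null only when kind == ARRAY

     TypeNode(Kind kind, String complexTypeName, TypeNode elementType)
     {
       this.kind = kind;
       this.complexTypeName = complexTypeName;
       this.elementType = elementType;
     }

     String serialize()
     {
       switch (kind) {
         case ARRAY:
           // Element types serialize recursively, so nesting composes naturally.
           return "ARRAY<" + elementType.serialize() + ">";
         case COMPLEX:
           // Without a known type name, fall back to the bare COMPLEX string.
           return complexTypeName == null ? "COMPLEX" : "COMPLEX<" + complexTypeName + ">";
         default:
           // LONG, STRING, FLOAT and DOUBLE keep their legacy names.
           return kind.name();
       }
     }
   }
   ```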
   
   I've also added object-based JSON constructors to prepare for a future where we might want to use objects instead of this string-based serde. For now, though, I think the strings are flexible enough, since they can model all sorts of types that were not possible in the previous system, such as `ARRAY<ARRAY<COMPLEX<typeName>>>` if we want to get wild with it.
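   To illustrate that the string format can describe arbitrarily nested types, here is a minimal recursive-descent parser sketch. `ParsedType` is a hypothetical class, not part of this PR; matching type keywords case-insensitively while keeping complex type names verbatim is an assumption of the sketch. Input with unbalanced angle brackets is rejected.

   ```java
   import java.util.Locale;

   // Hypothetical parse tree for strings such as "ARRAY<ARRAY<COMPLEX<typeName>>>".
   final class ParsedType
   {
     final String kind;            // STRING, LONG, FLOAT, DOUBLE, ARRAY or COMPLEX
     final String complexTypeName; // non-null only for COMPLEX<name>
     final ParsedType elementType; // non-null only for ARRAY<...>

     private ParsedType(String kind, String complexTypeName, ParsedType elementType)
     {
       this.kind = kind;
       this.complexTypeName = complexTypeName;
       this.elementType = elementType;
     }

     static ParsedType parse(String input)
     {
       String s = input.trim();
       int open = s.indexOf('<');
       if (open < 0) {
         // No brackets: a primitive type, or a bare COMPLEX with an unknown name.
         String kind = s.toUpperCase(Locale.ROOT);
         switch (kind) {
           case "STRING":
           case "LONG":
           case "FLOAT":
           case "DOUBLE":
           case "COMPLEX":
             return new ParsedType(kind, null, null);
           default:
             throw new IllegalArgumentException("Unknown type: " + input);
         }
       }
       if (!s.endsWith(">")) {
         throw new IllegalArgumentException("Unbalanced brackets: " + input);
       }
       String outer = s.substring(0, open).trim().toUpperCase(Locale.ROOT);
       String inner = s.substring(open + 1, s.length() - 1);
       if ("ARRAY".equals(outer)) {
         // The element type may itself be an ARRAY or a COMPLEX, so recurse.
         return new ParsedType("ARRAY", null, parse(inner));
       }
       if ("COMPLEX".equals(outer)) {
         String name = inner.trim();
         if (name.isEmpty() || name.indexOf('<') >= 0 || name.indexOf('>') >= 0) {
           throw new IllegalArgumentException("Bad complex type name: " + input);
         }
         return new ParsedType("COMPLEX", name, null);
       }
       throw new IllegalArgumentException("Unknown type: " + input);
     }

     @Override
     public String toString()
     {
       if (elementType != null) {
         return "ARRAY<" + elementType + ">";
       }
       if (complexTypeName != null) {
         return "COMPLEX<" + complexTypeName + ">";
       }
       return kind;
     }
   }
   ```

   Parsing and then calling `toString()` round-trips well-formed input, which is the property a string-based serde relies on.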
   
   ## SQL and INFORMATION_SCHEMA
   The only change in this PR that will be apparent to most users is that, now that complex type information is preserved throughout the engine, the `INFORMATION_SCHEMA` columns table can display the complex type information instead of `OTHER`:
   
   <img width="1169" alt="Screen Shot 2021-09-04 at 6 48 41 PM" src="https://user-images.githubusercontent.com/1577461/133430337-62899b34-9e05-4b37-a63f-54dcfa1f6029.png">
   
   This is still a bit of a WIP: the JDBC type is still reported as `OTHER` because I need to do some additional investigation into whether there is anything more appropriate.
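   A rough sketch of the resulting SQL metadata behavior, assuming the complex type is displayed in its serialized `COMPLEX<typeName>` form. `SqlTypeSketch` is hypothetical, and the SQL names used here for the primitive types (`VARCHAR`, `BIGINT`, `FLOAT`, `DOUBLE`) are assumptions of the sketch; only the `java.sql.Types` constants are real JDK API.

   ```java
   import java.sql.Types;

   // Hypothetical mapping from a serialized native type string to SQL metadata.
   final class SqlTypeSketch
   {
     private SqlTypeSketch()
     {
     }

     // Type name as it might appear in the INFORMATION_SCHEMA columns table.
     static String informationSchemaType(String nativeType)
     {
       switch (nativeType) {
         case "STRING":
           return "VARCHAR";
         case "LONG":
           return "BIGINT";
         case "FLOAT":
           return "FLOAT";
         case "DOUBLE":
           return "DOUBLE";
         default:
           // Complex types now keep their type information instead of collapsing
           // to OTHER; anything else (including arrays, in this sketch) stays OTHER.
           return nativeType.startsWith("COMPLEX") ? nativeType : "OTHER";
       }
     }

     // JDBC type code; complex types are still reported as OTHER.
     static int jdbcType(String nativeType)
     {
       switch (nativeType) {
         case "STRING":
           return Types.VARCHAR;
         case "LONG":
           return Types.BIGINT;
         case "FLOAT":
           return Types.FLOAT;
         case "DOUBLE":
           return Types.DOUBLE;
         default:
           return Types.OTHER;
       }
     }
   }
   ```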
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `ValueType`
    * `ExprType`
    * `TypeDescriptor`
    * `TypeFactory`
    * `TypeSignature`
    * `ColumnType`
    * `ColumnTypeFactory`
    * `ExpressionType`
    * `ExpressionTypeFactory`
    * `ColumnCapabilities`
    * `RowSignature`
    * `Calcites`
    * `RowSignatures`
   
   <hr>
   
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   

