clintropolis opened a new pull request #11713:
URL: https://github.com/apache/druid/pull/11713
### Description
_work in progress, still quite a bit to cover_
This PR enriches the native engine type system to use a more powerful set of
classes than the existing enumerations, `ValueType` and `ExprType`, which are
primarily used right now.
To share the same structure and utilities when working with both the native
engine and expressions, `TypeSignature`
```
public interface TypeSignature<Type extends TypeDescriptor>
{
  Type getType();

  @Nullable
  String getComplexTypeName();

  @Nullable
  TypeSignature<Type> getElementType();
}
```
has been added to model basic type information, and is implemented by the
new `ColumnType` and `ExpressionType` classes, as well as `ColumnCapabilities`.
`TypeDescriptor` is an interface shared by both `ExprType` and `ValueType`,
```
public interface TypeDescriptor
{
  boolean isNumeric();

  boolean isPrimitive();

  boolean isArray();
}
```
to expose some common facts about both sets of enums.
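To make the relationship between the two interfaces concrete, here is a minimal, self-contained sketch rather than the actual code in this PR. `SketchValueType`, `SketchColumnType`, and `isNumericArray` are hypothetical names used only for illustration, the interfaces are re-declared without `@Nullable` to avoid a dependency, and the `isPrimitive` rule is an assumption. The point is that a helper written against `TypeSignature` does not need to know whether it is looking at a column type or an expression type.

```java
// Interfaces as described above, minus @Nullable so the sketch has no dependencies.
interface TypeDescriptor
{
  boolean isNumeric();
  boolean isPrimitive();
  boolean isArray();
}

interface TypeSignature<Type extends TypeDescriptor>
{
  Type getType();
  String getComplexTypeName();           // null unless the type is complex
  TypeSignature<Type> getElementType();  // null unless the type is an array
}

// Hypothetical stand-in for ValueType; the real enum may differ.
enum SketchValueType implements TypeDescriptor
{
  LONG, FLOAT, DOUBLE, STRING, ARRAY, COMPLEX;

  @Override
  public boolean isNumeric()
  {
    return this == LONG || this == FLOAT || this == DOUBLE;
  }

  @Override
  public boolean isPrimitive()
  {
    // Assumption: strings count as primitive alongside the numeric types.
    return this == STRING || isNumeric();
  }

  @Override
  public boolean isArray()
  {
    return this == ARRAY;
  }
}

// Hypothetical stand-in for ColumnType.
final class SketchColumnType implements TypeSignature<SketchValueType>
{
  private final SketchValueType type;
  private final String complexTypeName;
  private final SketchColumnType elementType;

  SketchColumnType(SketchValueType type, String complexTypeName, SketchColumnType elementType)
  {
    this.type = type;
    this.complexTypeName = complexTypeName;
    this.elementType = elementType;
  }

  @Override
  public SketchValueType getType()
  {
    return type;
  }

  @Override
  public String getComplexTypeName()
  {
    return complexTypeName;
  }

  @Override
  public SketchColumnType getElementType()
  {
    return elementType;
  }
}

final class TypeSignatureSketch
{
  // Works for any TypeSignature, regardless of which enum backs it.
  static boolean isNumericArray(TypeSignature<? extends TypeDescriptor> signature)
  {
    TypeSignature<? extends TypeDescriptor> element = signature.getElementType();
    return signature.getType().isArray() && element != null && element.getType().isNumeric();
  }

  public static void main(String[] args)
  {
    SketchColumnType longType = new SketchColumnType(SketchValueType.LONG, null, null);
    SketchColumnType longArray = new SketchColumnType(SketchValueType.ARRAY, null, longType);
    System.out.println(isNumericArray(longArray)); // true
    System.out.println(isNumericArray(longType));  // false
  }
}
```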
To aid in the construction of these types:
```
public interface TypeFactory<T extends TypeSignature<? extends TypeDescriptor>>
{
  T ofString();

  T ofFloat();

  T ofDouble();

  T ofLong();

  T ofArray(T elementType);

  T ofComplex(@Nullable String complexTypeName);
}
```
has also been added. It doesn't do anything clever yet, but ideally
implementations should reuse instances for identical types for efficiency.
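As a rough illustration of what instance reuse could look like, here is a sketch of an interning factory. This is not what the PR implements: `InternedType` and `InterningTypeFactory` are hypothetical, the factory only mirrors the method names of `TypeFactory` instead of implementing it (to keep the example self-contained), and `hyperUnique` is used purely as an example complex type name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for ColumnType; not the class from this PR.
final class InternedType
{
  final String kind;               // "STRING", "LONG", "ARRAY", "COMPLEX", ...
  final String complexTypeName;    // null unless kind is COMPLEX with a known name
  final InternedType elementType;  // null unless kind is ARRAY

  InternedType(String kind, String complexTypeName, InternedType elementType)
  {
    this.kind = kind;
    this.complexTypeName = complexTypeName;
    this.elementType = elementType;
  }
}

// Hypothetical factory that hands out one shared instance per distinct type.
final class InterningTypeFactory
{
  private static final InternedType STRING = new InternedType("STRING", null, null);
  private static final InternedType FLOAT = new InternedType("FLOAT", null, null);
  private static final InternedType DOUBLE = new InternedType("DOUBLE", null, null);
  private static final InternedType LONG = new InternedType("LONG", null, null);
  private static final InternedType UNKNOWN_COMPLEX = new InternedType("COMPLEX", null, null);

  // Element types are themselves interned, so identity-based keys are sufficient.
  private final Map<InternedType, InternedType> arrays = new ConcurrentHashMap<>();
  private final Map<String, InternedType> complexes = new ConcurrentHashMap<>();

  InternedType ofString() { return STRING; }
  InternedType ofFloat() { return FLOAT; }
  InternedType ofDouble() { return DOUBLE; }
  InternedType ofLong() { return LONG; }

  InternedType ofArray(InternedType elementType)
  {
    return arrays.computeIfAbsent(elementType, e -> new InternedType("ARRAY", null, e));
  }

  InternedType ofComplex(String complexTypeName)
  {
    if (complexTypeName == null) {
      return UNKNOWN_COMPLEX;
    }
    return complexes.computeIfAbsent(complexTypeName, n -> new InternedType("COMPLEX", n, null));
  }
}

final class InterningTypeFactoryDemo
{
  public static void main(String[] args)
  {
    InterningTypeFactory factory = new InterningTypeFactory();
    // Both calls return the same instance, so identity comparison holds.
    System.out.println(factory.ofArray(factory.ofLong()) == factory.ofArray(factory.ofLong())); // true
  }
}
```

With interning like this, type equality checks could in principle fall back to reference comparison, though whether that is worth the bookkeeping is an open question.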
Beyond these new structures, the vast majority of changes in this PR are
around replacing all occurrences of `ValueType` with `ColumnType`, and
`ExprType` with `ExpressionType`, and adjusting type checking code accordingly.
## JSON Serialization
To be as compatible as possible, serialization is currently done with a
string based format. `LONG`, `STRING`, `FLOAT`, `DOUBLE` all remain the same as
they were when `ValueType` was king, and are serialized and deserialized as the
appropriate `ColumnType` (or `ExpressionType`).
Array types have been significantly reworked to take advantage of the new
structure: instead of dedicated typed arrays, `LONG_ARRAY` etc, `ValueType` and
`ExprType` now have a single `ARRAY` type, and `elementType` contains a
reference to the internal type. In the string serialized form, these now look
like `ARRAY<LONG>`, `ARRAY<STRING>`, etc., and the legacy values are still
translated if encountered.
Finally, `COMPLEX` will read into an unknown complex type, but if the type
information is present, will serialize as `COMPLEX<typeName>`.
I've added object JSON constructors too, in order to prepare for a future
where we might want to use objects instead of this string based serde, but the
strings are adequately flexible for now I think, since it will be possible to
model all sorts of types that were not possible in the previous system, such as
`ARRAY<ARRAY<COMPLEX<typeName>>>` if we want to get wild with it.
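To illustrate the string format described in this section, here is a small sketch of a parser and serializer. It is not the serde code in this PR: `TypeStringSketch` and its methods are hypothetical, and treating every `*_ARRAY` suffix as a legacy array value is an assumption (only `LONG_ARRAY` is named above).

```java
import java.util.Locale;

// Hypothetical sketch of the string-based type format; not the PR's implementation.
final class TypeStringSketch
{
  final String kind;                   // LONG, FLOAT, DOUBLE, STRING, ARRAY, or COMPLEX
  final String complexTypeName;        // null for non-complex or unknown complex types
  final TypeStringSketch elementType;  // null unless kind is ARRAY

  private TypeStringSketch(String kind, String complexTypeName, TypeStringSketch elementType)
  {
    this.kind = kind;
    this.complexTypeName = complexTypeName;
    this.elementType = elementType;
  }

  static TypeStringSketch parse(String typeString)
  {
    String trimmed = typeString.trim();
    String upper = trimmed.toUpperCase(Locale.ROOT);

    // Assumption: legacy values such as LONG_ARRAY translate to ARRAY<LONG>.
    if (upper.endsWith("_ARRAY")) {
      String element = trimmed.substring(0, trimmed.length() - "_ARRAY".length());
      return new TypeStringSketch("ARRAY", null, parse(element));
    }
    if (upper.startsWith("ARRAY<") && trimmed.endsWith(">")) {
      String element = trimmed.substring("ARRAY<".length(), trimmed.length() - 1);
      return new TypeStringSketch("ARRAY", null, parse(element));
    }
    if (upper.startsWith("COMPLEX<") && trimmed.endsWith(">")) {
      // Complex type names keep their original casing.
      String name = trimmed.substring("COMPLEX<".length(), trimmed.length() - 1);
      return new TypeStringSketch("COMPLEX", name, null);
    }
    switch (upper) {
      case "LONG":
      case "FLOAT":
      case "DOUBLE":
      case "STRING":
      case "COMPLEX":  // bare COMPLEX reads as an unknown complex type
        return new TypeStringSketch(upper, null, null);
      default:
        throw new IllegalArgumentException("Unknown type: " + typeString);
    }
  }

  String asTypeString()
  {
    switch (kind) {
      case "ARRAY":
        return "ARRAY<" + elementType.asTypeString() + ">";
      case "COMPLEX":
        return complexTypeName == null ? "COMPLEX" : "COMPLEX<" + complexTypeName + ">";
      default:
        return kind;
    }
  }

  public static void main(String[] args)
  {
    System.out.println(parse("LONG_ARRAY").asTypeString());                      // ARRAY<LONG>
    System.out.println(parse("ARRAY<ARRAY<COMPLEX<typeName>>>").asTypeString()); // ARRAY<ARRAY<COMPLEX<typeName>>>
  }
}
```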
## SQL and INFORMATION_SCHEMA
The only change in this PR that will be apparent to most users is that now
that complex type information is preserved throughout the engine, the
`INFORMATION_SCHEMA` columns table can display the complex type information
instead of `OTHER`:
<img width="1169" alt="Screen Shot 2021-09-04 at 6 48 41 PM"
src="https://user-images.githubusercontent.com/1577461/133430337-62899b34-9e05-4b37-a63f-54dcfa1f6029.png">
This is still a bit of a WIP, and the JDBC type is still reported as `OTHER`
because I need to investigate whether there is anything more appropriate.
<hr>
##### Key changed/added classes in this PR
* `ValueType`
* `ExprType`
* `TypeDescriptor`
* `TypeFactory`
* `TypeSignature`
* `ColumnType`
* `ColumnTypeFactory`
* `ExpressionType`
* `ExpressionTypeFactory`
* `ColumnCapabilities`
* `RowSignature`
* `Calcites`
* `RowSignatures`
<hr>
This PR has:
- [ ] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.