jihoonson commented on a change in pull request #11713:
URL: https://github.com/apache/druid/pull/11713#discussion_r721591664
##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeSignature.java
##########
@@ -25,6 +25,49 @@
import javax.annotation.Nullable;
import java.util.Objects;
+/**
+ * This interface serves as a common foundation for Druid's native type system,
and provides common methods for reasoning
+ * about and handling type matters. Additional common type handling
methods are provided by the {@link Types} utility.
+ *
+ * This information is used by Druid to make decisions about how to correctly
process inputs and determine output types
+ * at all layers of the engine, from how to group, filter, aggregate, and
transform columns up to how to best plan SQL
+ * into native Druid queries.
+ *
+ * The native Druid type system can currently be broken down at a high level
into 'primitive' types, 'array' types, and
+ * 'complex' types, and this classification is defined by an enumeration which
implements {@link TypeDescriptor} such
+ * as {@link ValueType} for the general query engines and {@link
org.apache.druid.math.expr.ExprType} for low level
+ * expression processing. This is exposed via {@link #getType()}, and will be
most callers' first point of contact with
+ * the {@link TypeSignature} when trying to decide how to handle a given input.
+ *
+ * Druid 'primitive' types include strings and numeric types. Note:
multi-value string columns are still considered
+ * 'primitive' string types, because they do not behave as traditional arrays
(unless explicitly converted to an array),
+ * and are always serialized as opportunistically single valued, so whether or
not any particular string column is
+ * multi-valued might vary from segment to segment. The concept of
multi-valued strings only exists at a very low
+ * engine level and is only modeled by the ColumnCapabilities implementation
of {@link TypeSignature}.
+ *
+ * 'array' types contain additional nested type information about the elements
of an array, a reference to another
+ * {@link TypeSignature} through the {@link #getElementType()} method. If
{@link TypeDescriptor#isArray()} is true,
+ * then {@link #getElementType()} should never return null.
+ *
+ * 'complex' types are Druid's extensible types, which have a registry that
allows these types to be defined and
+ * associated with a name which is available as {@link #getComplexTypeName()}.
These type names are unique, so this
+ * information is used to allow handling of these 'complex' types to confirm.
+ *
+ * {@link TypeSignature} is currently manifested in 3 forms. The first is {@link
ColumnType}, the high level 'native' Druid
+ * type definition using {@link ValueType}, which is used by row signatures and
SQL schemas, by callers as input
+ * to various API methods, and for most general purpose type handling. In
'druid-processing' there is an additional
+ * type ... type, ColumnCapabilities, which is effectively a {@link
ColumnType} but includes some additional
+ * information for low level query processing, such as details about whether a
column has indexes, dictionaries, null
+ * values, is a multi-value string column, and more.
+ *
+ * The third is {@link org.apache.druid.math.expr.ExpressionType}, which
instead of {@link ValueType} uses
+ * {@link org.apache.druid.math.expr.ExprType}, and is used exclusively for
handling Druid native expression evaluation.
+ * {@link org.apache.druid.math.expr.ExpressionType} exists because the Druid
expression system does not natively
+ * handle float types, so it is essentially a mapping of {@link ColumnType}
where floats are coerced to double typed
+ * values. Ideally at some point Druid expressions can just handle floats
directly, and these two {@link TypeSignature}
+ * can be merged, which will simplify this interface to no longer need to be
generic, allow {@link ColumnType} to be
+ * collapsed into {@link BaseTypeSignature}, and finally unify the type system.
+ */
Review comment:
:+1:
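
For anyone reading the Javadoc above without the rest of the PR at hand, here is a minimal sketch of the dispatch it describes: look at `getType()` first, then follow `getElementType()` for arrays or `getComplexTypeName()` for complex types. `SketchType` and `SketchSignature` are hypothetical stand-ins written for this example, not the real `ValueType`, `TypeSignature`, or `ColumnType` classes, and `myComplexType` is a made-up type name.

```java
import java.util.Objects;

public class TypeSignatureSketch {
  // Hypothetical stand-in for an enumeration implementing TypeDescriptor.
  public enum SketchType {
    STRING, LONG, FLOAT, DOUBLE, ARRAY, COMPLEX;

    public boolean isArray() {
      return this == ARRAY;
    }

    public boolean isPrimitive() {
      return this != ARRAY && this != COMPLEX;
    }
  }

  // Hypothetical stand-in for a TypeSignature: only the three accessors
  // named in the Javadoc are modeled here.
  public static final class SketchSignature {
    private final SketchType type;
    private final String complexTypeName;       // only set for COMPLEX
    private final SketchSignature elementType;  // only set for ARRAY

    private SketchSignature(SketchType type, String complexTypeName, SketchSignature elementType) {
      this.type = type;
      this.complexTypeName = complexTypeName;
      this.elementType = elementType;
    }

    public static SketchSignature primitive(SketchType type) {
      if (!type.isPrimitive()) {
        throw new IllegalArgumentException(type + " is not a primitive type");
      }
      return new SketchSignature(type, null, null);
    }

    public static SketchSignature arrayOf(SketchSignature elementType) {
      // Mirrors the stated contract: array types always carry an element type.
      return new SketchSignature(SketchType.ARRAY, null, Objects.requireNonNull(elementType));
    }

    public static SketchSignature complex(String complexTypeName) {
      return new SketchSignature(SketchType.COMPLEX, Objects.requireNonNull(complexTypeName), null);
    }

    public SketchType getType() {
      return type;
    }

    public String getComplexTypeName() {
      return complexTypeName;
    }

    public SketchSignature getElementType() {
      return elementType;
    }
  }

  // Dispatch on getType() first, then use the extra information
  // that only array and complex signatures carry.
  public static String describe(SketchSignature signature) {
    SketchType type = signature.getType();
    if (type.isArray()) {
      return "ARRAY<" + describe(signature.getElementType()) + ">";
    }
    if (type == SketchType.COMPLEX) {
      return "COMPLEX<" + signature.getComplexTypeName() + ">";
    }
    return type.name();
  }

  public static void main(String[] args) {
    System.out.println(describe(SketchSignature.primitive(SketchType.STRING)));
    System.out.println(describe(SketchSignature.arrayOf(SketchSignature.primitive(SketchType.LONG))));
    System.out.println(describe(SketchSignature.complex("myComplexType")));
  }
}
```

The sketch deliberately leaves out multi-value strings, since the Javadoc says those are only modeled by ColumnCapabilities and otherwise appear as plain string types.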
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.