jihoonson commented on a change in pull request #11713:
URL: https://github.com/apache/druid/pull/11713#discussion_r721591664



##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeSignature.java
##########
@@ -25,6 +25,49 @@
 import javax.annotation.Nullable;
 import java.util.Objects;
 
+/**
+ * This interface serves as a common foundation for Druids native type system, and provides common methods for reasoning
+ * about and handling type matters. Additional type common type handling methods are provided by {@link Types} utility.
+ *
+ * This information is used by Druid to make decisions about how to correctly process inputs and determine output types
+ * at all layers of the engine, from how to group, filter, aggregate, and transform columns up to how to best plan SQL
+ * into native Druid queries.
+ *
+ * The native Druid type system can currently be broken down at a high level into 'primitive' types, 'array' types, and
+ * 'complex' types, and this classification is defined by an enumeration which implements {@link TypeDescriptor} such
+ * as {@link ValueType} for the general query engines and {@link org.apache.druid.math.expr.ExprType} for low level
+ * expression processing. This is exposed via {@link #getType()}, and will be most callers first point of contact with
+ * the {@link TypeSignature} when trying to decide how to handle a given input.
+ *
+ * Druid 'primitive' types includes strings and numeric types. Note: multi-value string columns are still considered
+ * 'primitive' string types, because they do not behave as traditional arrays (unless explicitly converted to an array),
+ * and are always serialized as opportunistically single valued, so whether or not any particular string column is
+ * multi-valued might vary from segment to segment. The concept of multi-valued strings only exists at a very low
+ * engine level and are only modeled by the ColumnCapabilities implementation of {@link TypeSignature}.
+ *
+ * 'array' types contain additional nested type information about the elements of an array, a reference to another
+ * {@link TypeSignature} through the {@link #getElementType()} method. If {@link TypeDescriptor#isArray()} is true,
+ * then {@link #getElementType()} should never return null.
+ *
+ * 'complex' types are Druids extensible types, which have a registry that allows these types to be defined and
+ * associated with a name which is available as {@link #getComplexTypeName()}. These type names are unique, so this
+ * information is used to allow handling of these 'complex' types to confirm.
+ *
+ * {@link TypeSignature} is currently manifested in 3 forms: {@link ColumnType} which is the high level 'native' Druid
+ * type definitions using {@link ValueType}, and is used by row signatures and SQL schemas, used by callers as input
+ * to various API methods, and most general purpose type handling. In 'druid-processing' there is an additional
+ * type ... type, ColumnCapabilities, which is effectively a {@link ColumnType} but includes some additional
+ * information for low level query processing, such as details about whether a column has indexes, dictionaries, null
+ * values, is a multi-value string column, and more.
+ *
+ * The third is {@link org.apache.druid.math.expr.ExpressionType}, which instead of {@link ValueType} uses
+ * {@link org.apache.druid.math.expr.ExprType}, and is used exclusively for handling Druid native expression evaluation.
+ * {@link org.apache.druid.math.expr.ExpressionType} exists because the Druid expression system does not natively
+ * handle float types, so it is essentially a mapping of {@link ColumnType} where floats are coerced to double typed
+ * values. Ideally at some point Druid expressions can just handle floats directly, and these two {@link TypeSignature}
+ * can be merged, which will simplify this interface to no longer need be generic, allow {@link ColumnType} to be
+ * collapsed into {@link BaseTypeSignature}, and finally unify the type system.
+ */

Review comment:
       :+1: 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



