GitHub user jianqiao opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/315

    [DO NOT MERGE] Refactor type system to provide better extensibility of 
types and functions

    This is a preliminary PR that is not ready to be merged but provides an 
overall view of the type system refactoring work. Many constructs are still in 
their initial design and may be further improved.
    
    The PR aims at reviewing the refactoring designs at the "architecture" 
level. Detailed code style and unit test issues may be addressed later in 
subsequent concrete PRs.
    
    
    The overall purpose of the refactoring is to improve the extensibility of 
the existing type/function system (i.e. support more kinds of types/functions 
and make it easier to add new types and functions), while retaining the 
performance of the current system.
    
    ### Major Changes
    #### Part I. Type System
    ---
    ##### 1. Categorize all types into four [_memory 
layouts_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeID.hpp#L64).
    
    The four memory layouts are:
    * __CxxInlinePod__ <sub>(C++ plain old data)</sub>
    * __ParInlinePod__ <sub>(Parameterized inline plain old data)</sub>
    * __ParOutOfLinePod__ <sub>(Parameterized out-of-line plain old data)</sub>
    * __CxxGeneric__ <sub>(C++ generic types)</sub>
    
    A type's memory layout determines how its values are stored and 
represented.
    
    Briefly speaking,
    * _CxxInlinePod_ corresponds to C++ primitive types or POD structs.
      * E.g. _int_, _double_, _struct { double x; double y; }_.
      * The size of a CxxInlinePod value is known at C++ compile time (e.g. 
_double_ has size 8, _struct { double x; double y; }_ has size 16).
    * _ParInlinePod_ corresponds to database defined "fixed length" types.
      * E.g. _Char(8)_, _Char(20)_.
      * The size of such a type's values is not known at C++ compile time. 
Instead, the type is parameterized by an unsigned integer, where the 
parameter's value is known at SQL query compile time (which is C++ run-time).
    * _ParOutOfLinePod_ corresponds to database defined "variable length" types.
      * E.g. _Varchar(20)_.
      * The size of such a type's values is not known until SQL query run-time.
    * _CxxGeneric_ corresponds to C++ general types (i.e. any C++ type).
      * E.g. _std::set&lt;int&gt;_, _std::vector&lt;const Type*&gt;_.
      * Such types have to implement serialization/deserialization methods to 
have storage support.
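    
    To make the four layouts concrete, here is a minimal sketch. Only the 
four layout names come from the list above; the enum `MemoryLayout`, the 
helper `sizeKnownAt()` and the struct `Point2D` are hypothetical names chosen 
for illustration and are not taken from `TypeID.hpp`.
    
    ```cpp
    #include <string>
    
    // Hypothetical enum mirroring the four layouts named above.
    enum class MemoryLayout {
      kCxxInlinePod,
      kParInlinePod,
      kParOutOfLinePod,
      kCxxGeneric
    };
    
    // Describes when the byte size of a value of each layout becomes known.
    inline std::string sizeKnownAt(const MemoryLayout layout) {
      switch (layout) {
        case MemoryLayout::kCxxInlinePod:
          return "C++ compile time";
        case MemoryLayout::kParInlinePod:
          return "SQL query compile time";
        case MemoryLayout::kParOutOfLinePod:
          return "SQL query run time";
        case MemoryLayout::kCxxGeneric:
          return "not fixed; needs serialization";
      }
      return "unknown";
    }
    
    // A CxxInlinePod example: a POD struct whose size is a compile-time
    // constant (two doubles, no padding on mainstream platforms).
    struct Point2D {
      double x;
      double y;
    };
    static_assert(sizeof(Point2D) == 2 * sizeof(double),
                  "Point2D is expected to hold exactly two doubles");
    ```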
    ---
    ##### 2. Use 
[_TypeIDTrait_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeRegistrar.hpp#L59)
 to make much per-type information available at compile time.
    
    With this per-type trait information, we can avoid much boilerplate code 
in each subclass of _Type_ by using template techniques that specialize on the 
memory layout. See 
[_TypeSynthesizer_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeSynthesizer.hpp)
 and 
[_TypeFactory_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeFactory.cpp#L69).
    
    _TypeIDTrait_ is also extensively used in many other places as it provides 
all the required compile-time information about a type.
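    
    As a rough illustration of the idea (not the actual contents of 
`TypeRegistrar.hpp`), a per-type trait could look like the sketch below. The 
member names `cpptype`, `kMemoryLayout` and `kIsParameterized`, the reduced 
`TypeID` enum and the helper `hasCompileTimeSize()` are assumptions made for 
this example.
    
    ```cpp
    #include <cstdint>
    
    // Hypothetical, reduced set of type IDs and layouts for this sketch.
    enum class TypeID { kInt, kDouble, kChar };
    enum class MemoryLayout {
      kCxxInlinePod, kParInlinePod, kParOutOfLinePod, kCxxGeneric
    };
    
    // Primary template is left undefined; each type provides a specialization.
    template <TypeID type_id>
    struct TypeIDTrait;
    
    template <>
    struct TypeIDTrait<TypeID::kInt> {
      using cpptype = std::int32_t;
      static constexpr MemoryLayout kMemoryLayout = MemoryLayout::kCxxInlinePod;
      static constexpr bool kIsParameterized = false;
    };
    
    template <>
    struct TypeIDTrait<TypeID::kDouble> {
      using cpptype = double;
      static constexpr MemoryLayout kMemoryLayout = MemoryLayout::kCxxInlinePod;
      static constexpr bool kIsParameterized = false;
    };
    
    template <>
    struct TypeIDTrait<TypeID::kChar> {
      using cpptype = void;  // Length is a run-time parameter, e.g. Char(8).
      static constexpr MemoryLayout kMemoryLayout = MemoryLayout::kParInlinePod;
      static constexpr bool kIsParameterized = true;
    };
    
    // Generic code can branch on the layout instead of on each concrete type.
    template <TypeID type_id>
    constexpr bool hasCompileTimeSize() {
      return TypeIDTrait<type_id>::kMemoryLayout == MemoryLayout::kCxxInlinePod;
    }
    
    static_assert(hasCompileTimeSize<TypeID::kInt>(), "Int is CxxInlinePod");
    static_assert(!hasCompileTimeSize<TypeID::kChar>(), "Char is ParInlinePod");
    ```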
    
    ---
    
    ##### 3. Support more types.
    Details will be written later about how to add a new type into the 
Quickstep system.
    
    The current PR has some example types added:
    * The 
[_Bool_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/BoolType.hpp)
 type. It will be used later for connecting scalar functions and predicates.
    * The 
[_Text_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TextType.hpp)
 type. A general non-parameterized string type.
      * __TODO:__ We need some updates in the storage block module (potentially 
also other places) to handle the "infinite maximum byte size" types.
    * The 
[_MetaType_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/MetaType-decl.hpp)
 type. It is the "type of types", i.e. a value of _MetaType_ has C++ type 
_const Type*_.
    * The 
[_Array_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/ArrayType.hpp)
 type. A generic type that represents an array. This type takes a MetaType 
value as parameter, where the parameter specifies the array's element type.
      * __TODO__: We need specialized array types such as _IntArray_ and 
_TextArray_ for performance reasons.
    
    ---
    ##### 4. Improve the type casting mechanism.
    
    Type casting (coercion) is an important feature that is needed in practice 
from time to time.
    
    This PR's design defines a general 
[template](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/CastFunctorOverloads.hpp#L41)
    ```
    template <typename SourceType, typename TargetType, typename Enable = void>
    struct CastFunctor;
    ```
    which is then specialized for different source/target types.
    
    The coercibility between two types is then 
[inferred](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/utility/CastUtil.cpp#L58)
 according to whether the corresponding specialization exists. Thus it suffices 
to just specialize _CastFunctor_ when adding a new casting operation, and all 
the dependent places (e.g. _Type::isCoercibleFrom()_) will mostly be 
auto-generated by the system (unless the target type is a parameterized type 
and you want to do some further checks).
    
    Note that _safe-coercibility_ is a separate issue and needs to be taken 
care of mostly manually, by overriding _Type::isSafelyCoercibleFrom()_.
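    
    The idea of inferring coercibility from the existence of a 
specialization can be sketched as follows. This is only an illustration under 
assumptions: the tag structs `IntType`, `DoubleType` and `TextType` and the 
detector `IsCastable` are hypothetical, and the check in `CastUtil.cpp` may be 
implemented differently. The detector relies on `sizeof` being invalid for an 
incomplete (i.e. unspecialized) `CastFunctor`.
    
    ```cpp
    #include <string>
    #include <type_traits>
    
    // Declared but not defined, as in the template shown above.
    template <typename SourceType, typename TargetType, typename Enable = void>
    struct CastFunctor;
    
    // Hypothetical stand-ins for the real type classes.
    struct IntType {};
    struct DoubleType {};
    struct TextType {};
    
    // Adding a cast means adding one specialization.
    template <>
    struct CastFunctor<IntType, DoubleType> {
      double apply(const int value) const { return static_cast<double>(value); }
    };
    
    template <>
    struct CastFunctor<IntType, TextType> {
      std::string apply(const int value) const { return std::to_string(value); }
    };
    
    // Hypothetical detector: true iff CastFunctor<S, T> is a complete type.
    template <typename S, typename T, typename = void>
    struct IsCastable : std::false_type {};
    
    template <typename S, typename T>
    struct IsCastable<S, T, decltype(void(sizeof(CastFunctor<S, T>)))>
        : std::true_type {};
    
    static_assert(IsCastable<IntType, DoubleType>::value, "Int -> Double exists");
    static_assert(!IsCastable<TextType, IntType>::value, "Text -> Int is missing");
    ```
    
    With such a detector, a coercibility query needs no hand-maintained 
table: registering a new cast is the same act as writing its specialization.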
    
    Explicit casting is supported with a PostgreSQL-like syntax. E.g.
    
    (1)
    ```
    SELECT (i::text + (i+1)::text)::int AS result FROM generate_series(1, 3) AS 
g(i);
    
    --
    +-----------+
    |result     |
    +-----------+
    |         12|
    |         23|
    |         34|
    +-----------+
    ```
    (2)
    ```
    CREATE TABLE r(x varchar(16));
    
    INSERT INTO r SELECT pow(10, i)::varchar(10) FROM generate_series(1, 3) AS 
g(i);
    
    SELECT 'There are ' + length(x)::varchar(10) + ' characters in ' + x AS 
result FROM r;
    
    --
    +---------------------------------------------------+
    |result                                             |
    +---------------------------------------------------+
    |                       There are 2 characters in 10|
    |                      There are 3 characters in 100|
    |                     There are 4 characters in 1000|
    +---------------------------------------------------+
    ```
    
    (3)
    ```
    SELECT {1,2,3}::array(double) AS result from generate_series(1, 1);
    
    --
    +--------------------------------+
    |result                          |
    +--------------------------------+
    |                         {1,2,3}|
    +--------------------------------+
    ```
    
    __NOTE__: The work is not yet fully completed so there may be `LOG(FATAL)` 
aborts for some combinations of queries.
    
    
    Implicit coercion is supported when resolving scalar functions, see 
[here](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/OperationFactory.cpp#L292).
 For example, we have support for the _sqrt_ function where the parameter can 
be a _Float_ or _Double_ value. Consider the query
    ```
    SELECT sqrt(x) FROM r;
    ```
    where `x` has _Int_ type. An implicit coercion from _Int_ to _Float_ 
will then be added.
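    
    A simplified sketch of this resolution step is shown below. It is not 
the logic in `OperationFactory.cpp`: the coercibility table in `isCoercible()` 
and the function `pickParameterType()` are hypothetical. The sketch prefers an 
exact match and otherwise takes the first accepted parameter type that the 
argument can be coerced to.
    
    ```cpp
    #include <vector>
    
    enum class TypeID { kInt, kFloat, kDouble, kText };
    
    // Hypothetical coercibility table, for this sketch only.
    inline bool isCoercible(const TypeID from, const TypeID to) {
      if (from == to) {
        return true;
      }
      if (from == TypeID::kInt) {
        return to == TypeID::kFloat || to == TypeID::kDouble;
      }
      if (from == TypeID::kFloat) {
        return to == TypeID::kDouble;
      }
      return false;
    }
    
    // Chooses the parameter type to use for `argument`. Returns false if no
    // accepted parameter type fits; otherwise stores the choice in *chosen.
    // An implicit cast is needed whenever *chosen differs from `argument`.
    inline bool pickParameterType(const TypeID argument,
                                  const std::vector<TypeID> &accepted,
                                  TypeID *chosen) {
      for (const TypeID candidate : accepted) {
        if (candidate == argument) {  // Exact match: no cast required.
          *chosen = candidate;
          return true;
        }
      }
      for (const TypeID candidate : accepted) {
        if (isCoercible(argument, candidate)) {  // First coercible candidate.
          *chosen = candidate;
          return true;
        }
      }
      return false;
    }
    ```
    
    For a `sqrt` that accepts _Float_ or _Double_ (in that order), an _Int_ 
argument resolves to _Float_ in this sketch, matching the example above.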
    
    ---
    ##### 5. Add 
[_GenericValue_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/GenericValue.hpp)
 to represent typed-values of all four memory layouts.
    
    The original _TypedValue_ is not sufficient to represent _CxxGeneric_ 
values, as we need to embed the overall _Type_ information in order to handle 
value allocation/copy/destruction. However, for performance reasons, we 
cannot simply replace _TypedValue_ with a more generic but slower 
implementation. Thus, a separate _GenericValue_ is added and we still use 
_TypedValue_ when handling storage-related operations.
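    
    The following sketch illustrates why the _Type_ has to travel with the 
value. It is a deliberately reduced model, not the interface in 
`GenericValue.hpp`: the virtual methods `cloneValue()`/`destroyValue()`, the 
class `IntSetType` and the accessors are hypothetical.
    
    ```cpp
    #include <set>
    
    // Hypothetical minimal Type interface: knows how to copy and destroy
    // values of the type it describes.
    class Type {
     public:
      virtual ~Type() {}
      virtual void* cloneValue(const void *value) const = 0;
      virtual void destroyValue(void *value) const = 0;
    };
    
    // A CxxGeneric type whose values are std::set<int>.
    class IntSetType : public Type {
     public:
      void* cloneValue(const void *value) const override {
        return new std::set<int>(*static_cast<const std::set<int>*>(value));
      }
      void destroyValue(void *value) const override {
        delete static_cast<std::set<int>*>(value);
      }
    };
    
    // Owns a value together with the Type needed to copy and destroy it.
    // The Type object must outlive every GenericValue that refers to it.
    class GenericValue {
     public:
      GenericValue(const Type &type, const void *value)
          : type_(&type), value_(type.cloneValue(value)) {}
      GenericValue(const GenericValue &other)
          : type_(other.type_), value_(other.type_->cloneValue(other.value_)) {}
      GenericValue& operator=(const GenericValue &other) {
        if (this != &other) {
          void *copy = other.type_->cloneValue(other.value_);
          type_->destroyValue(value_);  // Destroy with the old type.
          type_ = other.type_;
          value_ = copy;
        }
        return *this;
      }
      ~GenericValue() { type_->destroyValue(value_); }
    
      const Type& getType() const { return *type_; }
    
      // The caller must name the C++ type that matches getType().
      template <typename T>
      const T& getValue() const { return *static_cast<const T*>(value_); }
    
     private:
      const Type *type_;
      void *value_;
    };
    ```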
    
    ---
    ##### 6. Move type resolving from parser to resolver.
    
    This avoids the need to modify _SqlParser.ypp_ when adding a new type.
    
    See 
[_ParseDataType_](https://github.com/apache/incubator-quickstep/blob/refactor-type/parser/ParseDataType.hpp)
 and 
[_Resolver::resolveDataType()_](https://github.com/apache/incubator-quickstep/blob/refactor-type/query_optimizer/resolver/Resolver.cpp#L1196).
    
    ~
    
    #### Part II. Scalar Function
    ---
    ##### 1. Implement 
[_UnaryOperationSynthesizer_/_UncheckedUnaryOperatorSynthesizer_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/UnaryOperationSynthesizer.hpp#L58)
 to make it easier to add unary functions.
    
    Example unary functions:
    * 
[Arithmetic](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/ArithmeticUnaryFunctors.hpp#L60)
    * 
[String](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/AsciiStringUnaryFunctors.hpp#L106)
    * 
[Math](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/CMathUnaryFunctors.hpp#L70)
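    
    The division of labor between a functor and a synthesizer can be 
sketched roughly as below. The member names (`ArgumentType`, `ResultType`, 
`GetName()`, `apply()`, `applyToColumn()`) are assumptions for this example 
and the real `UnaryOperationSynthesizer` is considerably more involved; the 
point is only that the hand-written part is reduced to per-value logic.
    
    ```cpp
    #include <cmath>
    #include <string>
    #include <vector>
    
    // Hypothetical functor: only the per-value logic is written by hand.
    struct SqrtFunctor {
      using ArgumentType = double;
      using ResultType = double;
      static std::string GetName() { return "sqrt"; }
      ResultType apply(const ArgumentType argument) const {
        return std::sqrt(argument);
      }
    };
    
    // Hypothetical synthesizer: derives the column-at-a-time loop and the
    // operation name from the functor.
    template <typename Functor>
    class UnaryOperationSynthesizer {
     public:
      std::string getName() const { return Functor::GetName(); }
    
      std::vector<typename Functor::ResultType> applyToColumn(
          const std::vector<typename Functor::ArgumentType> &column) const {
        std::vector<typename Functor::ResultType> results;
        results.reserve(column.size());
        for (const auto &value : column) {
          results.push_back(functor_.apply(value));
        }
        return results;
      }
    
     private:
      Functor functor_;
    };
    ```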
    
    ##### 2. Implement 
[_BinaryOperationSynthesizer_/_UncheckedBinaryOperatorSynthesizer_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/binary_operations/BinaryOperationSynthesizer.hpp#L62)
 to make it easier to add binary functions.
    
    Example binary functions:
    * 
[Arithmetic](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/binary_operations/ArithmeticBinaryFunctors.hpp#L94)
    * 
[String](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/binary_operations/AsciiStringBinaryFunctors.hpp#L127)
    * 
[Math](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/binary_operations/CMathBinaryFunctors.hpp#L66)
    
    ##### 3. Use 
[_OperationSignature_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/OperationSignature.hpp#L45)
 and 
[_OperationFactory_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/OperationFactory.hpp#L48)
 to support general operation resolution.
    
    * See 
[_OperationFactory::OperationFactory()_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/OperationFactory.cpp#L85)
 about how operations are registered.
    * See 
[_Resolver::resolveScalarFunction()_](https://github.com/apache/incubator-quickstep/blob/refactor-type/query_optimizer/resolver/Resolver.cpp#L2889)
 about how a function in a SQL query gets resolved.
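    
    As a rough mental model (and not the actual classes), registration and 
lookup by signature could be sketched like this. The fields of 
`OperationSignature`, the registered operations and the `resolve()` method 
are hypothetical; the sketch maps a signature directly to a result type and 
handles exact matches only.
    
    ```cpp
    #include <map>
    #include <string>
    #include <tuple>
    #include <vector>
    
    enum class TypeID { kInt, kDouble, kText };
    
    // Hypothetical signature: operation name plus argument type IDs.
    struct OperationSignature {
      std::string name;
      std::vector<TypeID> argument_types;
    
      bool operator<(const OperationSignature &rhs) const {
        return std::tie(name, argument_types) <
               std::tie(rhs.name, rhs.argument_types);
      }
    };
    
    class OperationFactory {
     public:
      // All operations are registered once, in the constructor.
      OperationFactory() {
        registerOperation({"sqrt", {TypeID::kDouble}}, TypeID::kDouble);
        registerOperation({"length", {TypeID::kText}}, TypeID::kInt);
        registerOperation({"+", {TypeID::kInt, TypeID::kInt}}, TypeID::kInt);
      }
    
      // Returns true and sets *result_type if the exact signature is known.
      bool resolve(const OperationSignature &signature,
                   TypeID *result_type) const {
        const auto it = operations_.find(signature);
        if (it == operations_.end()) {
          return false;
        }
        *result_type = it->second;
        return true;
      }
    
     private:
      void registerOperation(const OperationSignature &signature,
                             const TypeID result_type) {
        operations_.emplace(signature, result_type);
      }
    
      std::map<OperationSignature, TypeID> operations_;
    };
    ```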
    
    
    ~
    
    #### Part III. TODOs
    * A lot of _TODO(refactor-type)_ in the code to be fixed.
    * Refactor the predicate system (we will have something like 
_ComparisonSynthesizer_).
    * A lot of unit tests are broken (due to API changes) and need to be fixed.
    * Improve the comments and style of the template metaprogramming code.
    * More to be added ...


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-quickstep refactor-type

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/315.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #315
    
----
commit cb564509c8da64af1c0981ca816f962f94b06602
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-03-04T18:11:13Z

    Refactor type system and operations.

commit 02005508dd4b6813ecc494e2cdfed842b4c93dc4
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-09-28T02:12:59Z

    Some updates

commit ebf44cd2dd230bd45c849cb008005ad9c07b2d60
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-02T05:26:05Z

    Updates for adding generic types

commit a7031a343814bb003353c6b0b75957f66db6240c
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-02T06:30:46Z

    Add array expression

commit b6fd31fec0cd9b1a89eee1fe89af70b85df44bc5
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-02T20:36:22Z

    Continue the work

commit 1e69fb18eb9e7f31c48d85aaef781dca1ba8290a
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-03T04:46:48Z

    Updates for array type

commit bef66ad47a6edb69f76f29c56d528a28ba7760b8
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-03T06:52:29Z

    Updates to meta type

commit 3a3772d91bd94d269b2f2fa49895f53385d46381
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-03T22:03:20Z

    Add text type

commit 0957264b534cb65bbcc28999833f30bd8888856a
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-04T05:26:36Z

    Type as first class citizen

commit 9cb664c802f2a85862dea1cc41a08c989dd579e7
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-04T08:21:44Z

    More updates to types

commit 1cb97e3547846240f3ceb0a7f086dbe912174b72
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-05T22:02:33Z

    More updates, refactor names

commit 477c385d427483d4c2708449927f14268e53c311
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-10T18:20:17Z

    Updates to casts

commit a3aec8e789b66c2c1de64cbd2cdc3fac70b8121b
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-10-11T08:38:40Z

    Updates to implicit casts

----

