[GitHub] incubator-trafodion pull request #794: [TRAFODION-2317] Infrastructure for c...

zellerh Thu, 27 Oct 2016 13:04:26 -0700

GitHub user zellerh opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/794


    [TRAFODION-2317] Infrastructure for common subexpressions

    This is a first set of changes to allow us to make use of CTEs
    (Common Table Expressions) declared in WITH clauses and to create
    a temp table for them that is then read multiple times in the query.
    
    This also includes a fix for
    [TRAFODION-2280] Need to remove salt columns from uniqueness constraints
    
    Summary of changes:
    
    - Adding a unique statement number in CmpContext
    - Moving the execHiveSQL method from the OSIM code to CmpContext
    - Adding a list of common subexpressions and their references
      to CmpStatement
    - Adding the ability to the Hive Truncate operator to drop the
      table when the TCB gets deallocated
    - Adding the ability to the HDFS scan to compute scan ranges at
      runtime. Those are usually determined in the compiler. This is
      only supported for simple, non-partitioned, delimited tables.
      We need this because we populate the temp table and read from
      in in the same statement, without the option of compiling
      after we inserted into the temp table.
    - Special handling in the MapValueIds node of common subexpressions.
      See the comment in MapValueId::preCodeGen().
    - Moved the binder code to create a FastExtract node into a new
      method FastExtract::makeFastExtractTree(), to be able to call
      it from another place.
    - MapValueIds no longer looks at the "used by MVQR flag" to determine
      the method for VEGRewrite. Instead it checks whether a list of
      values has been provided to do this.
    - Adding a new method, RelExpr::prepareMeForCSESharing, that is
      kind of an "unnormalizer", undoing some of the normalizer
      transformations.
    - Implementing the steps for common subexpression materializations
      described below.
    - Adding the ability to suppress the Hive timestamp modification
      check when truncating a Hive table
    - Adding an optimizer rule to eliminate CommonSubExprRef nodes.
      These nodes should not normally survive past the SQO phase, but
      if the SQO phase gets interrupted by an exception, that could
      happen, since we then fall back to a copy of the tree before
      SQO. In the future, we can consider cost-based decision on
      what to do with common subexpressions.
    - Adding CommonSubExprRef nodes in the parser whenever we expand
      a CTE reference.
    - Adding cleanup code to the "cleanup obsolete volatile tables"
      command that removes obsolete Hive tables used for common
      subexpressions.
    
    Other changes contained in this change set:
    
    - Optimization for empty scans, like select * from t where 1=0
      This now generates a cardinality constraint with 0 rows, which
      can be used later to eliminate parts of a tree.
      (file OptLogRelExpr.cpp)
    - [TRAFODION-2280] Need to remove salt columns from uniqueness
      constraints generated on salted tables.
      (file OptLogRelExpr.cpp)
    - Got rid of the now meaningless "seamonster" display in EXPLAIN.
      (file GenExplain.cpp and misc. expected files)
    - Suppress display of "zombies" in the cstat command. Otherwise,
      these zombies (marked as <defunct>) prevent Trafodion from
      starting, because they are incorrectly considered "orphan"
      processes. This could require a reboot when no reboot is necessary.
      (file core/sqf/sql/scripts/pstat)
    
    Incomplete list of things left to be done:
    
    - TRAFODION-2316: Hive temp tables are not secure. Use volatile
                      tables instead.
    - TRAFODION-2315: Add heuristics to decide when to use the temp table
                      approach.
    - TRAFODION-2320: Make subquery unnesting work with common subexpressions.
    
    Generated Plans
    ---------------
    
    The resulting query plan for a query Q with n common
    subexpressions CSE1 ... CSEn looks like this:
    
                             Root
                               |
                          MapValueIds
                               |
                          BlockedUnion
                           /        \
                        Union        Q
                        /   \
                      ...    CTn
                      /
                   Union
                   /   \
                 CT1   CT2
    
    Each of the CTi variables looks like the following, an
    INSERT OVERWRITE TABLE tempi ...
    
             BlockedUnion
              /        \
          Truncate  FastExtract TEMPi
            TEMPi        |
                        CSEi
    
    The original query Q has the common subexpressions replaced
    with the following:
    
               MapValueIds
                   |
                 scan TEMPi
    
    Here is a simple query and its explain:
    
        prepare s from
        with cse1 as (select d_date_sk, d_date, d_year, d_dow, d_moy from 
date_dim)
        select x.d_year, y.d_date
        from cse1 x join cse1 y on x.d_date_sk = y.d_date_sk
        where x.d_moy = 3;
        
        >>explain options 'f' s;
        
        LC   RC   OP   OPERATOR              OPT       DESCRIPTION           
CARD
        ---- ---- ---- --------------------  --------  --------------------  
---------
        
        11   .    12   root                                                  
1.46E+005
        5    10   11   blocked_union                                         
1.46E+005
        7    9    10   merge_join                                            
7.30E+004
        8    .    9    sort                                                  
1.00E+002
        .    .    8    hive_scan                       CSE_TEMP_CSE1_MXID11  
1.00E+002
        6    .    7    sort                                                  
5.00E+001
        .    .    6    hive_scan                       CSE_TEMP_CSE1_MXID11  
5.00E+001
        1    4    5    blocked_union                                         
7.30E+004
        2    .    4    hive_insert                     CSE_TEMP_CSE1_MXID11  
7.30E+004
        .    .    2    hive_scan                       DATE_DIM              
7.30E+004
        .    .    1    hive_truncate                                         
1.00E+000
        
        --- SQL operation complete.
        >>
    
    CQDs to control common subexpressions
    -------------------------------------
    
    CSE_FOR_WITH is the master switch.
    
        CQD                    Value      Default  Behavior
        ---------------------  ---------  -------  
---------------------------------------
        CSE_FOR_WITH             OFF        Y      No change
                                 ON                Insert a CommonSubExprRef 
node in the
                                                   tree whenever we reference a 
CTE
                                                   (table defined in a WITH 
clause)
        
        CSE_USE_TEMP             OFF        Y      Disable creation of temp 
tables
                                                   for common subexpressions
                                 SYSTEM            Same as OFF for now
                                 ON                Always create a temp table 
for
                                                   common subexpressions where 
possible
        
        CSE_DEBUG_WARNINGS       OFF        Y      No change
                                 ON                Emit diagnostic warnings 
that show why
                                                   we didn't create temp tables 
for
                                                   common subexpressions
        
        CSE_CLEANUP_HIVE_TABLES  OFF        Y      No change
                                 ON                Cleanup Hive tables used for 
CSEs with
                                                   the "cleanup obsolete 
volatile tables"
                                                   command
    
    CommonSubExprRef relational operators
    -------------------------------------
    
    This is a new RelExpr class that is introduced. It marks the common
    subexpressions in a RelExpr tree. This operator remembers the name of
    a common subexpression (e.g. the name used in the WITH clause).
    Multiple such operators can reference to the same name. Each of
    these references has a copy of the tree.
    
    Right now, these operators are created in the parser when we expand a
    CTE (Common Table Expression), declared in a WITH clause.  If the CTE
    is referenced only once, then the CommonSubExprRef operator is removed
    in the binder - it also doesn't live up to its name in this case.
    
    The remaining CommonSubExprRef operators keep track of various changes
    to their child trees, during the binder and normalizer phases.  In
    particular, it tracks which predicates are pushed down into the child
    tree and which outputs are eliminated.
    
    The CmpStatement object keeps a global list of all the
    CommonSubExprRef operators in a statement, so the individual operators
    have a way to communicate with their siblings:
    
    - A statement can have zero or more named common subexpressions.
    - Each reference to a common subexpression is marked in the RelExpr
      tree with a CommonSubExprRef node.
    - In the binder and normalizer, common subexpressions are expanded,
      meaning that multiple copies of them exist, one copy per
      CommonSubExprRef.
    - Common subexpressions can reference other common subexpressions,
      so they, together with the main query, for a DAG (directed
      acyclic graph) of dependencies.
    - Note that CTEs declared in a WITH clause but not referenced are
      ignored and are not part of the query tree.
    
    In the semantic query optimization phase (SQO), the current code makes
    a heuristic decision what to do with common subexpressions - to
    evaluate them multiple times (expand) or to create a temporary table
    once and read that table multiple times.
    
    If we decide to expand, the action is simple: Remove the
    CommonSubExprRef operator from the tree and replace it with its child.
    
    If we decide to create a temp table, things become much more difficult.
    We need to do several steps:
    
    - Pick one of the child trees of the CommonSubExprRefs as the one to
      materialize.
    - Undo any normalization steps that aren't compatible with the other
      CommonSubExprRefs. That means pulling out predicates that are not
      common among the references and adding back outputs that are
      required by other references. If that process fails, we fall back
      and expand the expressions.
    - Create a temp table tmp.
    - Prepare an INSERT OVERWRITE TABLE tmp SELECT * FROM cse tree
      that materializes the common subexpression in a table.
    - Replace the CommonSubExprRef nodes with scans of the temp table.
    - Hook up the insert query tree with the main query, such that it
      is executed before the main query starts.
    
    Temporary tables
    ----------------
    
    At this time, temporary tables are created as Hive tables, with a
    fabricated, unique name, including the session id, a unique statement
    number within the session, and a unique identifier of the common
    subexpression within the statement. The temporary table is created at
    compile time. The query plan contains an operator to truncate the
    table before populating it. The "temporary" Hive table is dropped when
    the executor TCB is deallocated.
    
    Several issues are remaining with this approach:
    
    - If the process exits before executing and deallocating the statement,
      the Hive table is not cleaned up.
      Solution (TBD): Clean up these tables like we clean up left-over
      volatile tables. Both are identified by the session id.
    - If the executor runs into memory pressure and deallocates the TCB,
      then allocates it again at a later time, the temp table is no longer
      there.
      Solution (TBD): Use AQR to recompile the query and create a new table.
    - Query cache: If we cache a query, multiple queries may end up with
      the same temporary table. This works ok as long as these queries are
      executed serially, but it fails if both queries are executed at the
      same time (e.g. open two cursors and fetch from both, alternating).
      Solution (TBD): Add a CQD that disables caching queries with temp tables
      for common subexpressions.
    
    In the future we also want to support volatile tables. However, those also
    have issues:
    
    - Volatile tables aren't cleaned up until the session ends. If we run
      many statements with common subexpressions, that is undesirable. So,
      we have a similar cleanup problem as with Hive tables.
    - Volatile tables take a relatively long time to create.
    - Insert and scan rates on volatile Trafodion tables are slower than
      those on Hive tables.
    
    To-do items are marked with "Todo: CSE: " in the code.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zellerh/incubator-trafodion cses_1026

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/794.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #794
    
----
commit b90dc334e42fd02f12dc3401fedb5a3c1b636093
Author: Hans Zeller <hzel...@apache.org>
Date:   2016-10-27T19:50:20Z

    [TRAFODION-2317] Infrastructure for common subexpressions
    
    This is a first set of changes to allow us to make use of CTEs
    (Common Table Expressions) declared in WITH clauses and to create
    a temp table for them that is then read multiple times in the query.
    
    This also includes a fix for
    [TRAFODION-2280] Need to remove salt columns from uniqueness constraints
    
    Summary of changes:
    
    - Adding a unique statement number in CmpContext
    - Moving the execHiveSQL method from the OSIM code to CmpContext
    - Adding a list of common subexpressions and their references
      to CmpStatement
    - Adding the ability to the Hive Truncate operator to drop the
      table when the TCB gets deallocated
    - Adding the ability to the HDFS scan to compute scan ranges at
      runtime. Those are usually determined in the compiler. This is
      only supported for simple, non-partitioned, delimited tables.
      We need this because we populate the temp table and read from
      in in the same statement, without the option of compiling
      after we inserted into the temp table.
    - Special handling in the MapValueIds node of common subexpressions.
      See the comment in MapValueId::preCodeGen().
    - Moved the binder code to create a FastExtract node into a new
      method FastExtract::makeFastExtractTree(), to be able to call
      it from another place.
    - MapValueIds no longer looks at the "used by MVQR flag" to determine
      the method for VEGRewrite. Instead it checks whether a list of
      values has been provided to do this.
    - Adding a new method, RelExpr::prepareMeForCSESharing, that is
      kind of an "unnormalizer", undoing some of the normalizer
      transformations.
    - Implementing the steps for common subexpression materializations
      described below.
    - Adding the ability to suppress the Hive timestamp modification
      check when truncating a Hive table
    - Adding an optimizer rule to eliminate CommonSubExprRef nodes.
      These nodes should not normally survive past the SQO phase, but
      if the SQO phase gets interrupted by an exception, that could
      happen, since we then fall back to a copy of the tree before
      SQO. In the future, we can consider cost-based decision on
      what to do with common subexpressions.
    - Adding CommonSubExprRef nodes in the parser whenever we expand
      a CTE reference.
    - Adding cleanup code to the "cleanup obsolete volatile tables"
      command that removes obsolete Hive tables used for common
      subexpressions.
    
    Other changes contained in this change set:
    
    - Optimization for empty scans, like select * from t where 1=0
      This now generates a cardinality constraint with 0 rows, which
      can be used later to eliminate parts of a tree.
      (file OptLogRelExpr.cpp)
    - [TRAFODION-2280] Need to remove salt columns from uniqueness
      constraints generated on salted tables.
      (file OptLogRelExpr.cpp)
    - Got rid of the now meaningless "seamonster" display in EXPLAIN.
      (file GenExplain.cpp and misc. expected files)
    - Suppress display of "zombies" in the cstat command. Otherwise,
      these zombies (marked as <defunct>) prevent Trafodion from
      starting, because they are incorrectly considered "orphan"
      processes. This could require a reboot when no reboot is necessary.
      (file core/sqf/sql/scripts/pstat)
    
    Incomplete list of things left to be done:
    
    - TRAFODION-2316: Hive temp tables are not secure. Use volatile
                      tables instead.
    - TRAFODION-2315: Add heuristics to decide when to use the temp table
                      approach.
    - TRAFODION-2320: Make subquery unnesting work with common subexpressions.
    
    Generated Plans
    ---------------
    
    The resulting query plan for a query Q with n common
    subexpressions CSE1 ... CSEn looks like this:
    
                             Root
                               |
                          MapValueIds
                               |
                          BlockedUnion
                           /        \
                        Union        Q
                        /   \
                      ...    CTn
                      /
                   Union
                   /   \
                 CT1   CT2
    
    Each of the CTi variables looks like the following, an
    INSERT OVERWRITE TABLE tempi ...
    
             BlockedUnion
              /        \
          Truncate  FastExtract TEMPi
            TEMPi        |
                        CSEi
    
    The original query Q has the common subexpressions replaced
    with the following:
    
               MapValueIds
                   |
                 scan TEMPi
    
    Here is a simple query and its explain:
    
    prepare s from
    with cse1 as (select d_date_sk, d_date, d_year, d_dow, d_moy from date_dim)
    select x.d_year, y.d_date
    from cse1 x join cse1 y on x.d_date_sk = y.d_date_sk
    where x.d_moy = 3;
    
    >>explain options 'f' s;
    
    LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
    ---- ---- ---- --------------------  --------  --------------------  
---------
    
    11   .    12   root                                                  
1.46E+005
    5    10   11   blocked_union                                         
1.46E+005
    7    9    10   merge_join                                            
7.30E+004
    8    .    9    sort                                                  
1.00E+002
    .    .    8    hive_scan                       CSE_TEMP_CSE1_MXID11  
1.00E+002
    6    .    7    sort                                                  
5.00E+001
    .    .    6    hive_scan                       CSE_TEMP_CSE1_MXID11  
5.00E+001
    1    4    5    blocked_union                                         
7.30E+004
    2    .    4    hive_insert                     CSE_TEMP_CSE1_MXID11  
7.30E+004
    .    .    2    hive_scan                       DATE_DIM              
7.30E+004
    .    .    1    hive_truncate                                         
1.00E+000
    
    --- SQL operation complete.
    >>
    
    CQDs to control common subexpressions
    -------------------------------------
    
    CSE_FOR_WITH is the master switch.
    
    CQD                    Value      Default  Behavior
    ---------------------  ---------  -------  
---------------------------------------
    CSE_FOR_WITH             OFF        Y      No change
                             ON                Insert a CommonSubExprRef node 
in the
                                               tree whenever we reference a CTE
                                               (table defined in a WITH clause)
    
    CSE_USE_TEMP             OFF        Y      Disable creation of temp tables
                                               for common subexpressions
                             SYSTEM            Same as OFF for now
                             ON                Always create a temp table for
                                               common subexpressions where 
possible
    
    CSE_DEBUG_WARNINGS       OFF        Y      No change
                             ON                Emit diagnostic warnings that 
show why
                                               we didn't create temp tables for
                                               common subexpressions
    
    CSE_CLEANUP_HIVE_TABLES  OFF        Y      No change
                             ON                Cleanup Hive tables used for 
CSEs with
                                               the "cleanup obsolete volatile 
tables"
                                               command
    
    CommonSubExprRef relational operators
    -------------------------------------
    
    This is a new RelExpr class that is introduced. It marks the common
    subexpressions in a RelExpr tree. This operator remembers the name of
    a common subexpression (e.g. the name used in the WITH clause).
    Multiple such operators can reference to the same name. Each of
    these references has a copy of the tree.
    
    Right now, these operators are created in the parser when we expand a
    CTE (Common Table Expression), declared in a WITH clause.  If the CTE
    is referenced only once, then the CommonSubExprRef operator is removed
    in the binder - it also doesn't live up to its name in this case.
    
    The remaining CommonSubExprRef operators keep track of various changes
    to their child trees, during the binder and normalizer phases.  In
    particular, it tracks which predicates are pushed down into the child
    tree and which outputs are eliminated.
    
    The CmpStatement object keeps a global list of all the
    CommonSubExprRef operators in a statement, so the individual operators
    have a way to communicate with their siblings:
    
    - A statement can have zero or more named common subexpressions.
    - Each reference to a common subexpression is marked in the RelExpr
      tree with a CommonSubExprRef node.
    - In the binder and normalizer, common subexpressions are expanded,
      meaning that multiple copies of them exist, one copy per
      CommonSubExprRef.
    - Common subexpressions can reference other common subexpressions,
      so they, together with the main query, for a DAG (directed
      acyclic graph) of dependencies.
    - Note that CTEs declared in a WITH clause but not referenced are
      ignored and are not part of the query tree.
    
    In the semantic query optimization phase (SQO), the current code makes
    a heuristic decision what to do with common subexpressions - to
    evaluate them multiple times (expand) or to create a temporary table
    once and read that table multiple times.
    
    If we decide to expand, the action is simple: Remove the
    CommonSubExprRef operator from the tree and replace it with its child.
    
    If we decide to create a temp table, things become much more difficult.
    We need to do several steps:
    
    - Pick one of the child trees of the CommonSubExprRefs as the one to
      materialize.
    - Undo any normalization steps that aren't compatible with the other
      CommonSubExprRefs. That means pulling out predicates that are not
      common among the references and adding back outputs that are
      required by other references. If that process fails, we fall back
      and expand the expressions.
    - Create a temp table tmp.
    - Prepare an INSERT OVERWRITE TABLE tmp SELECT * FROM cse tree
      that materializes the common subexpression in a table.
    - Replace the CommonSubExprRef nodes with scans of the temp table.
    - Hook up the insert query tree with the main query, such that it
      is executed before the main query starts.
    
    Temporary tables
    ----------------
    
    At this time, temporary tables are created as Hive tables, with a
    fabricated, unique name, including the session id, a unique statement
    number within the session, and a unique identifier of the common
    subexpression within the statement. The temporary table is created at
    compile time. The query plan contains an operator to truncate the
    table before populating it. The "temporary" Hive table is dropped when
    the executor TCB is deallocated.
    
    Several issues are remaining with this approach:
    
    - If the process exits before executing and deallocating the statement,
      the Hive table is not cleaned up.
      Solution (TBD): Clean up these tables like we clean up left-over
      volatile tables. Both are identified by the session id.
    - If the executor runs into memory pressure and deallocates the TCB,
      then allocates it again at a later time, the temp table is no longer
      there.
      Solution (TBD): Use AQR to recompile the query and create a new table.
    - Query cache: If we cache a query, multiple queries may end up with
      the same temporary table. This works ok as long as these queries are
      executed serially, but it fails if both queries are executed at the
      same time (e.g. open two cursors and fetch from both, alternating).
      Solution (TBD): Add a CQD that disables caching queries with temp tables
      for common subexpressions.
    
    In the future we also want to support volatile tables. However, those also
    have issues:
    
    - Volatile tables aren't cleaned up until the session ends. If we run
      many statements with common subexpressions, that is undesirable. So,
      we have a similar cleanup problem as with Hive tables.
    - Volatile tables take a relatively long time to create.
    - Insert and scan rates on volatile Trafodion tables are slower than
      those on Hive tables.
    
    To-do items are marked with "Todo: CSE: " in the code.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request #794: [TRAFODION-2317] Infrastructure for c...

Reply via email to