[TRAFODION-2317] Infrastructure for common subexpressions
This is a first set of changes to allow us to make use of CTEs
(Common Table Expressions) declared in WITH clauses and to create
a temp table for them that is then read multiple times in the query.
This also includes a fix for
[TRAFODION-2280] Need to remove salt columns from uniqueness constraints
Summary of changes:
- Adding a unique statement number in CmpContext
- Moving the execHiveSQL method from the OSIM code to CmpContext
- Adding a list of common subexpressions and their references
to CmpStatement
- Adding the ability for the Hive Truncate operator to drop the
table when the TCB gets deallocated
- Adding the ability for the HDFS scan to compute scan ranges at
runtime. Those are usually determined in the compiler. This is
only supported for simple, non-partitioned, delimited tables.
We need this because we populate the temp table and read from
it in the same statement, without the option of compiling
after we have inserted into the temp table.
- Special handling in the MapValueIds node of common subexpressions.
See the comment in MapValueIds::preCodeGen().
- Moved the binder code to create a FastExtract node into a new
method FastExtract::makeFastExtractTree(), to be able to call
it from another place.
- MapValueIds no longer looks at the "used by MVQR flag" to determine
the method for VEGRewrite. Instead it checks whether a list of
values has been provided to do this.
- Adding a new method, RelExpr::prepareMeForCSESharing, that is
kind of an "unnormalizer", undoing some of the normalizer
transformations.
- Implementing the steps for common subexpression materializations
described below.
- Adding the ability to suppress the Hive timestamp modification
check when truncating a Hive table
- Adding an optimizer rule to eliminate CommonSubExprRef nodes.
These nodes should not normally survive past the SQO phase, but
if the SQO phase gets interrupted by an exception, that could
happen, since we then fall back to a copy of the tree before
SQO. In the future, we can consider a cost-based decision on
what to do with common subexpressions.
- Adding CommonSubExprRef nodes in the parser whenever we expand
a CTE reference.
- Adding cleanup code to the "cleanup obsolete volatile tables"
command that removes obsolete Hive tables used for common
subexpressions.
Other changes contained in this change set:
- Optimization for empty scans, like select * from t where 1=0
This now generates a cardinality constraint with 0 rows, which
can be used later to eliminate parts of a tree.
(file OptLogRelExpr.cpp)
- [TRAFODION-2280] Need to remove salt columns from uniqueness
constraints generated on salted tables.
(file OptLogRelExpr.cpp)
- Got rid of the now meaningless "seamonster" display in EXPLAIN.
(file GenExplain.cpp and misc. expected files)
- Suppress display of "zombies" in the pstat command. Otherwise,
these zombies (marked as <defunct>) prevent Trafodion from
starting, because they are incorrectly considered "orphan"
processes. This could require a reboot when no reboot is necessary.
(file core/sqf/sql/scripts/pstat)
Incomplete list of things left to be done:
- TRAFODION-2316: Hive temp tables are not secure. Use volatile
tables instead.
- TRAFODION-2315: Add heuristics to decide when to use the temp table
approach.
- TRAFODION-2320: Make subquery unnesting work with common subexpressions.
Generated Plans
---------------
The resulting query plan for a query Q with n common
subexpressions CSE1 ... CSEn looks like this:
            Root
              |
         MapValueIds
              |
         BlockedUnion
            /    \
        Union     Q
        /   \
      ...    CTn
      /
   Union
   /   \
 CT1   CT2
Each of the CTi subtrees looks like the following, implementing an
INSERT OVERWRITE TABLE TEMPi ... statement:
      BlockedUnion
       /        \
  Truncate   FastExtract TEMPi
   TEMPi          |
                CSEi
The original query Q has the common subexpressions replaced
with the following:
 MapValueIds
      |
  scan TEMPi
Here is a simple query and its explain:
prepare s from
with cse1 as (select d_date_sk, d_date, d_year, d_dow, d_moy from date_dim)
select x.d_year, y.d_date
from cse1 x join cse1 y on x.d_date_sk = y.d_date_sk
where x.d_moy = 3;
>>explain options 'f' s;
LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------
11 . 12 root 1.46E+005
5 10 11 blocked_union 1.46E+005
7 9 10 merge_join 7.30E+004
8 . 9 sort 1.00E+002
. . 8 hive_scan CSE_TEMP_CSE1_MXID11 1.00E+002
6 . 7 sort 5.00E+001
. . 6 hive_scan CSE_TEMP_CSE1_MXID11 5.00E+001
1 4 5 blocked_union 7.30E+004
2 . 4 hive_insert CSE_TEMP_CSE1_MXID11 7.30E+004
. . 2 hive_scan DATE_DIM 7.30E+004
. . 1 hive_truncate 1.00E+000
--- SQL operation complete.
>>
CQDs to control common subexpressions
-------------------------------------
CSE_FOR_WITH is the master switch.
CQD                      Value    Default  Behavior
-----------------------  -------  -------  -----------------------------------
CSE_FOR_WITH             OFF      Y        No change
                         ON                Insert a CommonSubExprRef node in
                                           the tree whenever we reference a
                                           CTE (table defined in a WITH
                                           clause)
CSE_USE_TEMP             OFF      Y        Disable creation of temp tables
                                           for common subexpressions
                         SYSTEM            Same as OFF for now
                         ON                Always create a temp table for
                                           common subexpressions where
                                           possible
CSE_DEBUG_WARNINGS       OFF      Y        No change
                         ON                Emit diagnostic warnings that show
                                           why we didn't create temp tables
                                           for common subexpressions
CSE_CLEANUP_HIVE_TABLES  OFF      Y        No change
                         ON                Cleanup Hive tables used for CSEs
                                           with the "cleanup obsolete
                                           volatile tables" command
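For example, a sqlci session could experiment with the temp table approach
using standard CQD syntax (the settings are the ones documented above):

```sql
control query default CSE_FOR_WITH 'ON';
control query default CSE_USE_TEMP 'ON';
-- optionally, show why a temp table was not used:
control query default CSE_DEBUG_WARNINGS 'ON';
```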
CommonSubExprRef relational operators
-------------------------------------
This is a new RelExpr class that marks the common subexpressions in a
RelExpr tree. The operator remembers the name of a common
subexpression (e.g. the name used in the WITH clause). Multiple such
operators can reference the same name, and each of these references
has its own copy of the tree.
Right now, these operators are created in the parser when we expand a
CTE (Common Table Expression), declared in a WITH clause. If the CTE
is referenced only once, then the CommonSubExprRef operator is removed
in the binder - it also doesn't live up to its name in this case.
The remaining CommonSubExprRef operators keep track of various changes
to their child trees during the binder and normalizer phases. In
particular, they track which predicates are pushed down into the child
tree and which outputs are eliminated.
The CmpStatement object keeps a global list of all the
CommonSubExprRef operators in a statement, so the individual operators
have a way to communicate with their siblings:
- A statement can have zero or more named common subexpressions.
- Each reference to a common subexpression is marked in the RelExpr
tree with a CommonSubExprRef node.
- In the binder and normalizer, common subexpressions are expanded,
meaning that multiple copies of them exist, one copy per
CommonSubExprRef.
- Common subexpressions can reference other common subexpressions,
so they, together with the main query, form a DAG (directed
acyclic graph) of dependencies.
- Note that CTEs declared in a WITH clause but not referenced are
ignored and are not part of the query tree.
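Because the dependencies form a DAG, a valid order in which to populate the
temp tables (every CSE before its consumers) can be obtained with a
topological sort. A minimal sketch, with illustrative names rather than the
actual Trafodion classes:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Sketch: map each CSE (and the main query) to the set of CSEs it
// references; emit a topological order. Assumes the input is acyclic,
// which holds for CSE dependencies by construction.
std::vector<std::string> materializationOrder(
    const std::map<std::string, std::set<std::string>> &deps) {
  std::vector<std::string> order;
  std::set<std::string> done;
  // Repeatedly emit nodes whose dependencies are all satisfied.
  while (done.size() < deps.size()) {
    for (const auto &p : deps) {
      if (done.count(p.first)) continue;
      bool ready = true;
      for (const auto &d : p.second)
        if (!done.count(d)) { ready = false; break; }
      if (ready) { order.push_back(p.first); done.insert(p.first); }
    }
  }
  return order;
}
```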
In the semantic query optimization phase (SQO), the current code makes
a heuristic decision about what to do with common subexpressions: to
evaluate them multiple times (expand) or to create a temporary table
once and read that table multiple times.
If we decide to expand, the action is simple: Remove the
CommonSubExprRef operator from the tree and replace it with its child.
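The expand step amounts to splicing each reference's child tree into its
parent. A minimal sketch of that splice, using hypothetical node types (the
real work operates on RelExpr trees in the SQO code):

```cpp
#include <memory>
#include <string>
#include <vector>

// Illustrative tree node; "cse_ref" stands in for CommonSubExprRef.
struct Node {
  std::string op;                       // e.g. "scan", "join", "cse_ref"
  std::vector<std::shared_ptr<Node>> children;
};

// Recursively replace every "cse_ref" node with its single child,
// i.e. remove the marker and keep the expanded subtree in place.
std::shared_ptr<Node> expandCSERefs(std::shared_ptr<Node> n) {
  if (!n) return n;
  for (auto &c : n->children) c = expandCSERefs(c);
  if (n->op == "cse_ref" && n->children.size() == 1)
    return n->children[0];              // splice the child into the parent
  return n;
}
```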
If we decide to create a temp table, things become much more difficult.
We need to do several steps:
- Pick one of the child trees of the CommonSubExprRefs as the one to
materialize.
- Undo any normalization steps that aren't compatible with the other
CommonSubExprRefs. That means pulling out predicates that are not
common among the references and adding back outputs that are
required by other references. If that process fails, we fall back
and expand the expressions.
- Create a temp table tmp.
- Prepare an INSERT OVERWRITE TABLE tmp SELECT * FROM cse tree
that materializes the common subexpression in a table.
- Replace the CommonSubExprRef nodes with scans of the temp table.
- Hook up the insert query tree with the main query, such that it
is executed before the main query starts.
Temporary tables
----------------
At this time, temporary tables are created as Hive tables, with a
fabricated, unique name, including the session id, a unique statement
number within the session, and a unique identifier of the common
subexpression within the statement. The temporary table is created at
compile time. The query plan contains an operator to truncate the
table before populating it. The "temporary" Hive table is dropped when
the executor TCB is deallocated.
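As a sketch, the table name seen in the EXPLAIN above
(CSE_TEMP_CSE1_MXID11) could be composed roughly as below. The prefix
matches COM_CSE_TABLE_PREFIX from this change, but the exact order and
separators of the remaining parts are assumptions for illustration:

```cpp
#include <string>

// COM_CSE_TABLE_PREFIX as defined in ComSmallDefs.h in this change.
const std::string CSE_TABLE_PREFIX = "CSE_TEMP_";

// Hypothetical composition: prefix + CSE name + session id + per-session
// statement number. The real code may order or format these differently.
std::string makeCSETempTableName(const std::string &cseName,
                                 const std::string &sessionId,
                                 unsigned statementNum) {
  return CSE_TABLE_PREFIX + cseName + "_" + sessionId + "_" +
         std::to_string(statementNum);
}
```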
Several issues remain with this approach:
- If the process exits before executing and deallocating the statement,
the Hive table is not cleaned up.
Solution (TBD): Clean up these tables like we clean up left-over
volatile tables. Both are identified by the session id.
- If the executor runs into memory pressure and deallocates the TCB,
then allocates it again at a later time, the temp table is no longer
there.
Solution (TBD): Use AQR to recompile the query and create a new table.
- Query cache: If we cache a query, multiple queries may end up with
the same temporary table. This works ok as long as these queries are
executed serially, but it fails if both queries are executed at the
same time (e.g. open two cursors and fetch from both, alternating).
Solution (TBD): Add a CQD that disables caching queries with temp tables
for common subexpressions.
In the future we also want to support volatile tables. However, those also
have issues:
- Volatile tables aren't cleaned up until the session ends. If we run
many statements with common subexpressions, that is undesirable. So,
we have a similar cleanup problem as with Hive tables.
- Volatile tables take a relatively long time to create.
- Insert and scan rates on volatile Trafodion tables are slower than
those on Hive tables.
To-do items are marked with "Todo: CSE: " in the code.
Project: http://git-wip-us.apache.org/repos/asf/incubator-trafodion/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-trafodion/commit/b90dc334
Tree: http://git-wip-us.apache.org/repos/asf/incubator-trafodion/tree/b90dc334
Diff: http://git-wip-us.apache.org/repos/asf/incubator-trafodion/diff/b90dc334
Branch: refs/heads/master
Commit: b90dc334e42fd02f12dc3401fedb5a3c1b636093
Parents: 9585cf0
Author: Hans Zeller <[email protected]>
Authored: Thu Oct 27 19:50:20 2016 +0000
Committer: Hans Zeller <[email protected]>
Committed: Thu Oct 27 19:50:20 2016 +0000
----------------------------------------------------------------------
core/sqf/sql/scripts/pstat | 2 +-
core/sql/arkcmp/CmpContext.cpp | 61 +-
core/sql/arkcmp/CmpContext.h | 13 +-
core/sql/arkcmp/CmpSqlSession.h | 2 +-
core/sql/arkcmp/CmpStatement.cpp | 39 +
core/sql/arkcmp/CmpStatement.h | 33 +-
core/sql/bin/SqlciErrors.txt | 4 +
core/sql/comexe/ComTdbExeUtil.cpp | 15 +
core/sql/comexe/ComTdbExeUtil.h | 36 +-
core/sql/comexe/ComTdbHdfsScan.h | 8 +-
core/sql/common/ComSmallDefs.h | 4 +-
core/sql/common/ComSqlId.cpp | 4 +-
core/sql/common/ComSqlId.h | 4 +-
core/sql/common/NAType.cpp | 10 +-
core/sql/common/NAType.h | 4 +-
core/sql/common/OperTypeEnum.h | 2 +
core/sql/executor/ExExeUtil.h | 9 +-
core/sql/executor/ExExeUtilMisc.cpp | 18 +
core/sql/executor/ExExeUtilVolTab.cpp | 110 +-
core/sql/executor/ExHdfsScan.cpp | 170 +-
core/sql/executor/ExHdfsScan.h | 7 +
core/sql/generator/GenExplain.cpp | 4 -
core/sql/generator/GenPreCode.cpp | 121 +-
core/sql/generator/GenRelExeUtil.cpp | 31 +-
core/sql/generator/GenRelMisc.cpp | 11 +
core/sql/generator/GenRelScan.cpp | 24 +-
core/sql/generator/Generator.cpp | 2 +
core/sql/generator/Generator.h | 19 +-
core/sql/nskgmake/arkcmplib/Makefile | 4 -
core/sql/optimizer/BindRelExpr.cpp | 132 +-
core/sql/optimizer/BindWA.cpp | 1 +
core/sql/optimizer/BindWA.h | 9 +-
core/sql/optimizer/CacheWA.h | 2 +-
core/sql/optimizer/GroupAttr.cpp | 5 +-
core/sql/optimizer/GroupAttr.h | 2 +
core/sql/optimizer/HDFSHook.cpp | 35 +-
core/sql/optimizer/HDFSHook.h | 13 +-
core/sql/optimizer/ItemColRef.h | 1 +
core/sql/optimizer/ItemExpr.cpp | 45 +-
core/sql/optimizer/MVCandidates.cpp | 1 -
core/sql/optimizer/NormRelExpr.cpp | 1359 +++++++++++++++-
core/sql/optimizer/NormWA.h | 10 +-
core/sql/optimizer/ObjectNames.cpp | 7 +
core/sql/optimizer/ObjectNames.h | 1 +
core/sql/optimizer/OptLogRelExpr.cpp | 94 +-
core/sql/optimizer/OptimizerSimulator.cpp | 36 +-
core/sql/optimizer/OptimizerSimulator.h | 1 -
core/sql/optimizer/RelExeUtil.cpp | 8 +-
core/sql/optimizer/RelExeUtil.h | 13 +-
core/sql/optimizer/RelExpr.cpp | 331 +++-
core/sql/optimizer/RelExpr.h | 44 +
core/sql/optimizer/RelFastTransport.cpp | 115 ++
core/sql/optimizer/RelFastTransport.h | 33 +-
core/sql/optimizer/RelGrby.h | 8 +
core/sql/optimizer/RelJoin.h | 8 +
core/sql/optimizer/RelMisc.h | 354 +++-
core/sql/optimizer/RelScan.h | 46 +-
core/sql/optimizer/RelSet.h | 9 +
core/sql/optimizer/TableDesc.cpp | 71 +-
core/sql/optimizer/TableDesc.h | 11 +
core/sql/optimizer/TransRule.cpp | 55 +-
core/sql/optimizer/TransRule.h | 19 +
core/sql/optimizer/ValueDesc.cpp | 171 +-
core/sql/optimizer/ValueDesc.h | 17 +-
core/sql/parser/sqlparser.y | 59 +-
core/sql/regress/compGeneral/EXPECTED005 | 6 -
core/sql/regress/compGeneral/EXPECTED045 | 1606 +++++++++++++++++++
core/sql/regress/compGeneral/TEST005 | 2 -
core/sql/regress/compGeneral/TEST045 | 1284 +++++++++++++++
core/sql/regress/core/EXPECTED004.SB | 66 +-
core/sql/regress/seabase/EXPECTED016 | 14 -
core/sql/regress/seabase/EXPECTED018 | 6 +-
core/sql/regress/tools/runregr_compGeneral.ksh | 2 +-
core/sql/sqlcomp/CmpDescribe.cpp | 6 +-
core/sql/sqlcomp/CmpMain.cpp | 6 +-
core/sql/sqlcomp/CmpSeabaseDDLtable.cpp | 6 +-
core/sql/sqlcomp/DefaultConstants.h | 9 +
core/sql/sqlcomp/nadefaults.cpp | 18 +-
core/sql/sqlcomp/parser.cpp | 3 +
79 files changed, 6495 insertions(+), 436 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sqf/sql/scripts/pstat
----------------------------------------------------------------------
diff --git a/core/sqf/sql/scripts/pstat b/core/sqf/sql/scripts/pstat
index 9e7ac04..95daa3c 100755
--- a/core/sqf/sql/scripts/pstat
+++ b/core/sqf/sql/scripts/pstat
@@ -107,4 +107,4 @@ fi
-ps --sort=cmd,pid -C $PROGS -o user:12,pid,ppid,wchan,rss,vsz,time,stat,cmd | grep -w ^$USER
+ps --sort=cmd,pid -C $PROGS -o user:12,pid,ppid,wchan,rss,vsz,time,stat,cmd | grep -w ^$USER | grep -v '<defunct>'$
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/arkcmp/CmpContext.cpp
----------------------------------------------------------------------
diff --git a/core/sql/arkcmp/CmpContext.cpp b/core/sql/arkcmp/CmpContext.cpp
index a256db0..c7d26f2 100644
--- a/core/sql/arkcmp/CmpContext.cpp
+++ b/core/sql/arkcmp/CmpContext.cpp
@@ -73,6 +73,7 @@
#include "hs_globals.h"
#include "PCodeExprCache.h"
+#include "HBaseClient_JNI.h"
#ifdef NA_CMPDLL
#include "CompException.h"
#include "CostMethod.h"
@@ -144,7 +145,9 @@ CmpContext::CmpContext(UInt32 f, CollHeap * h)
invocationInfos_(h),
planInfos_(h),
routineHandles_(h),
- ddlObjs_(h)
+ ddlObjs_(h),
+ statementNum_(0),
+ hiveClient_(NULL)
{
SetMode(isDynamicSQL() ? STMT_DYNAMIC : STMT_STATIC);
@@ -537,12 +540,65 @@ NABoolean CmpContext::isAuthorizationEnabled( NABoolean
errIfNotReady)
return FALSE;
}
+HiveClient_JNI * CmpContext::getHiveClient(ComDiagsArea *diags)
+{
+ if(NULL == hiveClient_)
+ {
+ hiveClient_ = HiveClient_JNI::getInstance();
+ if ( hiveClient_->isInitialized() == FALSE ||
+ hiveClient_->isConnected() == FALSE)
+ {
+ HVC_RetCode retCode = hiveClient_->init();
+ if (retCode != HVC_OK)
+ {
+ hiveClient_ = NULL;
+ }
+ }
+ }
+
+ if (hiveClient_ == NULL && diags)
+ *diags << DgSqlCode(-1213);
+
+ return hiveClient_;
+}
+
+NABoolean CmpContext::execHiveSQL(const char* hiveSQL, ComDiagsArea *diags)
+{
+ NABoolean result = FALSE;
+
+ if (!hiveClient_)
+ getHiveClient(diags);
+
+ if (hiveClient_)
+ {
+ HVC_RetCode retcode = hiveClient_->executeHiveSQL(hiveSQL);
+
+ switch (retcode)
+ {
+ case HVC_OK:
+ result = TRUE;
+ break;
+
+ default:
+ result = FALSE;
+ }
+
+ if (!result && diags)
+ *diags << DgSqlCode(-1214)
+ << DgString0(GetCliGlobals()->getJniErrorStrPtr())
+ << DgString1(hiveSQL);
+ }
+
+ return result;
+}
+
// -----------------------------------------------------------------------
// The CmpStatement related methods
// -----------------------------------------------------------------------
void CmpContext::setStatement(CmpStatement* s)
{
+ init();
statements_.insert(s);
s->setPrvCmpStatement(statements_[currentStatement_]);
#pragma nowarn(1506) // warning elimination
@@ -582,12 +638,13 @@ void CmpContext::setCurrentStatement(CmpStatement* s)
// diags()->clear();
}
-// Method to initialize the context at the beginning of statement: **A NO-OP**
+// Method to initialize the context at the beginning of statement
void CmpContext::init()
{
// initSchemaDB(); -- This was done in the ctor.
// diags()->clear(); -- This loses any initialization errors;
// -- clear() is done in unsetStatement above.
+ statementNum_++;
}
// -----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/arkcmp/CmpContext.h
----------------------------------------------------------------------
diff --git a/core/sql/arkcmp/CmpContext.h b/core/sql/arkcmp/CmpContext.h
index 37d2df7..c372a00 100644
--- a/core/sql/arkcmp/CmpContext.h
+++ b/core/sql/arkcmp/CmpContext.h
@@ -87,6 +87,7 @@ class OptDefaults;
struct MDDescsInfo;
class CmpStatementISP;
class EstLogProp;
+class HiveClient_JNI;
typedef IntrusiveSharedPtr<EstLogProp> EstLogPropSharedPtr;
namespace tmudr {
class UDRInvocationInfo;
@@ -284,6 +285,11 @@ public :
(v ? flags_ |= IS_AUTHORIZATION_READY : flags_ &= ~IS_AUTHORIZATION_READY);
}
+ UInt32 getStatementNum() const { return statementNum_; }
+
+ HiveClient_JNI *getHiveClient(ComDiagsArea *diags = NULL);
+ NABoolean execHiveSQL(const char* hiveSQL, ComDiagsArea *diags = NULL);
+
// access the NAHeap* for context
NAHeap* statementHeap();
NAHeap* heap() { return heap_; }
@@ -633,7 +639,12 @@ private:
// a transactional begin/commit(rollback) session.
// Used at commit time for NATable cache invalidation.
NAList<DDLObjInfo> ddlObjs_;
-
+
+ // a count of how many statements have been compiled
+ UInt32 statementNum_;
+
+ // for any Hive SQL operations we may want to do
+ HiveClient_JNI* hiveClient_;
}; // end of CmpContext
#pragma warn(1506) // warning elimination
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/arkcmp/CmpSqlSession.h
----------------------------------------------------------------------
diff --git a/core/sql/arkcmp/CmpSqlSession.h b/core/sql/arkcmp/CmpSqlSession.h
index 753a18a..5882c97 100644
--- a/core/sql/arkcmp/CmpSqlSession.h
+++ b/core/sql/arkcmp/CmpSqlSession.h
@@ -71,7 +71,7 @@ public:
void setSessionId(NAString &sessionID);
- NAString getSessionId() {return sessionID_;}
+ const NAString &getSessionId() {return sessionID_;}
void setSessionUsername(NAString &userName);
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/arkcmp/CmpStatement.cpp
----------------------------------------------------------------------
diff --git a/core/sql/arkcmp/CmpStatement.cpp b/core/sql/arkcmp/CmpStatement.cpp
index c351eb8..ede0cb1 100644
--- a/core/sql/arkcmp/CmpStatement.cpp
+++ b/core/sql/arkcmp/CmpStatement.cpp
@@ -86,6 +86,7 @@
#include "opt.h" // to initialize the memo and task_list variables
#include "RelExeUtil.h"
+#include "RelMisc.h"
#include "CmpSeabaseDDL.h"
#include "CmpSeabaseDDLupgrade.h"
#include "NAUserId.h"
@@ -156,7 +157,9 @@ CmpStatement::CmpStatement(CmpContext* context,
isSMDRecompile_ = FALSE;
isParallelLabelOp_ = FALSE;
displayGraph_ = FALSE;
+ cses_ = NULL;
detailsOnRefusedRequirements_ = NULL;
+ numOfCompilationRetries_ = 0;
#ifndef NDEBUG
if ( getenv("ARKCMP_NO_STATEMENTHEAP") )
@@ -1662,6 +1665,19 @@ QueryAnalysis* CmpStatement::initQueryAnalysis()
return queryAnalysis_;
}
+void CmpStatement::prepareForCompilationRetry()
+{
+ // The compiler may retry compiling a statement several times,
+ // sharing the same CmpStatement object. Initialize any data
+ // structures that need it here.
+ numOfCompilationRetries_++;
+
+ if (cses_)
+ cses_->clear();
+ if (detailsOnRefusedRequirements_)
+ detailsOnRefusedRequirements_->clear();
+}
+
void CmpStatement::initCqsWA()
{
cqsWA_ = new (heap_) CqsWA();
@@ -1688,3 +1704,26 @@ void CmpStatement::setTMUDFRefusedRequirements(const
char *details)
detailsOnRefusedRequirements_->insert(new(heap_) NAString(details, heap_));
}
+
+void CmpStatement::addCSEInfo(CSEInfo *info)
+{
+ if (cses_ == NULL)
+ cses_ = new(CmpCommon::statementHeap())
+ LIST(CSEInfo *)(CmpCommon::statementHeap());
+
+ info->setCSEId(cses_->entries());
+ cses_->insert(info);
+}
+
+CSEInfo * CmpStatement::getCSEInfo(const char *cseName)
+{
+ if (cses_)
+ for (CollIndex i=0; i<cses_->entries(); i++)
+ {
+ if ((*cses_)[i]->getName() == cseName)
+ return (*cses_)[i];
+ }
+
+ // no match found
+ return NULL;
+}
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/arkcmp/CmpStatement.h
----------------------------------------------------------------------
diff --git a/core/sql/arkcmp/CmpStatement.h b/core/sql/arkcmp/CmpStatement.h
index bc0660c..0d5b399 100644
--- a/core/sql/arkcmp/CmpStatement.h
+++ b/core/sql/arkcmp/CmpStatement.h
@@ -57,21 +57,24 @@ namespace tmudr {
}
class DDLExpr;
class ExprNode;
-
-// contents
-class CmpStatement;
-class CmpStatementISP;
-class CmpStatementISPGetNext;
class QueryAnalysis;
class CostMethod;
-
class NAMemory;
class CompilationStats;
class OptGlobals;
class CqsWA;
+class CommonSubExprRef;
+class ValueIdList;
+class ValueIdSet;
+class RelExpr;
+class CSEInfo;
-typedef NASimpleArray<NAString*> NAStringList;
+// contents
+class CmpStatement;
+class CmpStatementISP;
+class CmpStatementISPGetNext;
+typedef NASimpleArray<NAString*> NAStringList;
class CmpStatement : public NABasicObject
{
@@ -168,6 +171,9 @@ public:
QueryAnalysis* getQueryAnalysis() { return queryAnalysis_; };
QueryAnalysis* initQueryAnalysis();
+ void prepareForCompilationRetry();
+ Int32 getNumOfCompilationRetries() const { return numOfCompilationRetries_; }
+
// statement shape rewrite
CqsWA* getCqsWA() { return cqsWA_; }
void initCqsWA();
@@ -221,6 +227,10 @@ public:
short getDDLExprAndNode(char * sqlStr, Lng32 inputCS,
DDLExpr* &ddlExpr, ExprNode* &ddlNode);
+ CSEInfo *getCSEInfo(const char *cseName);
+ const LIST(CSEInfo *) *getCSEInfoList() { return cses_; }
+ void addCSEInfo(CSEInfo *info);
+
protected:
// CmpStatement(const CmpStatement&); please remove this line
CmpStatement& operator=(const CmpStatement&);
@@ -315,8 +325,17 @@ private:
// on RelExpr are enabled only when it is set.
NABoolean displayGraph_;
+ // common subexpressions in this statement, there could
+ // be multiple, named CSEs, each with one or more references
+ LIST(CSEInfo *) *cses_;
+
// for error reporting for UDFs, keep a list of requirements the UDF refused
LIST(const NAString *) *detailsOnRefusedRequirements_;
+
+ // indicates whether we are retrying the compile in
+ // CmpMain::sqlcomp(QueryText, ...
+ Int32 numOfCompilationRetries_;
+
}; // end of CmpStatement
class CmpStatementISP: public CmpStatement
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/bin/SqlciErrors.txt
----------------------------------------------------------------------
diff --git a/core/sql/bin/SqlciErrors.txt b/core/sql/bin/SqlciErrors.txt
index 854adf8..e4dda67 100644
--- a/core/sql/bin/SqlciErrors.txt
+++ b/core/sql/bin/SqlciErrors.txt
@@ -212,6 +212,9 @@
1210 ZZZZZ 99999 UUUUUUUU UUUUU UUUUUUU Column $0~String0 is not allowed at
position $1~Int0 in the SPLIT BY clause, because SPLIT BY must specify a prefix
of the clustering key columns and the next clustering key column is $2~String1.
1211 ZZZZZ 99999 UUUUUUUU UUUUU UUUUUUU The column values of the SPLIT BY
first key in position $0~Int0 (one-based) are not greater (for ascending
columns) or less (for descending columns) than its predecessor.
1212 ZZZZZ 99999 UUUUUUUU UUUUU UUUUUUU An HBase region start key provided
only part of the value for Trafodion key column $0~Int0. This could in some
unusual situations cause errors.
+1213 ZZZZZ 99999 UUUUUUUU UUUUU UUUUUUU Unable to create or initialize a
connection to Apache Hive.
+1214 ZZZZZ 99999 UUUUUUUU UUUUU UUUUUUU Error $0~String0 encountered when
executing HiveQL statement $1~String1.
+1215 ZZZZZ 99999 UUUUUUUU UUUUU UUUUUUU An error occurred while determining
host, port, or file name for HDFS URI $0~string0. Cause: $1~string1.
1220 ZZZZZ 99999 BEGINNER MAJOR DBADMIN Code must contain two non-blank
characters.
1221 ZZZZZ 99999 BEGINNER MAJOR DBADMIN Only system components may contain
system operations.
1222 ZZZZZ 99999 BEGINNER MAJOR DBADMIN Command not supported when
authorization is not enabled.
@@ -1440,6 +1443,7 @@ $1~String1 --------------------------------
4492 ZZZZZ 99999 BEGINNER MINOR LOGONLY BULK LOAD option UPDATE STATISTICS
cannot be used with UPSERT USING LOAD option.
4493 ZZZZZ 99999 BEGINNER MINOR LOGONLY Stored Descriptor Status: $0~String0
5000 ZZZZZ 99999 ADVANCED MAJOR DBADMIN Internal error in the query normalizer.
+5001 ZZZZZ 99999 ADVANDED MINOR LOGONLY Common subexpression $0~String0 cannot
be shared among multiple consumers. Reason: $1~String1.
6000 ZZZZZ 99999 ADVANCED MAJOR DBADMIN Internal error in the query optimizer.
6001 0A000 99999 BEGINNER MAJOR DBADMIN DISTINCT aggregates can be computed
for only one column per table expression.
6002 ZZZZZ 99999 BEGINNER MAJOR DBADMIN The metadata table HISTOGRAMS or
HISTOGRAM_INTERVALS contains invalid values. If you have manually modified the
metadata table, then you should undo your changes using the CLEAR option in
UPDATE STATISTICS.
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/comexe/ComTdbExeUtil.cpp
----------------------------------------------------------------------
diff --git a/core/sql/comexe/ComTdbExeUtil.cpp
b/core/sql/comexe/ComTdbExeUtil.cpp
index d45c97d..d4b4c23 100644
--- a/core/sql/comexe/ComTdbExeUtil.cpp
+++ b/core/sql/comexe/ComTdbExeUtil.cpp
@@ -1302,6 +1302,7 @@ void ComTdbExeUtilFastDelete::displayContents(Space *
space,
ComTdbExeUtilHiveTruncate::ComTdbExeUtilHiveTruncate(
char * tableName,
ULng32 tableNameLen,
+ char * hiveTableName,
char * tableLocation,
char * partnLocation,
char * hostName,
@@ -1324,6 +1325,7 @@ ComTdbExeUtilHiveTruncate::ComTdbExeUtilHiveTruncate(
down, up,
num_buffers, buffer_size),
flags_(0),
+ hiveTableName_(hiveTableName),
tableLocation_(tableLocation),
partnLocation_(partnLocation),
hdfsHost_(hostName),
@@ -1344,6 +1346,9 @@ Long ComTdbExeUtilHiveTruncate::pack(void * space)
if (partnLocation_)
partnLocation_.pack(space);
+ if (hiveTableName_)
+ hiveTableName_.pack(space);
+
return ComTdbExeUtil::pack(space);
}
@@ -1358,6 +1363,9 @@ Lng32 ComTdbExeUtilHiveTruncate::unpack(void * base, void
* reallocator)
if (partnLocation_.unpack(base))
return -1;
+ if (hiveTableName_.unpack(base))
+ return -1;
+
return ComTdbExeUtil::unpack(base, reallocator);
}
@@ -1379,6 +1387,13 @@ void ComTdbExeUtilHiveTruncate::displayContents(Space *
space,
sizeof(short));
}
+ if (getHiveTableName() != NULL)
+ {
+ str_sprintf(buf,"Hive Tablename = %s ",getHiveTableName());
+ space->allocateAndCopyToAlignedSpace(buf, str_len(buf),
+ sizeof(short));
+ }
+
if (getTableLocation() != NULL)
{
str_sprintf(buf,"tableLocation_ = %s ", getTableLocation());
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/comexe/ComTdbExeUtil.h
----------------------------------------------------------------------
diff --git a/core/sql/comexe/ComTdbExeUtil.h b/core/sql/comexe/ComTdbExeUtil.h
index 1009f75..61df851 100644
--- a/core/sql/comexe/ComTdbExeUtil.h
+++ b/core/sql/comexe/ComTdbExeUtil.h
@@ -1270,12 +1270,17 @@ public:
void setCleanupAllTables(NABoolean v)
{(v ? flags_ |= CLEANUP_ALL_TABLES : flags_ &= ~CLEANUP_ALL_TABLES); };
NABoolean cleanupAllTables() { return (flags_ & CLEANUP_ALL_TABLES) != 0; };
+ void setCleanupHiveCSETables(NABoolean v)
+ {(v ? flags_ |= CLEANUP_HIVE_CSE_TABLES : flags_ &=
~CLEANUP_HIVE_CSE_TABLES); }
+ NABoolean cleanupHiveCSETables() { return (flags_ & CLEANUP_HIVE_CSE_TABLES)
!= 0; }
private:
enum
{
// cleanup obsolete and active schemas/tables.
- CLEANUP_ALL_TABLES = 0x0001
+ CLEANUP_ALL_TABLES = 0x0001,
+ // cleanup Hive tables used for common subexpressions
+ CLEANUP_HIVE_CSE_TABLES = 0x0002
};
UInt32 flags_; // 00-03
@@ -1620,12 +1625,19 @@ private:
class ComTdbExeUtilHiveTruncate : public ComTdbExeUtil
{
public:
+ // flags
+ enum
+ {
+ TRUNC_DROP_TABLE_ON_DEALLOC = 0x0001
+ };
+
ComTdbExeUtilHiveTruncate()
: ComTdbExeUtil()
{}
ComTdbExeUtilHiveTruncate(char * tableName,
ULng32 tableNameLen,
+ char * hiveTableName,
char * tableLocation,
char * partnLocation,
char * hostName,
@@ -1677,18 +1689,28 @@ public:
return partnLocation_;
}
+ char * getHiveTableName() const
+ {
+ return hiveTableName_;
+ }
+
+ void setDropOnDealloc(NABoolean v)
+ {(v ? flags_ |= TRUNC_DROP_TABLE_ON_DEALLOC : flags_ &=
~TRUNC_DROP_TABLE_ON_DEALLOC); }
+ NABoolean getDropOnDealloc() { return (flags_ & TRUNC_DROP_TABLE_ON_DEALLOC)
!= 0; }
+
// ---------------------------------------------------------------------
// Used by the internal SHOWPLAN command to get attributes of a TDB.
// ---------------------------------------------------------------------
NA_EIDPROC void displayContents(Space *space, ULng32 flag);
private:
- NABasicPtr tableLocation_; // 00-07
- NABasicPtr partnLocation_; // 08-15
- NABasicPtr hdfsHost_; // 16-23
- Int64 modTS_; // 24-31
- Int32 hdfsPort_; // 32-35
- UInt32 flags_; // 36-39
+ NABasicPtr hiveTableName_; // 00-07
+ NABasicPtr tableLocation_; // 08-15
+ NABasicPtr partnLocation_; // 16-23
+ NABasicPtr hdfsHost_; // 24-31
+ Int64 modTS_; // 32-39
+ Int32 hdfsPort_; // 40-43
+ UInt32 flags_; // 44-47
};
class ComTdbExeUtilGetStatistics : public ComTdbExeUtil
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/comexe/ComTdbHdfsScan.h
----------------------------------------------------------------------
diff --git a/core/sql/comexe/ComTdbHdfsScan.h b/core/sql/comexe/ComTdbHdfsScan.h
index 392f7cf..29718b7 100755
--- a/core/sql/comexe/ComTdbHdfsScan.h
+++ b/core/sql/comexe/ComTdbHdfsScan.h
@@ -53,7 +53,8 @@ class ComTdbHdfsScan : public ComTdb
// ignore conversion errors and continue reading the next row.
CONTINUE_ON_ERROR = 0x0020,
- LOG_ERROR_ROWS = 0x0040
+ LOG_ERROR_ROWS = 0x0040,
+ ASSIGN_RANGES_AT_RUNTIME = 0x0080
};
// Expression to filter rows.
@@ -279,6 +280,11 @@ public:
{(v ? flags_ |= LOG_ERROR_ROWS : flags_ &= ~LOG_ERROR_ROWS); };
NABoolean getLogErrorRows() { return (flags_ & LOG_ERROR_ROWS) != 0; };
+ void setAssignRangesAtRuntime(NABoolean v)
+ {(v ? flags_ |= ASSIGN_RANGES_AT_RUNTIME : flags_ &=
~ASSIGN_RANGES_AT_RUNTIME); }
+ NABoolean getAssignRangesAtRuntime() const
+ { return (flags_ & ASSIGN_RANGES_AT_RUNTIME)
!= 0; }
+
UInt32 getMaxErrorRows() const { return maxErrorRows_;}
void setMaxErrorRows(UInt32 v ) { maxErrorRows_= v; }
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/common/ComSmallDefs.h
----------------------------------------------------------------------
diff --git a/core/sql/common/ComSmallDefs.h b/core/sql/common/ComSmallDefs.h
index 981ee5e..10a37c3 100644
--- a/core/sql/common/ComSmallDefs.h
+++ b/core/sql/common/ComSmallDefs.h
@@ -94,9 +94,11 @@ typedef NABoolean ComBoolean;
#define SMD_LOCATION "$SYSTEM"
#define SMD_VERSION "1000"
-//#define COM_SESSION_ID_PREFIX "MXID_"
#define COM_VOLATILE_SCHEMA_PREFIX "VOLATILE_SCHEMA_"
+// prefix of temp tables for common subexpressions
+#define COM_CSE_TABLE_PREFIX "CSE_TEMP_"
+
// 'reserved' tables in public_access_schema for sql internal use
#define COM_PUBLIC_ACCESS_SCHEMA "PUBLIC_ACCESS_SCHEMA"
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/common/ComSqlId.cpp
----------------------------------------------------------------------
diff --git a/core/sql/common/ComSqlId.cpp b/core/sql/common/ComSqlId.cpp
index b0317d8..9be694b 100644
--- a/core/sql/common/ComSqlId.cpp
+++ b/core/sql/common/ComSqlId.cpp
@@ -126,7 +126,7 @@ Lng32 ComSqlId::createSqlQueryId
Lng32 ComSqlId::getSqlIdAttr
(Lng32 attr, // which attr (SqlQueryIDAttr)
- char * queryId, // query ID
+ const char * queryId,// query ID
Lng32 queryIdLen, // query ID len
Int64 &value, // If returned attr is numeric,
// this field contains the returned value.
@@ -445,7 +445,7 @@ Lng32 ComSqlId::getSqlSessionIdAttr
}
Lng32 ComSqlId::extractSqlSessionIdAttrs
-(char * sessionId, // IN
+(const char * sessionId, // IN
Lng32 maxSessionIdLen, // IN
Int64 &segmentNumber, // OUT
Int64 &cpu, // OUT
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/common/ComSqlId.h
----------------------------------------------------------------------
diff --git a/core/sql/common/ComSqlId.h b/core/sql/common/ComSqlId.h
index 44c88f7..80e70ed 100644
--- a/core/sql/common/ComSqlId.h
+++ b/core/sql/common/ComSqlId.h
@@ -240,7 +240,7 @@ NA_EIDPROC
NA_EIDPROC
static Lng32 extractSqlSessionIdAttrs
- (char * sessionId, // IN
+ (const char * sessionId, // IN
Lng32 sessionIdLen, // IN
Int64 &segmentNumber, // OUT
Int64 &cpu, // OUT
@@ -279,7 +279,7 @@ private:
NA_EIDPROC
static Lng32 getSqlIdAttr
(Lng32 attr, // which attr (SqlQueryIDAttr)
- char * queryId, // query ID
+ const char * queryId,// query ID
Lng32 queryIdLen, // query ID len.
Int64 &value, // If returned attr is of string type, this value is the
// max length of the buffer pointed to by stringValue.
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/common/NAType.cpp
----------------------------------------------------------------------
diff --git a/core/sql/common/NAType.cpp b/core/sql/common/NAType.cpp
index 1bbe1b5..55e91d7 100644
--- a/core/sql/common/NAType.cpp
+++ b/core/sql/common/NAType.cpp
@@ -679,7 +679,7 @@ short NAType::convertTypeToText(char * text, // OUTPUT
displayCaseSpecific);
}
-short NAType::getMyTypeAsHiveText(NAString * outputStr) // output
+short NAType::getMyTypeAsHiveText(NAString * outputStr/*out*/) const
{
Lng32 fs_datatype = getFSDatatype();
@@ -725,6 +725,12 @@ short NAType::getMyTypeAsHiveText(NAString * outputStr) // output
*outputStr = "bigint";
break;
+ case REC_MIN_DECIMAL ... REC_MAX_DECIMAL:
+ outputStr->format("decimal(%d,%d)",
+ getPrecision(),
+ getScale());
+ break;
+
case REC_FLOAT32:
*outputStr = "float";
break;
@@ -762,7 +768,7 @@ short NAType::getMyTypeAsHiveText(NAString * outputStr) // output
}
short NAType::getMyTypeAsText(NAString * outputStr, // output
- NABoolean addNullability)
+ NABoolean addNullability) const
{
// get the right value for all these
Lng32 fs_datatype = getFSDatatype();
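The new decimal branch in `getMyTypeAsHiveText()` maps SQL exact-numeric types to Hive's `decimal(precision,scale)` text. The formatting step can be sketched in isolation (`hiveDecimalText` is a hypothetical helper for illustration; the real method dispatches on FS datatype codes such as `REC_MIN_DECIMAL`):

```cpp
#include <cstdio>
#include <string>

// Render a Hive DDL type name for an exact-numeric SQL type,
// mirroring outputStr->format("decimal(%d,%d)", ...) in the patch.
std::string hiveDecimalText(int precision, int scale) {
  char buf[40];
  std::snprintf(buf, sizeof(buf), "decimal(%d,%d)", precision, scale);
  return buf;
}
```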
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/common/NAType.h
----------------------------------------------------------------------
diff --git a/core/sql/common/NAType.h b/core/sql/common/NAType.h
index b664485..dd81188 100644
--- a/core/sql/common/NAType.h
+++ b/core/sql/common/NAType.h
@@ -560,9 +560,9 @@ public:
short displayCaseSpecific = 0);
short getMyTypeAsText(NAString * outputStr,
- NABoolean addNullability = TRUE); // output
+ NABoolean addNullability = TRUE) const; // output
- short getMyTypeAsHiveText(NAString * outputStr); // output
+ short getMyTypeAsHiveText(NAString * outputStr) const; // output
// used for query caching
Lng32 getSize() const;
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/common/OperTypeEnum.h
----------------------------------------------------------------------
diff --git a/core/sql/common/OperTypeEnum.h b/core/sql/common/OperTypeEnum.h
index 6f66760..bc76f6d 100644
--- a/core/sql/common/OperTypeEnum.h
+++ b/core/sql/common/OperTypeEnum.h
@@ -282,6 +282,8 @@ enum OperatorTypeEnum {
REL_HIVE_INSERT,
REL_BULK_UNLOAD,
+ REL_COMMON_SUBEXPR_REF,
+
REL_LAST_REL_OP = 1999,
// item operators (predicates)
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/executor/ExExeUtil.h
----------------------------------------------------------------------
diff --git a/core/sql/executor/ExExeUtil.h b/core/sql/executor/ExExeUtil.h
index b6b96fc..f3fccb8 100755
--- a/core/sql/executor/ExExeUtil.h
+++ b/core/sql/executor/ExExeUtil.h
@@ -196,7 +196,7 @@ class ExExeUtilTcb : public ex_tcb
ex_queue_pair getParentQueue() const;
Int32 orderedQueueProtocol() const;
- void freeResources();
+ virtual void freeResources();
virtual Int32 numChildren() const;
virtual const ex_tcb* getChild(Int32 pos) const;
@@ -989,7 +989,9 @@ class ExExeUtilVolatileTablesTcb : public ExExeUtilTcb
Lng32 &pstateLength); // out, length of one element
protected:
- short isCreatorProcessObsolete(char * schemaName, NABoolean includesCat);
+ short isCreatorProcessObsolete(const char * name,
+ NABoolean includesCat,
+ NABoolean isCSETableName);
};
class ExExeUtilCleanupVolatileTablesTcb : public ExExeUtilVolatileTablesTcb
@@ -1015,6 +1017,7 @@ class ExExeUtilCleanupVolatileTablesTcb : public ExExeUtilVolatileTablesTcb
ex_globals *globals = NULL,
ComDiagsArea * diagsArea = NULL);
static short dropVolatileTables(ContextCli * currContext, CollHeap * heap);
+ short dropHiveTempTablesForCSEs(ComDiagsArea * diagsArea = NULL);
private:
enum Step
@@ -1027,6 +1030,7 @@ class ExExeUtilCleanupVolatileTablesTcb : public ExExeUtilVolatileTablesTcb
DO_CLEANUP_,
COMMIT_WORK_,
END_CLEANUP_,
+ CLEANUP_HIVE_TABLES_,
DONE_,
ERROR_
};
@@ -3343,6 +3347,7 @@ class ExExeUtilHiveTruncateTcb : public ExExeUtilTcb
~ExExeUtilHiveTruncateTcb();
+ virtual void freeResources();
virtual short work();
NA_EIDPROC virtual ex_tcb_private_state * allocatePstates(
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/executor/ExExeUtilMisc.cpp
----------------------------------------------------------------------
diff --git a/core/sql/executor/ExExeUtilMisc.cpp b/core/sql/executor/ExExeUtilMisc.cpp
index 440c83b..2d6e0c7 100644
--- a/core/sql/executor/ExExeUtilMisc.cpp
+++ b/core/sql/executor/ExExeUtilMisc.cpp
@@ -2252,6 +2252,24 @@ ExExeUtilHiveTruncateTcb::ExExeUtilHiveTruncateTcb(
ExExeUtilHiveTruncateTcb::~ExExeUtilHiveTruncateTcb()
{
+ freeResources();
+}
+
+void ExExeUtilHiveTruncateTcb::freeResources()
+{
+ if (htTdb().getDropOnDealloc())
+ {
+ NAString hiveDropDDL("drop table ");
+ HiveClient_JNI *hiveClient = HiveClient_JNI::getInstance();
+
+ hiveDropDDL += htTdb().getHiveTableName();
+
+ // ignore errors on drop
+ if (!hiveClient->isInitialized() ||
+ !hiveClient->isConnected())
+ hiveClient->init();
+ hiveClient->executeHiveSQL(hiveDropDDL);
+ }
}
Int32 ExExeUtilHiveTruncateTcb::fixup()
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/executor/ExExeUtilVolTab.cpp
----------------------------------------------------------------------
diff --git a/core/sql/executor/ExExeUtilVolTab.cpp b/core/sql/executor/ExExeUtilVolTab.cpp
index 4da869e..9201374 100644
--- a/core/sql/executor/ExExeUtilVolTab.cpp
+++ b/core/sql/executor/ExExeUtilVolTab.cpp
@@ -53,6 +53,7 @@
#include "ComRtUtils.h"
#include "ExStats.h"
#include "seabed/ms.h"
+#include "CmpContext.h"
///////////////////////////////////////////////////////////////////
ex_tcb * ExExeUtilLoadVolatileTableTdb::build(ex_globals * glob)
@@ -285,7 +286,7 @@ ExExeUtilVolatileTablesTcb::ExExeUtilVolatileTablesTcb(
}
short ExExeUtilVolatileTablesTcb::isCreatorProcessObsolete
-(char * schemaName, NABoolean includesCat)
+(const char * name, NABoolean includesCat, NABoolean isCSETableName)
{
Lng32 retcode = 0;
@@ -293,21 +294,39 @@ short ExExeUtilVolatileTablesTcb::isCreatorProcessObsolete
short segmentNum;
short cpu;
pid_t pin;
- Int64 schemaNameCreateTime = 0;
+ Int64 nameCreateTime = 0;
Lng32 currPos = 0;
if (includesCat)
{
- // schemaName is of the form: <CAT>.<SCHEMA>
+ // name is of the form: <CAT>.<SCHEMA>
// Skip the <CAT> part.
- while (schemaName[currPos] != '.')
+ while (name[currPos] != '.')
currPos++;
currPos++;
}
- currPos +=
- strlen(COM_VOLATILE_SCHEMA_PREFIX); // + strlen(COM_SESSION_ID_PREFIX);
+ if (isCSETableName)
+ {
+ // CSE table names look like this: CSE_TEMP_<name>_MXID..._Snnn_mmm
+ const char *startPrefix = "_" COM_SESSION_ID_PREFIX;
+ const char *match = &name[currPos];
+ const char *prevMatch = NULL;
+
+ // find the last occurrence of the start prefix in the name
+ while ((match = strstr(match, startPrefix)) != NULL)
+ prevMatch = ++match; // position prevMatch on the "MXID"
+
+ if (prevMatch)
+ currPos = prevMatch-name;
+ else
+ return 0; // name does not fit our pattern, don't delete it
+ }
+ else
+ // volatile table schema is a fixed prefix, followed by the session id
+ currPos +=
+ strlen(COM_VOLATILE_SCHEMA_PREFIX);
Int64 segmentNum_l;
Int64 cpu_l;
@@ -316,12 +335,12 @@ short ExExeUtilVolatileTablesTcb::isCreatorProcessObsolete
Lng32 userNameLen = 0;
Lng32 userSessionNameLen = 0;
ComSqlId::extractSqlSessionIdAttrs
- (&schemaName[currPos],
- -1, //(strlen(schemaName) - currPos),
+ (&name[currPos],
+ -1, //(strlen(name) - currPos),
segmentNum_l,
cpu_l,
pin_l,
- schemaNameCreateTime,
+ nameCreateTime,
sessionUniqNum,
userNameLen, NULL,
userSessionNameLen, NULL);
@@ -330,7 +349,7 @@ short ExExeUtilVolatileTablesTcb::isCreatorProcessObsolete
pin = (pid_t)pin_l;
// see if process exists. If it exists, check if it is the same
- // process that is specified in the schemaName.
+ // process that is specified in the name.
short errorDetail = 0;
Int64 procCreateTime = 0;
retcode = ComRtGetProcessCreateTime(&cpu, &pin, &segmentNum,
@@ -338,12 +357,12 @@ short ExExeUtilVolatileTablesTcb::isCreatorProcessObsolete
errorDetail);
if (retcode == XZFIL_ERR_OK)
{
- // process specified in schema name exists.
- if (schemaNameCreateTime != procCreateTime)
- // but is a different process. Schema's process is obsolete.
+ // process specified in name exists.
+ if (nameCreateTime != procCreateTime)
+ // but is a different process. Schema or name's process is obsolete.
return -1;
else
- // schema's process is still alive.
+ // schema or name's process is still alive.
return 0;
}
else
@@ -520,7 +539,7 @@ short ExExeUtilCleanupVolatileTablesTcb::work()
OutputInfo * vi = (OutputInfo*)schemaNamesList_->getCurr();
char * schemaName = vi->get(0);
if ((cvtTdb().cleanupAllTables()) ||
- (isCreatorProcessObsolete(schemaName, FALSE)))
+ (isCreatorProcessObsolete(schemaName, FALSE, FALSE)))
{
// schema is obsolete, drop it.
// Or we need to cleanup all schemas, active or obsolete.
@@ -579,6 +598,15 @@ short ExExeUtilCleanupVolatileTablesTcb::work()
*diags << DgSqlCode(1069)
<< DgSchemaName(errorSchemas_);
}
+ step_ = CLEANUP_HIVE_TABLES_;
+ }
+ break;
+
+ case CLEANUP_HIVE_TABLES_:
+ {
+ if (cvtTdb().cleanupHiveCSETables())
+ dropHiveTempTablesForCSEs(getDiagsArea());
+
step_ = DONE_;
}
break;
@@ -708,6 +736,54 @@ short ExExeUtilCleanupVolatileTablesTcb::dropVolatileTables
return cliRC;
}
+short ExExeUtilCleanupVolatileTablesTcb::dropHiveTempTablesForCSEs(
+ ComDiagsArea * diagsArea)
+{
+ Queue * hiveTableNames = NULL;
+ // Todo: CSE: support schemas other than default for temp tables
+ NAString hiveTablesGetQuery("get tables in schema hive.hive, no header");
+ short retcode = 0;
+
+ if (initializeInfoList(hiveTableNames))
+ {
+ return -1;
+ }
+
+ if (fetchAllRows(hiveTableNames,
+ (char *) hiveTablesGetQuery.data(),
+ 1,
+ FALSE,
+ retcode) < 0)
+ {
+ return -1;
+ }
+
+ hiveTableNames->position();
+
+ while (!hiveTableNames->atEnd())
+ {
+ OutputInfo * ht = (OutputInfo*) (hiveTableNames->getCurr());
+ const char *origTableName = ht->get(0);
+ NAString tableName(origTableName);
+
+ tableName.toUpper();
+
+ if (strstr(tableName.data(), COM_CSE_TABLE_PREFIX) == tableName.data() &&
+ isCreatorProcessObsolete(tableName.data(), FALSE, TRUE))
+ {
+ NAString dropHiveTable("drop table ");
+
+ dropHiveTable += origTableName;
+ if (!CmpCommon::context()->execHiveSQL(dropHiveTable.data()))
+ ; // ignore errors for now
+ }
+
+ hiveTableNames->advance();
+ }
+
+ return 0;
+}
+
///////////////////////////////////////////////////////////////////
// class ExExeUtilGetVolatileInfoTdb
///////////////////////////////////////////////////////////////
@@ -914,7 +990,7 @@ short ExExeUtilGetVolatileInfoTcb::work()
char * schemaName = vi->get(0);
char state[10];
- if (isCreatorProcessObsolete(schemaName, FALSE))
+ if (isCreatorProcessObsolete(schemaName, FALSE, FALSE))
strcpy(state, "Obsolete");
else
strcpy(state, "Active ");
@@ -947,7 +1023,7 @@ short ExExeUtilGetVolatileInfoTcb::work()
(strcmp(prevInfo_->get(0), schemaName) != 0))
{
char state[10];
- if (isCreatorProcessObsolete(schemaName, FALSE))
+ if (isCreatorProcessObsolete(schemaName, FALSE, FALSE))
strcpy(state, "Obsolete");
else
strcpy(state, "Active ");
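The CSE branch added to `isCreatorProcessObsolete` above locates the session id inside a name of the form `CSE_TEMP_<name>_MXID..._Snnn_mmm` by scanning for the *last* occurrence of the `"_" COM_SESSION_ID_PREFIX` marker, so a user-chosen `<name>` that itself contains the marker cannot confuse the parse. A hedged standalone sketch of that scan (`findSessionIdStart` is an illustrative name; `"MXID_"` is the session id prefix per ComSmallDefs.h):

```cpp
#include <cstring>

// Return the offset of the "MXID_..." session id portion of a CSE temp
// table name, or -1 if the expected pattern is absent (in which case the
// caller must not treat the table as a droppable temp table).
long findSessionIdStart(const char *name) {
  const char *startPrefix = "_MXID_";  // "_" COM_SESSION_ID_PREFIX
  const char *match = name;
  const char *prevMatch = nullptr;

  // remember the position just past the '_' of each occurrence,
  // so prevMatch ends up on the last "MXID_..." in the name
  while ((match = std::strstr(match, startPrefix)) != nullptr)
    prevMatch = ++match;

  return prevMatch ? (long)(prevMatch - name) : -1;
}
```

The `++match` both positions the result on the `M` of `MXID` and advances the search past the current `'_'`, guaranteeing the loop terminates.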
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/executor/ExHdfsScan.cpp
----------------------------------------------------------------------
diff --git a/core/sql/executor/ExHdfsScan.cpp b/core/sql/executor/ExHdfsScan.cpp
index 0552f6c..accc695 100644
--- a/core/sql/executor/ExHdfsScan.cpp
+++ b/core/sql/executor/ExHdfsScan.cpp
@@ -117,6 +117,8 @@ ExHdfsScanTcb::ExHdfsScanTcb(
, checkRangeDelimiter_(FALSE)
, dataModCheckDone_(FALSE)
, loggingErrorDiags_(NULL)
+ , runTimeRanges_(NULL)
+ , numRunTimeRanges_(0)
{
Space * space = (glob ? glob->getSpace() : 0);
CollHeap * heap = (glob ? glob->getDefaultHeap() : 0);
@@ -255,6 +257,8 @@ void ExHdfsScanTcb::freeResources()
delete qparent_.down;
qparent_.down = NULL;
}
+ if (runTimeRanges_)
+ deallocateRuntimeRanges();
ExpLOBinterfaceCleanup
(lobGlob_, getGlobals()->getDefaultHeap());
@@ -404,14 +408,20 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
dataModCheckDone_ = FALSE;
- if (hdfsScanTdb().getHdfsFileInfoList()->isEmpty())
+ myInstNum_ = getGlobals()->getMyInstanceNumber();
+ hdfsScanBufMaxSize_ = hdfsScanTdb().hdfsBufSize_;
+
+ if (hdfsScanTdb().getAssignRangesAtRuntime())
+ {
+ step_ = ASSIGN_RANGES_AT_RUNTIME;
+ break;
+ }
+ else if (hdfsScanTdb().getHdfsFileInfoList()->isEmpty())
{
step_ = CHECK_FOR_DATA_MOD_AND_DONE;
break;
}
- myInstNum_ = getGlobals()->getMyInstanceNumber();
-
beginRangeNum_ =
*(Lng32*)hdfsScanTdb().getHdfsFileRangeBeginList()->get(myInstNum_);
@@ -420,8 +430,6 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
currRangeNum_ = beginRangeNum_;
- hdfsScanBufMaxSize_ = hdfsScanTdb().hdfsBufSize_;
-
if (numRanges_ > 0)
step_ = CHECK_FOR_DATA_MOD;
else
@@ -429,6 +437,15 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
}
break;
+ case ASSIGN_RANGES_AT_RUNTIME:
+ computeRangesAtRuntime();
+ currRangeNum_ = beginRangeNum_;
+ if (numRanges_ > 0)
+ step_ = INIT_HDFS_CURSOR;
+ else
+ step_ = DONE;
+ break;
+
case CHECK_FOR_DATA_MOD:
case CHECK_FOR_DATA_MOD_AND_DONE:
{
@@ -506,8 +523,7 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
case INIT_HDFS_CURSOR:
{
- hdfo_ = (HdfsFileInfo*)
- hdfsScanTdb().getHdfsFileInfoList()->get(currRangeNum_);
+ hdfo_ = getRange(currRangeNum_);
if ((hdfo_->getBytesToRead() == 0) &&
(beginRangeNum_ == currRangeNum_) && (numRanges_ > 1))
{
@@ -516,8 +532,7 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
// since the file may need to be closed. The first
// range being 0 is common with sqoop generated files
currRangeNum_++;
- hdfo_ = (HdfsFileInfo*)
- hdfsScanTdb().getHdfsFileInfoList()->get(currRangeNum_);
+ hdfo_ = getRange(currRangeNum_);
}
hdfsOffset_ = hdfo_->getStartOffset();
@@ -615,8 +630,7 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
// preopen next range.
if ( (currRangeNum_ + 1) < (beginRangeNum_ + numRanges_) )
{
- hdfo = (HdfsFileInfo*)
- hdfsScanTdb().getHdfsFileInfoList()->get(currRangeNum_ + 1);
+ hdfo = getRange(currRangeNum_ + 1);
hdfsFileName_ = hdfo->fileName();
sprintf(cursorId, "%d", currRangeNum_ + 1);
@@ -1428,7 +1442,7 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
((currRangeNum_ + 1) < (beginRangeNum_ + numRanges_)))
{
- hdfo = (HdfsFileInfo*)
- hdfsScanTdb().getHdfsFileInfoList()->get(currRangeNum_ + 1);
+ hdfo = getRange(currRangeNum_ + 1);
if (strcmp(hdfsFileName_, hdfo->fileName()) == 0)
closeFile = false;
}
@@ -1700,6 +1714,138 @@ char * ExHdfsScanTcb::extractAndTransformAsciiSourceToSqlRow(int &err,
return NULL;
}
+void ExHdfsScanTcb::computeRangesAtRuntime()
+{
+ int numFiles = 0;
+ Int64 totalSize = 0;
+ Int64 myShare = 0;
+ Int64 runningSum = 0;
+ Int64 myStartPositionInBytes = 0;
+ Int64 firstFileStartingOffset = 0;
+ Int64 lastFileBytesToRead = -1;
+ Int32 numParallelInstances = MAXOF(getGlobals()->getNumOfInstances(),1);
+ hdfsFS fs = ((GetCliGlobals()->currContext())->getHdfsServerConnection(
+ hdfsScanTdb().hostName_,
+ hdfsScanTdb().port_));
+ hdfsFileInfo *fileInfos = hdfsListDirectory(fs,
+ hdfsScanTdb().hdfsRootDir_,
+ &numFiles);
+
+ if (runTimeRanges_)
+ deallocateRuntimeRanges();
+
+ // in a first round, count the total number of bytes
+ for (int f=0; f<numFiles; f++)
+ {
+ ex_assert(fileInfos[f].mKind == kObjectKindFile,
+ "subdirectories not supported with runtime HDFS ranges");
+ totalSize += (Int64) fileInfos[f].mSize;
+ }
+
+ // compute my share, in bytes
+ // (the last of the ESPs may read a bit more)
+ myShare = totalSize / numParallelInstances;
+ myStartPositionInBytes = myInstNum_ * myShare;
+ beginRangeNum_ = -1;
+ numRanges_ = 0;
+
+ if (totalSize > 0)
+ {
+ // second round, find out the range of files I need to read
+ for (int g=0; g<numFiles; g++)
+ {
+ Int64 prevSum = runningSum;
+
+ runningSum += (Int64) fileInfos[g].mSize;
+
+ if (runningSum >= myStartPositionInBytes)
+ {
+ if (beginRangeNum_ < 0)
+ {
+ // I have reached the first file that I need to read
+ beginRangeNum_ = g;
+ firstFileStartingOffset =
+ myStartPositionInBytes - prevSum;
+ }
+
+ numRanges_++;
+
+ if (runningSum > (myStartPositionInBytes + myShare) &&
+ myInstNum_ < numParallelInstances-1)
+ // the next file is beyond the range that I need to read
+ lastFileBytesToRead =
+ myStartPositionInBytes + myShare - prevSum;
+ break;
+ }
+ }
+
+ // now that we know how many ranges we need, allocate them
+ numRunTimeRanges_ = numRanges_;
+ runTimeRanges_ = new(getHeap()) HdfsFileInfo[numRunTimeRanges_];
+ }
+ else
+ beginRangeNum_ = 0;
+
+ // third round, populate the ranges that this ESP needs to read
+ for (int h=beginRangeNum_; h<beginRangeNum_+numRanges_; h++)
+ {
+ HdfsFileInfo &e(runTimeRanges_[h-beginRangeNum_]);
+ const char *fileName = fileInfos[h].mName;
+ Int32 fileNameLen = strlen(fileName) + 1;
+
+ e.entryNum_ = h;
+ e.flags_ = 0;
+ e.fileName_ = new(getHeap()) char[fileNameLen];
+ str_cpy_all(e.fileName_, fileName, fileNameLen);
+ if (h == beginRangeNum_ &&
+ firstFileStartingOffset > 0)
+ {
+ e.startOffset_ = firstFileStartingOffset;
+ e.setFileIsSplitBegin(TRUE);
+ }
+ else
+ e.startOffset_ = 0;
+
+
+ if (h == beginRangeNum_+numRanges_-1 && lastFileBytesToRead > 0)
+ {
+ e.bytesToRead_ = lastFileBytesToRead;
+ e.setFileIsSplitEnd(TRUE);
+ }
+ else
+ e.bytesToRead_ = (Int64) fileInfos[h].mSize;
+ }
+}
+
+void ExHdfsScanTcb::deallocateRuntimeRanges()
+{
+ if (runTimeRanges_)
+ {
+ for (int i=0; i<numRunTimeRanges_; i++)
+ NADELETEBASIC(runTimeRanges_[i].fileName_.getPointer(), getHeap());
+ NADELETEBASIC(runTimeRanges_, getHeap());
+ runTimeRanges_ = NULL;
+ numRunTimeRanges_ = 0;
+ }
+}
+
+HdfsFileInfo * ExHdfsScanTcb::getRange(Int32 r)
+{
+ ex_assert(r >= beginRangeNum_ &&
+ r < beginRangeNum_+numRanges_,
+ "HDFS scan range num out of range");
+
+ if (hdfsScanTdb().getAssignRangesAtRuntime())
+ {
+ ex_assert(numRunTimeRanges_ == numRanges_,
+ "numRunTimeRanges_ != numRanges_");
+ return &runTimeRanges_[r-beginRangeNum_];
+ }
+ else
+ return (HdfsFileInfo*)
+ hdfsScanTdb().getHdfsFileInfoList()->get(r);
+}
+
short ExHdfsScanTcb::moveRowToUpQueue(const char * row, Lng32 len,
short * rc, NABoolean isVarchar)
{
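`computeRangesAtRuntime()` above first divides the total byte count of all files evenly across the parallel ESP instances, and only then walks the file list to turn each instance's byte interval into per-file ranges. The division step can be sketched in isolation (a hedged illustration under the same convention as the patch, where the last instance absorbs the remainder; `myByteShare` is a hypothetical helper):

```cpp
#include <cstdint>

struct ByteShare { int64_t start; int64_t length; };

// Split totalSize bytes among numInstances readers; instance i starts
// at i * share and reads `share` bytes, except the last instance, which
// "may read a bit more" (the integer-division remainder).
ByteShare myByteShare(int64_t totalSize, int numInstances, int myInstNum) {
  int64_t share = totalSize / numInstances;
  int64_t start = myInstNum * share;
  int64_t length = (myInstNum == numInstances - 1)
                     ? totalSize - start   // remainder goes to the last ESP
                     : share;
  return {start, length};
}
```

Every byte is assigned to exactly one instance, so the subsequent per-file range computation never produces overlapping or missing ranges.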
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/executor/ExHdfsScan.h
----------------------------------------------------------------------
diff --git a/core/sql/executor/ExHdfsScan.h b/core/sql/executor/ExHdfsScan.h
index b22f31b..aba7fe1 100644
--- a/core/sql/executor/ExHdfsScan.h
+++ b/core/sql/executor/ExHdfsScan.h
@@ -164,6 +164,7 @@ protected:
, OPEN_HDFS_CURSOR
, CHECK_FOR_DATA_MOD
, CHECK_FOR_DATA_MOD_AND_DONE
+ , ASSIGN_RANGES_AT_RUNTIME
, GET_HDFS_DATA
, CLOSE_HDFS_CURSOR
, PROCESS_HDFS_ROW
@@ -209,6 +210,10 @@ protected:
char * extractAndTransformAsciiSourceToSqlRow(int &err,
ComDiagsArea * &diagsArea, int mode);
+ void computeRangesAtRuntime();
+ void deallocateRuntimeRanges();
+ HdfsFileInfo *getRange(Int32 r);
+
short moveRowToUpQueue(const char * row, Lng32 len,
short * rc, NABoolean isVarchar);
@@ -270,6 +275,8 @@ protected:
Lng32 myInstNum_;
Lng32 beginRangeNum_;
Lng32 numRanges_;
+ HdfsFileInfo *runTimeRanges_;
+ Int32 numRunTimeRanges_;
Lng32 currRangeNum_;
char *endOfRequestedRange_ ; // helps rows span ranges.
char * hdfsFileName_;
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/generator/GenExplain.cpp
----------------------------------------------------------------------
diff --git a/core/sql/generator/GenExplain.cpp b/core/sql/generator/GenExplain.cpp
index e625cfd..e6d608b 100644
--- a/core/sql/generator/GenExplain.cpp
+++ b/core/sql/generator/GenExplain.cpp
@@ -1892,13 +1892,9 @@ Exchange::addSpecificExplainInfo(ExplainTupleMaster *explainTuple,
if (splitBottomTdb->getQueryUsesSM())
description += "seamonster_query: yes ";
- else
- description += "seamonster_query: no ";
if (splitBottomTdb->getExchangeUsesSM())
description += "seamonster_exchange: yes ";
- else
- description += "seamonster_exchange: no ";
explainTuple->setDescription(description); // save what we have built
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/generator/GenPreCode.cpp
----------------------------------------------------------------------
diff --git a/core/sql/generator/GenPreCode.cpp b/core/sql/generator/GenPreCode.cpp
index 8d8ab9c..a360f37 100644
--- a/core/sql/generator/GenPreCode.cpp
+++ b/core/sql/generator/GenPreCode.cpp
@@ -1648,7 +1648,8 @@ RelExpr * RelRoot::preCodeGen(Generator * generator,
isCIFOn_ = FALSE;
if ((CmpCommon::getDefault(COMPRESSED_INTERNAL_FORMAT) == DF_ON )||
- generator->isFastExtract())
+ generator->isFastExtract() ||
+ generator->containsFastExtract())
{
isCIFOn_ = TRUE;
generator->setCompressedInternalFormat();
@@ -6058,6 +6059,101 @@ RelExpr * MapValueIds::preCodeGen(Generator * generator,
getGroupAttr()->addCharacteristicInputs(pulledNewInputs);
+ if (cseRef_)
+ {
+ // -------------------------------------------------------------
+ // This MapValueIds represents a common subexpression.
+ //
+ // We need to take some actions here to help with VEG rewrite,
+ // since we eliminated some nodes from the tree, while the
+ // VEGies still contain all equated values, including those that
+ // got eliminated. Furthermore, the one tree that was chosen for
+ // materialization got moved and we need to make sure that the
+ // place where we scan the temp table produces the same ValueIds
+ // that were marked as "Bridge Values" when we processed the
+ // insert into temp statement.
+ // -------------------------------------------------------------
+
+ ValueIdSet cseVEGPreds;
+ const ValueIdList &vegCols(cseRef_->getColumnList());
+ ValueIdSet nonVegCols(cseRef_->getNonVEGColumns());
+ NABoolean isAnalyzingConsumer =
+ (CmpCommon::statement()->getCSEInfo(cseRef_->getName())->
+ getIdOfAnalyzingConsumer() == cseRef_->getId());
+ ValueIdSet availableValues(
+ getGroupAttr()->getCharacteristicInputs());
+
+ valuesNeededForVEGRewrite_ += cseRef_->getNonVEGColumns();
+ availableValues += valuesNeededForVEGRewrite_;
+
+ // find all the VEG predicates of the original columns that this
+ // common subexpression represents...
+ for (CollIndex v=0; v<vegCols.entries(); v++)
+ if (vegCols[v].getItemExpr()->getOperatorType() == ITM_VEG_REFERENCE)
+ {
+ // look at one particular VEG that is produced by this
+ // query tree
+ VEG *veg =
+ static_cast<VEGReference *>(vegCols[v].getItemExpr())->getVEG();
+
+ if (isAnalyzingConsumer && veg->getBridgeValues().entries() > 0)
+ {
+ // If we are looking at the analyzing consumer, then
+ // its child tree "C" got transformed into an
+ // "insert overwrite table "temp" select * from "C".
+
+ // This insert into temp statement chose some VEG
+ // member(s) as the "bridge value(s)". Find these bridge
+ // values and choose one to represent the VEG here.
+ const ValueIdSet &vegMembers(veg->getAllValues());
+
+ // collect all VEG members produced and subtract them
+ // from the values to be used for VEG rewrite
+ ValueIdSet subtractions(cseRef_->getNonVEGColumns());
+ // then add back only the bridge value
+ ValueIdSet additions;
+
+ // get the VEG members produced by child C
+ subtractions.intersectSet(vegMembers);
+
+ // augment the base columns with their index columns,
+ // the bridge value is likely an index column
+ for (ValueId v=subtractions.init();
+ subtractions.next(v);
+ subtractions.advance(v))
+ if (v.getItemExpr()->getOperatorType() == ITM_BASECOLUMN)
+ {
+ subtractions +=
+ static_cast<BaseColumn *>(v.getItemExpr())->getEIC();
+ }
+
+ // now find a bridge value (or values) that we can
+ // produce
+ additions = subtractions;
+ additions.intersectSet(veg->getBridgeValues());
+
+ // if we found it, then adjust availableValues
+ if (additions.entries() > 0)
+ {
+ availableValues -= subtractions;
+ availableValues += additions;
+ }
+ }
+
+ cseVEGPreds += veg->getVEGPredicate()->getValueId();
+ } // a VEGRef
+
+ // Replace the VEGPredicates, pretending that we still have
+ // the original tree below us, not the materialized temp
+ // table. This will hopefully keep the bookkeeping in the
+ // VEGies correct by setting the right referenced values
+ // and choosing the right bridge values.
+ cseVEGPreds.replaceVEGExpressions(
+ availableValues,
+ getGroupAttr()->getCharacteristicInputs());
+
+ } // this MapValueIds is for a common subexpression
+
// ---------------------------------------------------------------------
// The MapValueIds node describes a mapping between expressions used
// by its child tree and expressions used by its parent tree. The
@@ -6126,10 +6222,9 @@ RelExpr * MapValueIds::preCodeGen(Generator * generator,
x.getItemExpr()->getLeafValueIds(leafValues);
lowerAvailableValues += leafValues;
- // upperAvailableValues is needed for mvqr. The addition of the lower
- // available values is only necessary to avoid an assertion failure in
- // VEGReference::replaceVEGReference().
- ValueIdSet upperAvailableValues(valuesForVEGRewrite());
+ ValueIdSet upperAvailableValues(valuesNeededForVEGRewrite_);
+ // The addition of the lower available values is only necessary to
+ // avoid an assertion failure in VEGReference::replaceVEGReference().
upperAvailableValues += lowerAvailableValues;
// ---------------------------------------------------------------------
@@ -6180,11 +6275,12 @@ RelExpr * MapValueIds::preCodeGen(Generator * generator,
{
if (newUpper->getOperatorType() == ITM_VEG_REFERENCE)
{
- if (usedByMvqr())
- // If this node is used to map the outputs of an MV added by
- // MVQR, upperAvailableValues has been constructed to
- // contain the base column a vegref should map to, so we use
- // that instead of a created surrogate.
+ if (valuesNeededForVEGRewrite_.entries() > 0)
+ // If this node is used to map the outputs of one
+ // table to those of another, upperAvailableValues
+ // has been constructed to contain the base column a
+ // vegref should map to, so we use that instead of a
+ // created surrogate.
newUpper = newUpper->replaceVEGExpressions
(upperAvailableValues,
getGroupAttr()->getCharacteristicInputs());
@@ -10715,7 +10811,10 @@ RelExpr * PhysicalFastExtract::preCodeGen (Generator * generator,
if (nodeIsPreCodeGenned())
return this;
- generator->setIsFastExtract(TRUE);
+ if (getIsMainQueryOperator())
+ generator->setIsFastExtract(TRUE);
+ else
+ generator->setContainsFastExtract(TRUE);
if (!RelExpr::preCodeGen(generator,externalInputs,pulledNewInputs))
return NULL;
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/generator/GenRelExeUtil.cpp
----------------------------------------------------------------------
diff --git a/core/sql/generator/GenRelExeUtil.cpp b/core/sql/generator/GenRelExeUtil.cpp
index dcf5f16..ee089b4 100644
--- a/core/sql/generator/GenRelExeUtil.cpp
+++ b/core/sql/generator/GenRelExeUtil.cpp
@@ -119,6 +119,11 @@ short GenericUtilExpr::processOutputRow(Generator * generator,
{
ExpGenerator * expGen = generator->getExpGenerator();
Space * space = generator->getSpace();
+ TableDesc *virtTableDesc = getVirtualTableDesc();
+
+ if (!virtTableDesc)
+ // this operator produces no outputs (e.g. truncate used with a temp table)
+ return 0;
// Assumption (for now): retrievedCols contains ALL columns from
// the table/index. This is because this operator does
@@ -130,7 +135,7 @@ short GenericUtilExpr::processOutputRow(Generator * generator,
Attributes ** attrs =
new(generator->wHeap())
- Attributes * [getVirtualTableDesc()->getColumnList().entries()];
+ Attributes * [virtTableDesc->getColumnList().entries()];
for (CollIndex i = 0; i < getVirtualTableDesc()->getColumnList().entries(); i++)
{
@@ -812,6 +817,9 @@ short ExeUtilCleanupVolatileTables::codeGen(Generator * generator)
if (type_ == ALL_TABLES_IN_ALL_CATS)
exe_util_tdb->setCleanupAllTables(TRUE);
+ if (CmpCommon::getDefault(CSE_CLEANUP_HIVE_TABLES) != DF_OFF)
+ exe_util_tdb->setCleanupHiveCSETables(TRUE);
+
if(!generator->explainDisabled()) {
generator->setExplainTuple(
addExplainInfo(exe_util_tdb, 0, 0, generator));
@@ -3324,6 +3332,21 @@ short ExeUtilHiveTruncate::codeGen(Generator * generator)
tablename = space->AllocateAndCopyToAlignedSpace
(generator->genGetNameAsAnsiNAString(getTableName()), 0);
+ NAString hiveTableNameStr;
+ char * hiveTableName = NULL;
+
+ if (!getTableName().isInHiveDefaultSchema())
+ {
+ hiveTableNameStr =
+ getTableName().getQualifiedNameObj().getSchemaName();
+ hiveTableNameStr += ".";
+ }
+
+ hiveTableNameStr +=
+ getTableName().getQualifiedNameObj().getObjectName();
+ hiveTableName = space->AllocateAndCopyToAlignedSpace(
+ hiveTableNameStr, 0);
+
char * hiveTableLocation = NULL;
char * hiveHdfsHost = NULL;
Int32 hiveHdfsPort = getHiveHdfsPort();
@@ -3332,6 +3355,8 @@ short ExeUtilHiveTruncate::codeGen(Generator * generator)
space->AllocateAndCopyToAlignedSpace (getHiveTableLocation(), 0);
hiveHdfsHost =
space->AllocateAndCopyToAlignedSpace (getHiveHostName(), 0);
+ if (getSuppressModCheck())
+ hiveModTS_ = 0;
NABoolean doSimCheck = FALSE;
if ((CmpCommon::getDefault(HIVE_DATA_MOD_CHECK) == DF_ON) &&
@@ -3340,6 +3365,7 @@ short ExeUtilHiveTruncate::codeGen(Generator * generator)
ComTdbExeUtilHiveTruncate * exe_util_tdb = new(space)
ComTdbExeUtilHiveTruncate(tablename, strlen(tablename),
+ hiveTableName,
hiveTableLocation, partn_loc,
hiveHdfsHost, hiveHdfsPort,
(doSimCheck ? hiveModTS_ : -1),
@@ -3351,6 +3377,9 @@ short ExeUtilHiveTruncate::codeGen(Generator * generator)
getDefault(GEN_DDL_BUFFER_SIZE));
generator->initTdbFields(exe_util_tdb);
+
+ if (getDropTableOnDealloc())
+ exe_util_tdb->setDropOnDealloc(TRUE);
if(!generator->explainDisabled()) {
generator->setExplainTuple(
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/generator/GenRelMisc.cpp
----------------------------------------------------------------------
diff --git a/core/sql/generator/GenRelMisc.cpp b/core/sql/generator/GenRelMisc.cpp
index 3d69151..f48f1b3 100644
--- a/core/sql/generator/GenRelMisc.cpp
+++ b/core/sql/generator/GenRelMisc.cpp
@@ -1103,6 +1103,17 @@ short RelRoot::codeGen(Generator * generator)
else
qCacheInfoBuf = parameterBuffer;
+ // Check for reasons why the query plan should not be cached.
+ // Note: This does not influence the use of cache parameters,
+ // it's too late at this time to undo that.
+ const LIST(CSEInfo *) *cseInfoList = CmpCommon::statement()->getCSEInfoList();
+
+ if (cseInfoList &&
+ CmpCommon::getDefault(CSE_CACHE_TEMP_QUERIES) == DF_OFF)
+ for (CollIndex i=0; i<cseInfoList->entries(); i++)
+ if (cseInfoList->at(i)->usesATempTable())
+ generator->setNonCacheableCSEPlan(TRUE);
+
// compute offsets for rwrs attrs. Offsets are computed separately
// for rwrs vars since these values will be moved as part of input
// row at runtime. This input row should only contain values which are
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/generator/GenRelScan.cpp
----------------------------------------------------------------------
diff --git a/core/sql/generator/GenRelScan.cpp
b/core/sql/generator/GenRelScan.cpp
index b5da8b4..aebe5b0 100644
--- a/core/sql/generator/GenRelScan.cpp
+++ b/core/sql/generator/GenRelScan.cpp
@@ -366,8 +366,13 @@ short FileScan::genForTextAndSeq(Generator * generator,
// determine host and port from dir name
NAString dummy, hostName;
- NABoolean result = ((HHDFSTableStats*)hTabStats)->splitLocation
- (hTabStats->tableDir().data(), hostName, hdfsPort, dummy) ;
+ NABoolean result = TableDesc::splitHiveLocation(
+ hTabStats->tableDir().data(),
+ hostName,
+ hdfsPort,
+ dummy,
+ CmpCommon::diags(),
+ hTabStats->getPortOverride());
GenAssert(result, "Invalid Hive directory name");
hdfsHostName =
space->AllocateAndCopyToAlignedSpace(hostName, 0);
@@ -1119,7 +1124,8 @@ short FileScan::codeGenForHive(Generator * generator)
if (expirationTimestamp > 0)
expirationTimestamp += 1000000 *
(Int64) CmpCommon::getDefaultLong(HIVE_METADATA_REFRESH_INTERVAL);
- generator->setPlanExpirationTimestamp(expirationTimestamp);
+ if (!getCommonSubExpr())
+ generator->setPlanExpirationTimestamp(expirationTimestamp);
short type = (short)ComTdbHdfsScan::UNKNOWN_;
if (hTabStats->isTextFile())
@@ -1215,14 +1221,21 @@ if (hTabStats->isOrcFile())
Lng32 numOfPartLevels = -1;
Queue * hdfsDirsToCheck = NULL;
+ hdfsRootDir =
+ space->allocateAndCopyToAlignedSpace(hTabStats->tableDir().data(),
+ hTabStats->tableDir().length(),
+ 0);
+
// Right now, timestamp info is not being generated correctly for
// partitioned files. Skip data mod check for them.
// Remove this check when partitioned info is set up correctly.
if ((CmpCommon::getDefault(HIVE_DATA_MOD_CHECK) == DF_ON) &&
(CmpCommon::getDefault(TRAF_SIMILARITY_CHECK) != DF_OFF) &&
- (hTabStats->numOfPartCols() <= 0))
+ (hTabStats->numOfPartCols() <= 0) &&
+ (!getCommonSubExpr()))
{
+ modTS = hTabStats->getModificationTS();
numOfPartLevels = hTabStats->numOfPartCols();
// if specific directories are to be checked based on the query struct
@@ -1313,6 +1326,9 @@ if (hTabStats->isOrcFile())
hdfsscan_tdb->setHiveScanMode(hiveScanMode);
+ if (getCommonSubExpr())
+ hdfsscan_tdb->setAssignRangesAtRuntime(TRUE);
+
NABoolean hdfsPrefetch = FALSE;
if (CmpCommon::getDefault(HDFS_PREFETCH) == DF_ON)
hdfsPrefetch = TRUE;
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/generator/Generator.cpp
----------------------------------------------------------------------
diff --git a/core/sql/generator/Generator.cpp b/core/sql/generator/Generator.cpp
index 1417a55..ca4346c 100644
--- a/core/sql/generator/Generator.cpp
+++ b/core/sql/generator/Generator.cpp
@@ -146,6 +146,8 @@ Generator::Generator(CmpContext* currentCmpContext) :
nonCacheableMVQRplan_ = FALSE;
+ nonCacheableCSEPlan_ = FALSE;
+
updateWithinCS_ = FALSE;
isInternalRefreshStatement_ = FALSE;
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/generator/Generator.h
----------------------------------------------------------------------
diff --git a/core/sql/generator/Generator.h b/core/sql/generator/Generator.h
index a0d7161..d1f2534 100644
--- a/core/sql/generator/Generator.h
+++ b/core/sql/generator/Generator.h
@@ -220,7 +220,7 @@ class Generator : public NABasicObject
// if set, then sidetreeinsert operator need to be added.
, ENABLE_TRANSFORM_TO_STI = 0x00000004
- , FAST_EXTRACT = 0x00000008 // UNLOAD query
+ , IS_FAST_EXTRACT = 0x00000008 // UNLOAD query
// if lobs are accessed at runtime
, PROCESS_LOB = 0x00000010
@@ -237,6 +237,7 @@ class Generator : public NABasicObject
// If Hive tables are accessed at runtime
, HIVE_ACCESS = 0x00000400
+ , CONTAINS_FAST_EXTRACT = 0x00000800
};
// Each operator node receives some tupps in its input atp and
@@ -391,6 +392,7 @@ class Generator : public NABasicObject
NABoolean isInternalRefreshStatement_;
NABoolean nonCacheableMVQRplan_;
+ NABoolean nonCacheableCSEPlan_;
// is set when relroot has an order by requirement
NABoolean orderRequired_;
@@ -1224,10 +1226,17 @@ public:
{ return (flags2_ & DP2_XNS_ENABLED) != 0; }
void setDp2XnsEnabled(NABoolean v)
{ (v ? flags2_ |= DP2_XNS_ENABLED : flags2_ &= ~DP2_XNS_ENABLED); }
- NABoolean isFastExtract() { return (flags2_ & FAST_EXTRACT) != 0; };
+ // statement is a fast extract operation
+ NABoolean isFastExtract() { return (flags2_ & IS_FAST_EXTRACT) != 0; };
void setIsFastExtract(NABoolean v)
{
- (v ? flags2_ |= FAST_EXTRACT : flags2_ &= ~FAST_EXTRACT);
+ (v ? flags2_ |= IS_FAST_EXTRACT : flags2_ &= ~IS_FAST_EXTRACT);
+ }
+ // statement contains a fast extract operator somewhere
+ NABoolean containsFastExtract() { return (flags2_ & CONTAINS_FAST_EXTRACT) != 0; };
+ void setContainsFastExtract(NABoolean v)
+ {
+ (v ? flags2_ |= CONTAINS_FAST_EXTRACT : flags2_ &= ~CONTAINS_FAST_EXTRACT);
}
/* NABoolean noTransformToSTI()
@@ -1367,9 +1376,13 @@ public:
void setFoundAnUpdate(NABoolean u) { foundAnUpdate_ = u;}
+ NABoolean isNonCacheablePlan() const
+ { return nonCacheableMVQRplan_ || nonCacheableCSEPlan_;}
NABoolean isNonCacheableMVQRplan() const { return nonCacheableMVQRplan_;}
+ NABoolean isNonCacheableCSEPlan() const { return nonCacheableCSEPlan_;}
void setNonCacheableMVQRplan(NABoolean n) { nonCacheableMVQRplan_ = n;}
+ void setNonCacheableCSEPlan(NABoolean n) { nonCacheableCSEPlan_ = n;}
NABoolean updateWithinCS() const { return updateWithinCS_;}
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/nskgmake/arkcmplib/Makefile
----------------------------------------------------------------------
diff --git a/core/sql/nskgmake/arkcmplib/Makefile
b/core/sql/nskgmake/arkcmplib/Makefile
index c61310e..0a20fd2 100755
--- a/core/sql/nskgmake/arkcmplib/Makefile
+++ b/core/sql/nskgmake/arkcmplib/Makefile
@@ -40,10 +40,6 @@ CPPSRC := cmpargs.cpp \
rtdu2.cpp \
vers_libarkcmp.cpp
-# The Windows part of this should be temporary.
-DEFS += -DWIN32 -D_WINDOWS -D__SHADOW -DCOPYRIGHT_VERSION_H=\"$(TRAFODION_VER)\"
-
-
INCLUDE_DIRS := sqludr
SRCPATH := arkcmp cli
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/optimizer/BindRelExpr.cpp
----------------------------------------------------------------------
diff --git a/core/sql/optimizer/BindRelExpr.cpp
b/core/sql/optimizer/BindRelExpr.cpp
index ee65f9e..97b5eae 100644
--- a/core/sql/optimizer/BindRelExpr.cpp
+++ b/core/sql/optimizer/BindRelExpr.cpp
@@ -9349,41 +9349,6 @@ RelExpr *Insert::bindNode(BindWA *bindWA)
return this;
}
- RelExpr * mychild = child(0);
-
- const HHDFSTableStats* hTabStats =
- getTableDesc()->getNATable()->getClusteringIndex()->getHHDFSTableStats();
-
- const char * hiveTablePath;
- NAString hostName;
- Int32 hdfsPort;
- NAString tableDir;
- NABoolean result;
-
- char fldSep[2];
- char recSep[2];
- memset(fldSep,'\0',2);
- memset(recSep,'\0',2);
- fldSep[0] = hTabStats->getFieldTerminator();
- recSep[0] = hTabStats->getRecordTerminator();
-
- // don't rely on timeouts to invalidate the HDFS stats for the target table,
- // make sure that we invalidate them right after compiling this statement,
- // at least for this process
- ((NATable*)(getTableDesc()->getNATable()))->setClearHDFSStatsAfterStmt(TRUE);
-
- // inserting into tables with multiple partitions is not yet supported
- CMPASSERT(hTabStats->entries() == 1);
- hiveTablePath = (*hTabStats)[0]->getDirName();
- result = ((HHDFSTableStats* )hTabStats)->splitLocation
- (hiveTablePath, hostName, hdfsPort, tableDir) ;
- if (!result) {
- *CmpCommon::diags() << DgSqlCode(-4224)
- << DgString0(hiveTablePath);
- bindWA->setErrStatus();
- return this;
- }
-
// specifying a list of column names to insert to is not yet supported
if (insertColTree_) {
*CmpCommon::diags() << DgSqlCode(-4223)
@@ -9392,48 +9357,24 @@ RelExpr *Insert::bindNode(BindWA *bindWA)
return this;
}
- // NABoolean isSequenceFile = (*hTabStats)[0]->isSequenceFile();
- const NABoolean isSequenceFile = hTabStats->isSequenceFile();
-
- RelExpr * unloadRelExpr =
- new (bindWA->wHeap())
- FastExtract( mychild,
- new (bindWA->wHeap()) NAString(hiveTablePath),
- new (bindWA->wHeap()) NAString(hostName),
- hdfsPort,
- getTableDesc(),
- new (bindWA->wHeap()) NAString(getTableName().getQualifiedNameObj().getObjectName()),
- FastExtract::FILE,
- bindWA->wHeap());
- RelExpr * boundUnloadRelExpr = unloadRelExpr->bindNode(bindWA);
- if (bindWA->errStatus())
- return NULL;
-
- ((FastExtract*)boundUnloadRelExpr)->setRecordSeparator(recSep);
- ((FastExtract*)boundUnloadRelExpr)->setDelimiter(fldSep);
- ((FastExtract*)boundUnloadRelExpr)->setOverwriteHiveTable(getOverwriteHiveTable());
- ((FastExtract*)boundUnloadRelExpr)->setSequenceFile(isSequenceFile);
- if (getOverwriteHiveTable())
- {
- RelExpr * newRelExpr = new (bindWA->wHeap())
- ExeUtilHiveTruncate(getTableName(), NULL, bindWA->wHeap());
-
- //new root to prevent error 4056 when binding
- newRelExpr = new (bindWA->wHeap()) RelRoot(newRelExpr);
-
- RelExpr *blockedUnion = new (bindWA->wHeap()) Union(newRelExpr, boundUnloadRelExpr);
- ((Union*)blockedUnion)->setBlockedUnion();
- ((Union*)blockedUnion)->setSerialUnion();
-
- RelExpr *boundBlockedUnion = blockedUnion->bindNode(bindWA);
- if (bindWA->errStatus())
- return NULL;
-
- return boundBlockedUnion;
-
- }
+ RelExpr *feResult = FastExtract::makeFastExtractTree(
+ getTableDesc(),
+ child(0).getPtr(),
+ getOverwriteHiveTable(),
+ TRUE, // called from within binder
+ FALSE, // not a common subexpr
+ bindWA);
+
+ if (feResult)
+ {
+ feResult = feResult->bindNode(bindWA);
+ if (bindWA->errStatus())
+ return NULL;
- return boundUnloadRelExpr;
+ return feResult;
+ }
+ else
+ return this;
}
if(!(getOperatorType() == REL_UNARY_INSERT &&
@@ -17144,6 +17085,45 @@ bool ControlRunningQuery::isUserAuthorized(BindWA *bindWA)
}// ControlRunningQuery::isUserAuthorized()
+RelExpr * CommonSubExprRef::bindNode(BindWA *bindWA)
+{
+ if (nodeIsBound()) {
+ bindWA->getCurrentScope()->setRETDesc(getRETDesc());
+ return this;
+ }
+
+ CSEInfo *info = CmpCommon::statement()->getCSEInfo(internalName_);
+ CommonSubExprRef *parentCSE = bindWA->inCSE();
+
+ DCMPASSERT(info);
+
+ bindWA->setInCSE(this);
+
+ if (parentCSE)
+ // establish the parent/child relationship, if not done already
+ CmpCommon::statement()->getCSEInfo(parentCSE->getName())->addChildCSE(info);
+
+ bindChildren(bindWA);
+ if (bindWA->errStatus())
+ return this;
+
+ // eliminate any CommonSubExprRef nodes that are not truly common,
+ // i.e. those that are referenced only once
+ if (CmpCommon::statement()->getCSEInfo(internalName_)->getNumConsumers() <= 1)
+ return child(0).getPtr();
+
+ // we know that our child is a RenameTable (same name as this CSE),
+ // whose child is a RelRoot, defining the CTE. Copy the bound select
+ // list of the CTE.
+ CMPASSERT(child(0)->getOperatorType() == REL_RENAME_TABLE &&
+ child(0)->child(0)->getOperatorType() == REL_ROOT);
+ columnList_ = static_cast<RelRoot *>(child(0)->child(0).getPtr())->compExpr();
+
+ bindWA->setInCSE(parentCSE);
+
+ return bindSelf(bindWA);
+}
+
RelExpr * OSIMControl::bindNode(BindWA *bindWA)
{
if (nodeIsBound())
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/optimizer/BindWA.cpp
----------------------------------------------------------------------
diff --git a/core/sql/optimizer/BindWA.cpp b/core/sql/optimizer/BindWA.cpp
index 49d9586..3b27d8e 100644
--- a/core/sql/optimizer/BindWA.cpp
+++ b/core/sql/optimizer/BindWA.cpp
@@ -163,6 +163,7 @@ BindWA::BindWA(SchemaDB *schemaDB, CmpContext* cmpContext, NABoolean inDDL, NABo
, renameToScanTable_ (FALSE)
, inViewExpansion_ (FALSE)
, inliningInfoFlagsToSetRecursivly_(0)
+ , currCSE_(NULL)
, inCTAS_(FALSE)
, viewsUsed_("", wHeap())
, hasDynamicRowsetsInQuery_(FALSE)
http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/b90dc334/core/sql/optimizer/BindWA.h
----------------------------------------------------------------------
diff --git a/core/sql/optimizer/BindWA.h b/core/sql/optimizer/BindWA.h
index 6faf167..ac3bfa0 100644
--- a/core/sql/optimizer/BindWA.h
+++ b/core/sql/optimizer/BindWA.h
@@ -84,6 +84,7 @@ struct TrafDesc;
class NARoutine;
class HbaseColUsageInfo;
class ExeUtilHbaseCoProcAggr;
+class CommonSubExprRef;
// ***********************************************************************
// BindContext
@@ -1421,7 +1422,7 @@ public:
NABoolean inViewDefinition() const;
NABoolean inMVDefinition() const;
NABoolean inCheckConstraintDefinition() const;
-
+
//----------------------------------------------------------------------
// Get the NARoutine associated with this routine name
//----------------------------------------------------------------------
@@ -1567,6 +1568,9 @@ public:
return r;
}
+ CommonSubExprRef *inCSE() const { return currCSE_; }
+ void setInCSE(CommonSubExprRef *cte) { currCSE_ = cte; }
+
NABoolean inCTAS() const { return inCTAS_; }
void setInCTAS(NABoolean t)
{
@@ -1886,6 +1890,9 @@ private:
ValueIdMap updateToScanValueIds_;
// QSTUFF
+ // set if we are currently under a CommonSubExprRef node
+ CommonSubExprRef *currCSE_;
+
NABoolean inCTAS_;
// names of referenced views. used by query caching to guard against false