jcamachor commented on a change in pull request #1324:
URL: https://github.com/apache/hive/pull/1324#discussion_r463040481
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##########
@@ -247,6 +249,18 @@ public ParseContext transform(ParseContext pctx) throws
SemanticException {
}
}
+ if (LOG.isDebugEnabled()) {
Review comment:
Can we move the new call before the
`if(pctx.getConf().getBoolVar(ConfVars.HIVE_SHARED_WORK_REUSE_MAPJOIN_CACHE))
{` block? It makes sense to trigger that block at the very end in case we
continue adding phases.
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##########
@@ -247,6 +249,18 @@ public ParseContext transform(ParseContext pctx) throws
SemanticException {
}
}
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Before SharedWorkOptimizer #2:\n" +
Operator.toString(pctx.getTopOps().values()));
+ }
+
+ // Execute shared work optimization
+ new BaseSharedWorkOptimizer().sharedWorkOptimization(
Review comment:
Can we put this additional step under a new flag (`true` by default)?
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##########
@@ -273,258 +287,332 @@ public ParseContext transform(ParseContext pctx) throws
SemanticException {
return pctx;
}
- private static boolean sharedWorkOptimization(ParseContext pctx,
SharedWorkOptimizerCache optimizerCache,
- ArrayListMultimap<String, TableScanOperator> tableNameToOps,
List<Entry<String, Long>> sortedTables,
- boolean removeSemijoin) throws SemanticException {
- // Boolean to keep track of whether this method actually merged any TS
operators
- boolean mergedExecuted = false;
-
- Multimap<String, TableScanOperator> existingOps =
ArrayListMultimap.create();
- Set<Operator<?>> removedOps = new HashSet<>();
- for (Entry<String, Long> tablePair : sortedTables) {
- String tableName = tablePair.getKey();
- for (TableScanOperator discardableTsOp : tableNameToOps.get(tableName)) {
- if (removedOps.contains(discardableTsOp)) {
- LOG.debug("Skip {} as it has already been removed", discardableTsOp);
- continue;
- }
- Collection<TableScanOperator> prevTsOps = existingOps.get(tableName);
- for (TableScanOperator retainableTsOp : prevTsOps) {
- if (removedOps.contains(retainableTsOp)) {
- LOG.debug("Skip {} as it has already been removed",
retainableTsOp);
+ private static class BaseSharedWorkOptimizer {
Review comment:
Can we add a clarifying comment to the new internal classes with the
difference between both of them?
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##########
@@ -247,6 +249,18 @@ public ParseContext transform(ParseContext pctx) throws
SemanticException {
}
}
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Before SharedWorkOptimizer #2:\n" +
Operator.toString(pctx.getTopOps().values()));
Review comment:
This is the same as `After SharedWorkSJOptimizer`, no need to print it
again.
##########
File path:
ql/src/test/results/clientpositive/llap/annotate_stats_join_pkfk.q.out
##########
@@ -1191,14 +1191,6 @@ STAGE PLANS:
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 12 Data size: 48 Basic stats:
COMPLETE Column stats: COMPLETE
- Execution mode: vectorized, llap
Review comment:
I do not see projected columns, so it is difficult to confirm whether
this is because of the schema. Is it?
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##########
@@ -247,6 +249,18 @@ public ParseContext transform(ParseContext pctx) throws
SemanticException {
}
}
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Before SharedWorkOptimizer #2:\n" +
Operator.toString(pctx.getTopOps().values()));
+ }
+
+ // Execute shared work optimization
+ new BaseSharedWorkOptimizer().sharedWorkOptimization(
+ pctx, optimizerCache, tableNameToOps, sortedTables, false);
+
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("After SharedWorkOptimizer #2:\n" +
Operator.toString(pctx.getTopOps().values()));
Review comment:
`After SharedWorkOptimizer merging TS schema`?
##########
File path:
ql/src/test/results/clientpositive/llap/auto_join_reordering_values.q.out
##########
@@ -144,122 +144,30 @@ STAGE PLANS:
tag: 0
value expressions: _col0 (type: int), _col2 (type:
int), _col3 (type: int)
auto parallelism: true
- Execution mode: vectorized, llap
- LLAP IO: no inputs
- Path -> Alias:
-#### A masked pattern was here ####
- Path -> Partition:
-#### A masked pattern was here ####
- Partition
- base file name: orderpayment_small
- input format: org.apache.hadoop.mapred.TextInputFormat
- output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
- properties:
- bucket_count -1
- bucketing_version 2
- column.name.delimiter ,
- columns dealid,date,time,cityid,userid
- columns.types int:string:string:int:int
-#### A masked pattern was here ####
- name default.orderpayment_small
- serialization.format 1
- serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
- serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
-
- input format: org.apache.hadoop.mapred.TextInputFormat
- output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
- properties:
- bucketing_version 2
- column.name.delimiter ,
- columns dealid,date,time,cityid,userid
- columns.comments
- columns.types int:string:string:int:int
-#### A masked pattern was here ####
- name default.orderpayment_small
- serialization.format 1
- serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
- serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
- name: default.orderpayment_small
- name: default.orderpayment_small
- Truncated Path -> Alias:
- /orderpayment_small [orderpayment]
- Map 6
- Map Operator Tree:
- TableScan
- alias: dim_pay_date
- filterExpr: date is not null (type: boolean)
- Statistics: Num rows: 1 Data size: 94 Basic stats: COMPLETE
Column stats: COMPLETE
- GatherStats: false
Filter Operator
isSamplingPred: false
- predicate: date is not null (type: boolean)
- Statistics: Num rows: 1 Data size: 94 Basic stats:
COMPLETE Column stats: COMPLETE
+ predicate: dealid is not null (type: boolean)
+ Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
- expressions: date (type: string)
Review comment:
Why is this projection changing? Is this correct? I do not see the
branch with `date` column.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]