> On June 23, 2016, 3 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelCollationPropagator.java,
> >  line 117
> > <https://reviews.apache.org/r/48500/diff/7/?file=1427704#file1427704line117>
> >
> >     I may be wrong, but doesn't this mean we may trigger dispatch() on all 
> > nodes underneath which implies this is O(n*2) algorithm.

In fact it is O(n): we visit each operator exactly once. Observe that this 
class is not a visitor, thus it is necessary that each method calls dispatch on 
its children.


> On June 23, 2016, 3 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelCollationPropagator.java,
> >  line 140
> > <https://reviews.apache.org/r/48500/diff/7/?file=1427704#file1427704line140>
> >
> >     We may allow some (order-preserving) udfs here. Wanna leave a TODO ?

You are correct, I will add a comment.


> On June 23, 2016, 3 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelCollationPropagator.java,
> >  lines 196-197
> > <https://reviews.apache.org/r/48500/diff/7/?file=1427704#file1427704line196>
> >
> >     Recursive calls?

See my previous comment.


> On June 23, 2016, 3 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java,
> >  lines 85-90
> > <https://reviews.apache.org/r/48500/diff/7/?file=1427706#file1427706line85>
> >
> >     Shall we add a flag with default value true to enable propagator? 
> >     Apart from debugging flag in this case will also be useful to quantify 
> > perf gains.

That is a cool idea. I will do it.


> On June 23, 2016, 3 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/queries/clientpositive/reduce_deduplicate_extended2.q, line 3
> > <https://reviews.apache.org/r/48500/diff/7/?file=1427709#file1427709line3>
> >
> >     We need to explore this further, since auto convert join is on by 
> > default, implying RSdedup is off by default whenever join is in pipeline.

Yes, this is a configuration nightmare, we need three different properties set 
to the right values to enable RSDedup for join operators... I can create a 
follow-up issue to study this.


> On June 23, 2016, 3 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/ptfgroupbyjoin.q.out, lines 82-85
> > <https://reviews.apache.org/r/48500/diff/7/?file=1427742#file1427742line82>
> >
> >     Redundant sel op? Case for IdentityProjectRemover?

It swaps the input cols; it is true that it would not need to do this, but 
current IdentityProjectRemover only removes Select if columns are exactly in 
the same order (I think we had problems previously with other cases because 
RowResolver was not always up-to-date e.g. if some optimization had modified 
the tree, plus some operators are designed to access columns in the order that 
they are in the input records).


> On June 23, 2016, 3 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/reduce_deduplicate_extended2.q.out, line 
> > 18
> > <https://reviews.apache.org/r/48500/diff/7/?file=1427743#file1427743line18>
> >
> >     Better plan here would be a single stage with GBY executing with mode = 
> > complete. Since we always generate map-side GBY at plan generation time, 
> > this implies collapsing this GBy after RS dedup.

Yes, this is another case that RSDedup is missing. The good thing about this 
patch is that it will help us realizing many extensions that can be implemented 
for RSDedup and that we are missing today. We can discuss this offline, as I 
think it is not the only opportunity that we are missing; in fact, 
reduce_deduplicate_extended2.q.out contains many of them.


> On June 23, 2016, 3 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/reduce_deduplicate_extended2.q.out, line 
> > 123
> > <https://reviews.apache.org/r/48500/diff/7/?file=1427743#file1427743line123>
> >
> >     Here also, we can have single stage, but depending on key cardinality 
> > that may be suboptimal since # of reducer for OBy = 1. 
> >     
> >     At some point we need to re-order optimizations such that RS dedup runs 
> > after StatsAnnotation rule so that we can properly make such decision.

I agree.


> On June 23, 2016, 3 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/subquery_in_having.q.out, lines 286-289
> > <https://reviews.apache.org/r/48500/diff/7/?file=1427759#file1427759line286>
> >
> >     case for IdentityProjectRemover?

Same as before, swapping columns.


> On June 23, 2016, 3 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/filter_cond_pushdown.q.out, lines 309-311
> > <https://reviews.apache.org/r/48500/diff/7/?file=1427714#file1427714line309>
> >
> >     This can be further optimized. We can drop constants from sort & 
> > partitioning columns.

The reason why we do not do it seems to be the CAST in the join operation.


> On June 23, 2016, 3 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/union25.q.out, line 110
> > <https://reviews.apache.org/r/48500/diff/7/?file=1427778#file1427778line110>
> >
> >     duplicated columns in sort/partition columns.

Solved, it was an issue with duplicates in the GBy columns when we were 
propagating order on the same column twice.


- Jesús


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48500/#review139251
-----------------------------------------------------------


On June 22, 2016, 8:27 p.m., Jesús Camacho Rodríguez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48500/
> -----------------------------------------------------------
> 
> (Updated June 22, 2016, 8:27 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-13982
>     https://issues.apache.org/jira/browse/HIVE-13982
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-13982
> 
> 
> Diffs
> -----
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java
>  9cb62c8b4cf16f0d39a8184a0df52dc6e5dea458 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelCollationPropagator.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java
>  353d8db41af10512c94c0700a9bb06a07d660190 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java
>  1a543fb1c8a9c2fb5b89aba211f0dd8910aeb048 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
>  77771c3eb8155defa99a223ccf4ee4b072abb40a 
>   ql/src/test/queries/clientpositive/limit_pushdown2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/reduce_deduplicate_extended2.q 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/annotate_stats_join.q.out 
> d83d7dbf17adaee6a6f3b64fdbcde1b8aabf6d8e 
>   ql/src/test/results/clientpositive/bucket_groupby.q.out 
> 867fad43686d69dfd77cc5accc5d9d09f4364301 
>   ql/src/test/results/clientpositive/correlationoptimizer13.q.out 
> ac5bdc6878bcd2383865119e6a7f6b849a24c505 
>   ql/src/test/results/clientpositive/dynamic_rdd_cache.q.out 
> 4088a39ed2b4a51022b6124c5e1ae9b9334bb859 
>   ql/src/test/results/clientpositive/filter_cond_pushdown.q.out 
> 46b701f382c1a4a22fe0228bbfcd23dcf6054218 
>   ql/src/test/results/clientpositive/limit_pushdown2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/lineage3.q.out 
> 12ae13e388b3cb9c051cb419b75682fa4296d211 
>   ql/src/test/results/clientpositive/llap/tez_union2.q.out 
> 0f3abd0e604e84e07d1f4e90feadcc176dcdde11 
>   ql/src/test/results/clientpositive/merge_join_1.q.out 
> 060562d8f967fb1217c83f6ac70f9d260b914050 
>   ql/src/test/results/clientpositive/perf/query17.q.out 
> 3692a9a1aef1aaa042d39521ba9ae07813d1f8fe 
>   ql/src/test/results/clientpositive/perf/query19.q.out 
> f5eeae4e2046ecbbeaa1b399ad5618028370a9bb 
>   ql/src/test/results/clientpositive/perf/query20.q.out 
> 3eed0a86592f69cf6660231f125273d2827bf842 
>   ql/src/test/results/clientpositive/perf/query25.q.out 
> 94b575fbc82d89ad86460a096726f8ed99e80227 
>   ql/src/test/results/clientpositive/perf/query29.q.out 
> c53d65df32f6573852b228811d72b8109352228a 
>   ql/src/test/results/clientpositive/perf/query3.q.out 
> 4260b1128788e2f93c7fc4509ccc012eeda074a0 
>   ql/src/test/results/clientpositive/perf/query39.q.out 
> 5c90a5371e3c6112ebb769a9dd034e92b6c2462f 
>   ql/src/test/results/clientpositive/perf/query40.q.out 
> c3c746b420a3cea007115083119ed432982f4676 
>   ql/src/test/results/clientpositive/perf/query45.q.out 
> 04f9b02b019b6cf591dee48964a73fdb4a4b285f 
>   ql/src/test/results/clientpositive/perf/query46.q.out 
> 22bf29b1aa90a24cd13cdadeb7905191a0184962 
>   ql/src/test/results/clientpositive/perf/query50.q.out 
> 781d3ec658146a180ff577ece3cb4cf54ff566ff 
>   ql/src/test/results/clientpositive/perf/query54.q.out 
> 76657a0ae1a086b61285dc52834f5d26d0bec059 
>   ql/src/test/results/clientpositive/perf/query64.q.out 
> 5c931c09368b3f28128ad65600759b2f36dce999 
>   ql/src/test/results/clientpositive/perf/query65.q.out 
> 41f52de18e600dfb879b7cc86ef48cbfea4f8630 
>   ql/src/test/results/clientpositive/perf/query68.q.out 
> 8fcc1477bad57b97f99ff0d5eacc50da66f7b0bc 
>   ql/src/test/results/clientpositive/perf/query71.q.out 
> ec67b154e7f22943ce307d2306a3aec64f642606 
>   ql/src/test/results/clientpositive/perf/query75.q.out 
> 4279f5457faf6b54cb2665e6d3f172fd1867db65 
>   ql/src/test/results/clientpositive/perf/query85.q.out 
> ca23bbbc1dc2115f70828c399888f2c7548b04b4 
>   ql/src/test/results/clientpositive/perf/query87.q.out 
> ee3090b0bd6455439d0fa93cfc4f105e21a9691c 
>   ql/src/test/results/clientpositive/perf/query89.q.out 
> c80f06cc54846a29077d9eaeb0e7744c825edf8f 
>   ql/src/test/results/clientpositive/perf/query92.q.out 
> f8186f51d91be15b3e6621db367eaa96ebbdcbd2 
>   ql/src/test/results/clientpositive/perf/query97.q.out 
> 5dcd341024edfba32d20e58a1da2df3dc6348454 
>   ql/src/test/results/clientpositive/perf/query98.q.out 
> 484f3c64606186735cea11fc4c12cc50a08e6b63 
>   ql/src/test/results/clientpositive/ptfgroupbyjoin.q.out 
> 7efc49292f2e3e96d0a58b78a623e6359008da15 
>   ql/src/test/results/clientpositive/reduce_deduplicate_extended2.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/regex_col.q.out 
> c2a51d456cce6925b81f4eb9950a3bcc49fb308f 
>   ql/src/test/results/clientpositive/semijoin4.q.out 
> e557be149fe3b7926c85df30050d6570fa3e7bf5 
>   ql/src/test/results/clientpositive/semijoin5.q.out 
> 6ecc6935b9d9528e6f2c56a1bad18e7205646e51 
>   ql/src/test/results/clientpositive/spark/annotate_stats_join.q.out 
> 82ae244c8982735d912325189ab066aba31539f0 
>   ql/src/test/results/clientpositive/spark/dynamic_rdd_cache.q.out 
> b290fa1446d7e878a0e35cca65fd82fa565b6d47 
>   ql/src/test/results/clientpositive/spark/subquery_exists.q.out 
> 1ea5bb10e51e8161cd729674c0ed6fe5d8c53b39 
>   ql/src/test/results/clientpositive/spark/subquery_in.q.out 
> 5c72d1be220030c39a297adb57e4ed901cd2801b 
>   ql/src/test/results/clientpositive/spark/union25.q.out 
> a8b89d65d0fd1daf1470038e6fbf92c8617a2abc 
>   ql/src/test/results/clientpositive/spark/vector_outer_join3.q.out 
> 5a1453fcfc1de02db5e51d6f02754d6518f104fe 
>   ql/src/test/results/clientpositive/spark/vectorization_13.q.out 
> 21d0b8c458491df09072fd68be12ae21c4448333 
>   ql/src/test/results/clientpositive/spark/vectorization_14.q.out 
> cb3d9a4da84a379e00550ce7e31893b304d5e560 
>   ql/src/test/results/clientpositive/spark/vectorization_15.q.out 
> f826dc2d06b5d4cdea7e2b6b9d0737397fad5217 
>   ql/src/test/results/clientpositive/spark/vectorization_short_regress.q.out 
> ce6167765be56b7def10448e502fa72c35a4cb23 
>   ql/src/test/results/clientpositive/subquery_exists.q.out 
> 125d8f61c1e1dac2458705d6ce55ace8f0057d9d 
>   ql/src/test/results/clientpositive/subquery_in.q.out 
> 20c5538f0a4f0feb29b4f93790f2859b32416346 
>   ql/src/test/results/clientpositive/subquery_in_having.q.out 
> e623299f2cd5c4a299918deabcbb643c55ba19ea 
>   ql/src/test/results/clientpositive/subquery_notexists.q.out 
> 56f0387baa8f432f94d7cfec4aef177ec134c8af 
>   ql/src/test/results/clientpositive/subquery_notexists_having.q.out 
> b454a41db79af237c05643f979b6a248084428f7 
>   ql/src/test/results/clientpositive/subquery_notin.q.out 
> 6873aca06d911b4826dcf5f6961f9c09712cc91e 
>   ql/src/test/results/clientpositive/subquery_unqualcolumnrefs.q.out 
> 240bc3069ecb15ab04a7f3b197cc30900f37178c 
>   ql/src/test/results/clientpositive/subquery_views.q.out 
> 39f3927c7435713a4347bdbf9d4503a724a0da26 
>   ql/src/test/results/clientpositive/tez/explainuser_1.q.out 
> 1871c7e443cf775b09badc4cbf4b86e23ad9e525 
>   ql/src/test/results/clientpositive/tez/explainuser_2.q.out 
> 553066039881f225634c08d93a9054df5636e5d2 
>   ql/src/test/results/clientpositive/tez/subquery_exists.q.out 
> b281885c595542c2d7deca41a645a8b668620835 
>   ql/src/test/results/clientpositive/tez/subquery_in.q.out 
> a3e7833cbfea1de8c5f3f298fc66e0c4eace994e 
>   ql/src/test/results/clientpositive/tez/tez_union2.q.out 
> 9aa76dcfeeaa105cd5c1c3c7f005fac720489c88 
>   ql/src/test/results/clientpositive/tez/unionDistinct_1.q.out 
> ee330864ef0a455a834a1ebe92dd1263908dcf3e 
>   ql/src/test/results/clientpositive/tez/vector_groupby_reduce.q.out 
> 7f00b064e5a91b45282823e2725e11ab7f508b01 
>   ql/src/test/results/clientpositive/tez/vector_interval_mapjoin.q.out 
> 477516730fa242f7d6fdbd5be1a6ba5f2428bda5 
>   ql/src/test/results/clientpositive/tez/vector_outer_join3.q.out 
> b3437958e10b327c15b71ad5d12219090a75f7c7 
>   ql/src/test/results/clientpositive/tez/vectorization_13.q.out 
> 61776a51d3b9a1e27b7e42beae33bbe09d2604d9 
>   ql/src/test/results/clientpositive/tez/vectorization_14.q.out 
> 2a598332207f4540defa21a107642aa0502e1a58 
>   ql/src/test/results/clientpositive/tez/vectorization_15.q.out 
> a87f9d86cfb3831b2a420181da440ed336c7915c 
>   ql/src/test/results/clientpositive/tez/vectorization_short_regress.q.out 
> 7c2703faa6800652cb12ca9b927ff3ea1d65e893 
>   ql/src/test/results/clientpositive/union25.q.out 
> 6be39edcecb9c61de1e566bf66418297d1e6b3a5 
>   ql/src/test/results/clientpositive/unionDistinct_1.q.out 
> 0330133917511c355fca3adeb3e5d07eaeb3c0fc 
>   ql/src/test/results/clientpositive/vector_groupby_reduce.q.out 
> bc23b365b02b505d0f8e79cdacca3449bf46ead3 
>   ql/src/test/results/clientpositive/vector_interval_mapjoin.q.out 
> 2223e8144738777ab6352f6270f9275e5af6e827 
>   ql/src/test/results/clientpositive/vector_outer_join3.q.out 
> a0c4709232c2e08c90898170dd5c0b0abcbbed2c 
>   ql/src/test/results/clientpositive/vectorization_13.q.out 
> 8145bfd900a5fdefd4a3a82ee2e1d858a2a2dd53 
>   ql/src/test/results/clientpositive/vectorization_14.q.out 
> 6d4f13a23de5c184cd100af07ac19f24ba9fac4a 
>   ql/src/test/results/clientpositive/vectorization_15.q.out 
> 52630256def777f46f01667f0825fa33411235f4 
>   ql/src/test/results/clientpositive/vectorization_short_regress.q.out 
> 2a5562954095bd895ebdde077554a07a6982cf01 
> 
> Diff: https://reviews.apache.org/r/48500/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>

Reply via email to