> On May 19, 2016, 12:06 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java,
> >  line 556
> > <https://reviews.apache.org/r/47554/diff/1/?file=1387390#file1387390line556>
> >
> >     Can you also add comment why this is valid only for RS added by SPDO 
> > and not in general?
> 
> Jesús Camacho Rodríguez wrote:
>     I was thinking further about this, and indeed I made many assumptions 
> thinking that SPDO will not add the sort columns at the end of the additional 
> RS keys, so then we need to make sure the ordering is correct... but this is 
> not true, as it will add them.
>     
>     I think this optimization can be done general, as ordering before 
> ordering again is a noop. I have changed the code accordingly, I would like 
> your opinion. Am I missing something? Let's see what QA returns...
> 
> Ashutosh Chauhan wrote:
>     yeah.. thats what I think.. this is not specific to SPDO at logical 
> layer. 
>     Thing I am not 100% sure about is in FileSink the way SPDO is implemented 
> I *think* it assumes all rows for a particular key come in sorted and in one 
> batch. Now, when we merge and sort by two keys and if there is a case that FS 
> was expecting all rows sorted on second key only, now will get them sorted on 
> 2 keys and it may have to close ORC writer and then reopen again, as oppose 
> to write it once and never opening it again.
> 
> Jesús Camacho Rodríguez wrote:
>     I am trying to understand this part... The way the optimization works, we 
> will keep the second RS with its keys, we just remove the first RS... Thus, 
> this should not be a problem? Or maybe I am not understanding it correctly
> 
> Ashutosh Chauhan wrote:
>     I think you are correct. Since relevant FS is active in last reducer 
> containing RS which we is not changing, we are not changing any assumptions 
> for FS.

On a second thought I think there is more to it. e.g. we are transforming RS 
(a) followed by RS (a,b) to RS(a,b) Than this is not valid transformation in 
general, only if second RS is introduced by SPDO. e.g, if these two RS are part 
of two GBYs (or a join & GBy), than its invalid because partitioning on (a,b) 
!= partitioning on (a)


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47554/#review133844
-----------------------------------------------------------


On May 19, 2016, 10:49 a.m., Jesús Camacho Rodríguez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47554/
> -----------------------------------------------------------
> 
> (Updated May 19, 2016, 10:49 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-13750
>     https://issues.apache.org/jira/browse/HIVE-13750
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-13750
> 
> 
> Diffs
> -----
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
>  010c89ed978296709b052cc7bc80256a27658e2b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
>  733620b84657a21829248afe72ab16ad9692f37e 
>   ql/src/test/results/clientpositive/dynpart_sort_opt_vectorization.q.out 
> d03bfe422743d9a5a6b85f9a6198e1e27024f129 
>   ql/src/test/results/clientpositive/dynpart_sort_optimization.q.out 
> dec872ab0eef54bd92d5c2bc068e2805cc14e272 
>   ql/src/test/results/clientpositive/dynpart_sort_optimization_acid.q.out 
> 832580325873dee741ba86239ee571873994a808 
>   ql/src/test/results/clientpositive/reducesink_dedup.q.out 
> b89df52f965385b85894757896eee487b29c52ae 
>   ql/src/test/results/clientpositive/tez/dynpart_sort_opt_vectorization.q.out 
> a90e3f63b4646cf0ade9785a501ebd1a6b2a3406 
>   ql/src/test/results/clientpositive/tez/dynpart_sort_optimization.q.out 
> 723e8192f2735059005fc3c5c96732a2c4be49c1 
> 
> Diff: https://reviews.apache.org/r/47554/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>

Reply via email to