yujun777 opened a new pull request, #54414:
URL: https://github.com/apache/doris/pull/54414

   ### What problem does this PR solve?
   
   For the SQL `random() > 10 and random() < 5`, the two `random()` calls are different. To handle this case, this PR introduces a class `UniqueScalarFunction`. A `UniqueScalarFunction` holds a unique id, and checking whether two unique scalar functions are equal only compares their class types and their unique ids.
   
   `UniqueScalarFunction` has four child classes:
   1. random;
   2. random_byte;
   3. uuid;
   4. uuid_byte;
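A minimal sketch of the equality rule described above, written in Java since the Doris planner is Java. The names `UniqueScalarFunction`, `RandomFn` and `UuidFn`, the `AtomicLong` id generator and the explicit-id constructor are illustrative assumptions, not the code in this PR; the only point shown is that two calls are equal exactly when they have the same concrete class and the same unique id.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: names and fields are illustrative, not the actual Doris classes.
abstract class UniqueScalarFunction {
    // Hypothetical global generator: every newly parsed call gets a fresh id.
    private static final AtomicLong NEXT_ID = new AtomicLong();

    private final long uniqueId;

    protected UniqueScalarFunction() {
        this(NEXT_ID.getAndIncrement());
    }

    // Explicit id, used when a call must share another call's id (rebinding).
    protected UniqueScalarFunction(long uniqueId) {
        this.uniqueId = uniqueId;
    }

    long getUniqueId() {
        return uniqueId;
    }

    // Equality looks only at the concrete class and the unique id,
    // never at the textual form of the call.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        return uniqueId == ((UniqueScalarFunction) o).uniqueId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(getClass(), uniqueId);
    }
}

// Hypothetical stand-ins for two of the four child classes listed above.
final class RandomFn extends UniqueScalarFunction {
    RandomFn() { super(); }
    RandomFn(long id) { super(id); }
}

final class UuidFn extends UniqueScalarFunction {
    UuidFn() { super(); }
    UuidFn(long id) { super(id); }
}
```

With this rule, `new RandomFn().equals(new RandomFn())` is false, two `RandomFn` objects built with the same explicit id compare equal, and a `RandomFn` never equals a `UuidFn` even if their ids happen to coincide.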
   
   In most cases, two unique scalar functions should be treated as different and have different unique ids. But in an aggregate, some unique scalar functions need to be treated as having the same unique id, otherwise the aggregate will throw an error. So their unique ids need to be bound to another unique scalar function's unique id.
   
   Here are the details:
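To see why an aggregate needs this binding, consider `select random() from t group by random()`. The sketch below assumes (the PR does not say so explicitly) that the error comes from the usual check that every non-aggregated output expression must appear in the group by list; `RandomCall`, `bindTo` and `AggCheckSketch` are hypothetical names. With distinct ids the output `random()` is rejected, and after binding it to the group by's id the check passes.

```java
import java.util.List;

// Hypothetical model: a random() call identified only by its unique id.
// A record's generated equals() compares the record type and every component,
// which here means exactly "same class and same unique id".
record RandomCall(long uniqueId) {
    // Hypothetical helper: a copy of this call that shares another call's id.
    RandomCall bindTo(RandomCall groupByExpr) {
        return new RandomCall(groupByExpr.uniqueId());
    }
}

final class AggCheckSketch {
    private AggCheckSketch() {}

    // Simplified stand-in for the planner's check that every non-aggregate
    // output expression of an aggregate appears in its group by list.
    static void checkOutputs(List<RandomCall> groupBy, List<RandomCall> outputs) {
        for (RandomCall out : outputs) {
            if (!groupBy.contains(out)) {
                throw new IllegalStateException("output " + out + " is not a group by expression");
            }
        }
    }
}
```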
   /**
    * A unique scalar function has a unique id, and two of them are equal only if they have the same unique id.
    *
    * e.g. random(), uuid(), etc.
    *
    * For unique scalar functions in PROJECT/HAVING/QUALIFY/SORT/AGG OUTPUT/REPEAT OUTPUT,
    * if they have a related AGG plan, their unique ids need to be bound to the matched AGG's group by unique scalar functions.
    * The cases are as below:
    *
    * 1. If there is no aggregate, or the aggregate's group by expressions don't contain any unique scalar function,
    *    then all the unique scalar functions will be different.
    *    example i:  for sql 'select random1(), a + random2()
    *                         from t
    *                         order by random3(), a + random4()',
    *                since it doesn't contain an aggregate, random1(), random2(), random3(), random4() will be different
    *                and have different unique ids;
    *
    *    example ii: for sql 'select sum(a + random1()), max(a + random2())
    *                         from t',
    *                it will be rewritten to 'select sum(a + random1()), max(a + random2())
    *                                         from t
    *                                         group by ()',
    *                since the aggregate's group by list is empty, random1() and random2() will be different.
    *
    *    example iii: for sql 'select a + random1(), sum(a + random2()), max(a + random3())
    *                          from t
    *                          group by a',
    *                 since the group by list (a) doesn't contain a unique scalar function, random1(), random2(), random3()
    *                 will be different.
    *
    * 2. If some aggregate group by expressions contain unique scalar functions, then bind the unique id of each unique scalar
    *    function to the one in its longest matched group by expression.
    *    example: for sql 'select random1(), a + random2(), sum(random3()), sum(a + random4())
    *                      from t
    *                      where random5() > 0 and a + random6() > 0
    *                      group by random7(), random8(), a + random9(), a + random10()
    *                      order by random11(), abs(a + random12())'
    *              first, handle the group by: if two group by expressions fully match, then their matched unique
    *                     scalar functions will be equal and have the same unique id, so random7() equals
    *                     random8() and random9() equals random10(), but random7() does not equal random9().
    *              then handle the PROJECT/HAVING/QUALIFY/SORT/AGG OUTPUT/REPEAT OUTPUT expressions, and we will have:
    *                     random1()/random3()/random11() are equal to random7(), and random2()/random4()/random12() are
    *                     equal to random9() (a + random9() is a longer match than random7()), so their unique ids
    *                     are updated to the same value.
    *              note that the FILTER's random5()/random6() will be different from all the other randoms.
    *
    * 3. If it's a distinct project that contains no aggregate functions and has no aggregate plan,
    *    then it will be rewritten to an aggregate, which behaves a little differently from an original aggregate.
    *    example: for sql 'select distinct random1(), random2(), a + random3(), a + random4()
    *                      from t
    *                      order by random5(), a + random6()',
    *             it will be rewritten to 'select random1(), random2(), a + random3(), a + random4()
    *                                      from t
    *                                      group by random1(), random2(), a + random3(), a + random4()
    *                                      order by random5(), a + random6()',
    *             for the rewritten aggregate,
    *             first, for the group by: the group by expressions will not try to match each other even if
    *                    they look the same, so random1() will not equal random2(), and
    *                    random3() will not equal random4().
    *             then handle the PROJECT/HAVING/QUALIFY/SORT expressions: they will match the first longest matched
    *                    group by expression, so random5() equals random1() but not random2(), and
    *                    random6() equals random3() but not random4(); their unique ids are then updated to
    *                    the same value.
    *
    */
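The binding rules of cases 1 to 3 can be sketched on a toy expression tree. Everything here is hypothetical: `Expr`, `Col`, `Rand`, `Call` and `UniqueIdBinder` are not Doris Nereids classes, and "matching" is a plain structural comparison that ignores unique ids. `normalizeGroupBy` unifies same-shape group by expressions unless the aggregate came from a `select distinct` rewrite (case 3), and `bind` walks an output expression top-down so that the largest matching subtree wins, taking the first candidate among same-shape ones. One further assumption not stated in the PR: an expression that already is one of the group by expressions keeps its own ids, which keeps random1() and random2() distinct in the case 3 select list. FILTER expressions are simply never passed to `bind`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy expression tree; hypothetical types, not the Doris Nereids classes.
interface Expr {}

record Col(String name) implements Expr {}

// A random() call identified by its unique id.
record Rand(long id) implements Expr {}

// Any other function call, e.g. "+", "abs" or "sum".
record Call(String fn, List<Expr> args) implements Expr {}

final class UniqueIdBinder {
    private UniqueIdBinder() {}

    // Structural comparison that ignores unique ids: random1() has the same shape as random2().
    static boolean sameShape(Expr x, Expr y) {
        if (x instanceof Rand && y instanceof Rand) {
            return true;
        }
        if (x instanceof Call cx && y instanceof Call cy) {
            if (!cx.fn().equals(cy.fn()) || cx.args().size() != cy.args().size()) {
                return false;
            }
            for (int i = 0; i < cx.args().size(); i++) {
                if (!sameShape(cx.args().get(i), cy.args().get(i))) {
                    return false;
                }
            }
            return true;
        }
        return x.equals(y); // columns compare by name; mixed node kinds are never equal
    }

    // Case 2: a group by expression with the same shape as an earlier one reuses that
    // earlier expression (and so its unique ids). Case 3 (fromDistinct) skips this step.
    static List<Expr> normalizeGroupBy(List<Expr> groupBy, boolean fromDistinct) {
        List<Expr> result = new ArrayList<>();
        for (Expr g : groupBy) {
            Expr bound = g;
            if (!fromDistinct) {
                for (Expr prev : result) {
                    if (sameShape(prev, g)) {
                        bound = prev;
                        break;
                    }
                }
            }
            result.add(bound);
        }
        return result;
    }

    // Rewrites an output expression top-down, so the largest subtree matching a group by
    // expression is bound first; among same-shape candidates the first one wins.
    static Expr bind(Expr e, List<Expr> groupBy) {
        if (groupBy.contains(e)) {
            return e; // assumption: already a group by expression, keep its own ids
        }
        for (Expr g : groupBy) {
            if (sameShape(e, g)) {
                return g;
            }
        }
        if (e instanceof Call c) {
            List<Expr> args = new ArrayList<>();
            for (Expr arg : c.args()) {
                args.add(bind(arg, groupBy));
            }
            return new Call(c.fn(), args);
        }
        return e; // unmatched columns and random() calls keep their own ids
    }
}
```

For case 2, binding `abs(a + random12())` against the normalized group by list yields `abs(a + random9())`, because `a + random9()` is found before descending to the bare `random()` level; for case 3, `random5()` binds to `random1()` and `a + random6()` binds to `a + random3()`.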
   
   ### Release note
   
   None
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [ ] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [ ] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
   
   

