yujun777 opened a new pull request, #54414:
URL: https://github.com/apache/doris/pull/54414
### What problem does this PR solve?
For SQL such as `random() > 10 and random() < 5`, the two `random()` calls are
different expressions. To deal with this case, this PR introduces a class
`UniqueScalarFunction`. A `UniqueScalarFunction` holds a unique id, and two
unique scalar functions are equal only if their class types and unique ids
are both the same.
`UniqueScalarFunction` has four child classes:
1. random;
2. random_byte;
3. uuid;
4. uuid_byte;
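The equality rule above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class names `UniqueFnSketch`, `RandomSketch`, `UuidSketch`, the id counter, and `withUniqueId` are all hypothetical stand-ins.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a unique scalar function; not the Doris class.
abstract class UniqueFnSketch {
    // Assumption: ids come from a simple process-wide counter.
    private static final AtomicLong NEXT_ID = new AtomicLong();

    final long uniqueId;

    // A fresh instance gets a fresh id, so it differs from every other instance.
    UniqueFnSketch() {
        this(NEXT_ID.getAndIncrement());
    }

    // Used when an instance must share the id of another one.
    UniqueFnSketch(long uniqueId) {
        this.uniqueId = uniqueId;
    }

    // Equal only if both the class type and the unique id are the same.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        return uniqueId == ((UniqueFnSketch) o).uniqueId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(getClass(), uniqueId);
    }
}

final class RandomSketch extends UniqueFnSketch {
    RandomSketch() {
        super();
    }

    RandomSketch(long uniqueId) {
        super(uniqueId);
    }

    // Returns a copy bound to another function's unique id.
    RandomSketch withUniqueId(long id) {
        return new RandomSketch(id);
    }
}

final class UuidSketch extends UniqueFnSketch {
    UuidSketch() {
        super();
    }

    UuidSketch(long uniqueId) {
        super(uniqueId);
    }
}
```

With this sketch, two freshly built `RandomSketch` objects are not equal, while `b.withUniqueId(a.uniqueId)` is equal to `a`. A `UuidSketch` carrying the same id is still not equal to `a`, because the class type differs.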
In most cases, two unique scalar functions should be treated as different and
have different unique ids. In an aggregate, however, some unique scalar
functions must share the same unique id, otherwise the aggregate will throw
an error. So their unique ids need to be bound to those of other unique
scalar functions.
Here is the detail:
/**
* A unique scalar function has a unique id, and two unique scalar functions are equal
* only if they have the same unique id.
*
* e.g. random(), uuid(), etc.
*
* For unique scalar functions in PROJECT/HAVING/QUALIFY/SORT/AGG OUTPUT/REPEAT OUTPUT,
* if they have a related AGG plan, their unique ids need to be bound to the matched
* AGG's group by unique scalar functions.
* The cases are as below:
*
* 1. If there is no aggregate, or the aggregate's group by expressions don't contain any
*    unique scalar function, then all the unique scalar functions will be different.
*    example i: for sql 'select random1(), a + random2()
*                        from t
*                        order by random3(), a + random4()',
*               since it doesn't contain an aggregate, random1(), random2(), random3(),
*               random4() will be different and have different unique ids;
*
*    example ii: for sql 'select sum(a + random1()), max(a + random2())
*                         from t',
*                it will be rewritten to 'select sum(a + random1()), max(a + random2())
*                                         from t
*                                         group by ()',
*                since the aggregate's group by list is empty, random1() and random2()
*                will be different.
*
*    example iii: for sql 'select a + random1(), sum(a + random2()), max(a + random3())
*                          from t
*                          group by a',
*                 since the group by list (a) does not contain a unique scalar function,
*                 random1(), random2(), random3() will be different.
*
* 2. If some aggregate group by expressions contain unique scalar functions, then bind the
*    unique id of each unique scalar function to the longest matched group by expression.
*    example: for sql 'select random1(), a + random2(), sum(random3()), sum(a + random4())
*                      from t
*                      where random5() > 0 and a + random6() > 0
*                      group by random7(), random8(), a + random9(), a + random10()
*                      order by random11(), abs(a + random12())'
*    firstly, handle the group by: if two group by expressions match as a whole, then their
*             matched unique scalar functions will be equal and have the same unique id,
*             so random7() equals random8(), and random9() equals random10(),
*             but random7() does not equal random9().
*    then handle the PROJECT/HAVING/QUALIFY/SORT/AGG OUTPUT/REPEAT OUTPUT expressions,
*             and we will have: random1()/random3()/random11() are equal to random7(),
*             and random2()/random4()/random12() are equal to random9(); their unique
*             ids are then updated to be the same.
*    notice that for FILTER, random5()/random6() will be different from all other randoms.
*
* 3. If it's a distinct project that contains no aggregate functions and has no aggregate
*    plan, then it will be rewritten to an aggregate, but with some differences from an
*    original aggregate.
*    example: for sql 'select distinct random1(), random2(), a + random3(), a + random4()
*                      from t
*                      order by random5(), a + random6()',
*             it will be rewritten to
*                     'select random1(), random2(), a + random3(), a + random4()
*                      from t
*                      group by random1(), random2(), a + random3(), a + random4()
*                      order by random5(), a + random6()',
*    for the rewritten aggregate,
*    firstly, for the group by: the group by expressions will not try to match each other
*             even if they look the same, so random1() will not equal random2(), and
*             random3() will not equal random4().
*    then handle the PROJECT/HAVING/QUALIFY/SORT expressions: they will match the first
*             matched longest group by expression, so random5() equals random1() but not
*             random2(), and random6() equals random3() but not random4(); their unique
*             ids are then updated to be the same.
*
*/
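To make case 2 more concrete, here is a rough sketch of the two steps (unify the group by expressions, then bind the output expressions). It is only an illustration under stated assumptions: `ExprNode`, `BindSketch`, and every method in it are hypothetical and do not come from the Doris code base, and expressions are modelled as a tiny tree where a unique scalar function is any node with a non-negative id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mini expression tree; these names are illustrative, not Doris classes.
final class ExprNode {
    final String op;               // e.g. "random", "+", "sum", or a column name
    final long uniqueId;           // >= 0 for unique scalar functions, -1 otherwise
    final List<ExprNode> children;

    ExprNode(String op, long uniqueId, ExprNode... children) {
        this.op = op;
        this.uniqueId = uniqueId;
        this.children = Arrays.asList(children);
    }
}

final class BindSketch {
    static ExprNode random(long id) {
        return new ExprNode("random", id);
    }

    static ExprNode call(String op, ExprNode... kids) {
        return new ExprNode(op, -1, kids);
    }

    // Structural comparison that ignores unique ids.
    static boolean sameShape(ExprNode x, ExprNode y) {
        if (!x.op.equals(y.op) || x.children.size() != y.children.size()) {
            return false;
        }
        for (int i = 0; i < x.children.size(); i++) {
            if (!sameShape(x.children.get(i), y.children.get(i))) {
                return false;
            }
        }
        return true;
    }

    static boolean containsUnique(ExprNode n) {
        if (n.uniqueId >= 0) {
            return true;
        }
        for (ExprNode c : n.children) {
            if (containsUnique(c)) {
                return true;
            }
        }
        return false;
    }

    // Rebuild `to` with the unique ids of `from`; callers ensure sameShape(to, from).
    static ExprNode copyIds(ExprNode to, ExprNode from) {
        ExprNode[] kids = new ExprNode[to.children.size()];
        for (int i = 0; i < kids.length; i++) {
            kids[i] = copyIds(to.children.get(i), from.children.get(i));
        }
        return new ExprNode(to.op, from.uniqueId, kids);
    }

    // Step 1: group by expressions that match as a whole share unique ids.
    static List<ExprNode> unifyGroupBy(List<ExprNode> groupBy) {
        List<ExprNode> out = new ArrayList<>();
        for (ExprNode g : groupBy) {
            ExprNode bound = g;
            for (ExprNode prev : out) {
                if (sameShape(g, prev)) {
                    bound = copyIds(g, prev);
                    break;
                }
            }
            out.add(bound);
        }
        return out;
    }

    // Step 2: bind an output expression top-down, so the largest matching subtree wins.
    static ExprNode bind(ExprNode expr, List<ExprNode> groupBy) {
        for (ExprNode g : groupBy) {
            if (containsUnique(g) && sameShape(expr, g)) {
                return copyIds(expr, g);
            }
        }
        ExprNode[] kids = new ExprNode[expr.children.size()];
        for (int i = 0; i < kids.length; i++) {
            kids[i] = bind(expr.children.get(i), groupBy);
        }
        return new ExprNode(expr.op, expr.uniqueId, kids);
    }
}
```

Applied to the group by list of the case 2 example, `unifyGroupBy` gives random8() the id of random7() and random10() the id of random9(), while random7() and random9() stay different. `bind` on `sum(random3())` then gives random3() the id of random7(), and `bind` on `abs(a + random12())` gives random12() the id of random9(). For the distinct rewrite in case 3, a caller of this sketch would presumably skip `unifyGroupBy` and call `bind` directly, so that the first matching group by expression wins.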
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [ ] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should
merge into -->