kasakrisz opened a new pull request, #4427:
URL: https://github.com/apache/hive/pull/4427
### What changes were proposed in this pull request?
When delete operations are present, the incremental MV rebuild plan is based on
right outer joining the delta result set produced by the MV definition query to
the MV.
https://github.com/apache/hive/blob/02851615a2f4ae3fced4edec76c7c4a06f6f63c1/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveJoinInsertDeleteIncrementalRewritingRule.java#L49-L51
This PR alters the CBO plan generation to project a flag on the MV side and
uses that flag in the filter of the delete branch when the plan is transformed
into a multi-insert at the optimized AST level.
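
The plan shape described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Hive's actual implementation: the names `MvRow`, `DeltaRow`, `Joined`, `right_outer_join` and `split_branches` are hypothetical, rows are matched on all columns, and the insert branch is assumed to take every non-deleted delta row. The point it shows is that the flag projected on the MV side becomes `None` (SQL `NULL`) for delta rows without a match in the MV, so the delete branch can filter on it.

```python
from typing import List, NamedTuple, Optional, Tuple


class MvRow(NamedTuple):
    row_id: int   # stands in for the MV's ROW_ID
    key: str
    value: str


class DeltaRow(NamedTuple):
    key: str
    value: str
    deleted: bool  # the rebuild fetches deleted rows too


class Joined(NamedTuple):
    delta: DeltaRow
    mv_row_id: Optional[int]   # None when the MV has no matching row
    mv_flag: Optional[bool]    # flag projected on the MV side, None when unmatched


def right_outer_join(mv: List[MvRow], delta: List[DeltaRow]) -> List[Joined]:
    """Keep every delta row; MV-side columns are None when nothing matches."""
    by_cols = {}
    for m in mv:
        by_cols.setdefault((m.key, m.value), []).append(m)
    out = []
    for d in delta:
        matches = by_cols.get((d.key, d.value), [])
        if matches:
            out.extend(Joined(d, m.row_id, True) for m in matches)
        else:
            out.append(Joined(d, None, None))
    return out


def split_branches(joined: List[Joined]) -> Tuple[List[int], List[DeltaRow]]:
    """Model the multi-insert: a delete branch and an insert branch."""
    # Delete branch: only deleted delta rows that really exist in the MV.
    deletes = [j.mv_row_id for j in joined if j.delta.deleted and j.mv_flag]
    # Insert branch (assumption): every non-deleted delta row.
    inserts = [j.delta for j in joined if not j.delta.deleted]
    return deletes, inserts
```

With an MV holding rows for keys `x` and `y` and a delta that deletes `x`, deletes an already missing `w`, and adds `z`, the delete branch in this model contains only the `ROW_ID` of `x`; the unmatched `w` is filtered out by the flag.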
### Why are the changes needed?
1. Assume that the MV definition has a join. When records with `join_key
= x` are deleted from only one of the joined tables, the result records are
also deleted from the MV at rebuild time. If records with the same key are
then deleted, the next MV rebuild tries to delete the result records from the
MV again, but the `ROW_ID`s of these records are no longer available in the MV,
so we end up with `NULL` as `ROW_ID`, which leads to an NPE.
These deleted records appear in the delta because the MV rebuild fetches
deleted rows too.
2. The MV rebuild plan has to contain a `Project` on top of the MV scan in
order to project the `ROW_ID`. The `ROW_ID` cannot simply be projected this way
in the Calcite plan because it is not referenced in parent operators and further
optimizations would remove the project. Adding the flag to this project and
referencing it in the filter condition prevents the removal.
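
The repeated-rebuild failure from point 1 can be reproduced in a few lines of Python. This is a simplified sketch, not Hive code: `delete_targets` and `rebuild` are hypothetical helpers, the MV is modeled as a dict from join key to `ROW_ID` (one row per key), and a `ValueError` stands in for the NPE.

```python
from typing import Dict, List, Optional


def delete_targets(mv: Dict[str, int], deleted_keys: List[str],
                   use_flag: bool) -> List[Optional[int]]:
    """Collect the ROW_IDs the delete branch would try to remove."""
    targets = []
    for key in deleted_keys:
        row_id = mv.get(key)                # None models a NULL ROW_ID
        flag = True if key in mv else None  # flag projected on the MV side
        if use_flag and not flag:
            continue                        # filtered out of the delete branch
        targets.append(row_id)
    return targets


def rebuild(mv: Dict[str, int], deleted_keys: List[str],
            use_flag: bool) -> Dict[str, int]:
    """Apply the delete branch of one rebuild and return the new MV."""
    targets = delete_targets(mv, deleted_keys, use_flag)
    if any(t is None for t in targets):
        # Stand-in for the NPE caused by a NULL ROW_ID.
        raise ValueError("NULL ROW_ID in delete branch")
    return {k: r for k, r in mv.items() if r not in targets}
```

In this model the first rebuild after deleting `x` removes the row from the MV. A second rebuild whose delta again contains the deleted `x` fails without the flag and becomes a no-op for that key with the flag.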
### Does this PR introduce _any_ user-facing change?
After this change, no exception is thrown in such cases.
### Is the change a dependency upgrade?
No.
### How was this patch tested?
```
mvn test -Dtest.output.overwrite -Dtest=TestMiniLlapLocalCliDriver \
  -Dqfile=materialized_view_repeated_rebuild.q,materialized_view_join_rebuild.q,materialized_view_create_rewrite_5.q \
  -pl itests/qtest -Pitests
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]