hudi-bot opened a new issue, #15001:
URL: https://github.com/apache/hudi/issues/15001
Currently, when (async) Compaction for a particular File Group has been
scheduled but not yet completed, and a writer tries to append additional Log
Blocks to the same file group, the following occurs:
# FileSystemView (when fetched) checks whether any compaction is
pending and, if so, injects a "phantom" (i.e. non-existent) log-file into
the existing FileSlice. This log-file has the same FileGroup name, but bears
the instant of the scheduled Compaction commit (on the timeline) in its name
(as opposed to the instant of the base-file)
# Writer picks up this log-file as the latest
# Writer writes into this "phantom" log-file
[REF:
https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java#L199]
This poses the following problems:
* Readers now have to be aware of this handling and must therefore always
include pending compaction instants in their timeline when fetching the
FileSystemView, as otherwise they will miss newly added log-files.
* This pushes the decision of where writes should be channeled down into
FileSystemView, which is clearly outside its scope of responsibilities.
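
The behaviour described above can be illustrated with a minimal, self-contained Java sketch. It does not use Hudi's actual classes: `LogFile`, `latestLogFileForWriter` and `visibleToReader` are hypothetical names, the log-file name format is simplified, and the instants `001` and `002` are made up for illustration. It only models the two points raised in this issue: a pending compaction redirects the writer to a log-file named after the compaction instant, and a reader whose timeline omits that instant does not see the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PhantomLogFileSketch {

    // Hypothetical, simplified log-file descriptor (not Hudi's HoodieLogFile).
    static final class LogFile {
        final String fileId;
        final String instant;

        LogFile(String fileId, String instant) {
            this.fileId = fileId;
            this.instant = instant;
        }

        // Simplified naming; the real on-storage format carries more fields.
        String name() {
            return "." + fileId + "_" + instant + ".log";
        }
    }

    // Models the view's decision: if a compaction is pending for the file
    // group, the "latest" log-file handed to the writer bears the compaction
    // instant instead of the base-file instant. Pass null when none is pending.
    static LogFile latestLogFileForWriter(String fileId, String baseInstant,
                                          String pendingCompactionInstant) {
        String instant = pendingCompactionInstant != null
                ? pendingCompactionInstant
                : baseInstant;
        return new LogFile(fileId, instant);
    }

    // Models the reader side: only log-files whose instant is part of the
    // reader's timeline are visible.
    static List<LogFile> visibleToReader(List<LogFile> onStorage,
                                         Set<String> timelineInstants) {
        List<LogFile> visible = new ArrayList<>();
        for (LogFile f : onStorage) {
            if (timelineInstants.contains(f.instant)) {
                visible.add(f);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Base file written at instant 001; compaction scheduled at instant 002.
        LogFile target = latestLogFileForWriter("fg-1", "001", "002");
        System.out.println("Writer appends to: " + target.name());

        // The writer has now materialised the previously non-existent file.
        List<LogFile> onStorage = List.of(target);

        // Timeline without the pending compaction instant: the new file is missed.
        System.out.println("Visible without pending instant: "
                + visibleToReader(onStorage, Set.of("001")).size());

        // Timeline including the pending compaction instant: the file is seen.
        System.out.println("Visible with pending instant: "
                + visibleToReader(onStorage, Set.of("001", "002")).size());
    }
}
```

In this sketch the choice of the write target lives in `latestLogFileForWriter`, which stands in for the view. That placement is the layering concern raised in the second bullet.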
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-3302
- Type: Task
---
## Comments
30/Jan/22 17:59;shivnarayan;[~alexey.kudinkin] : may I know whether there is
a correctness issue here, or whether this is about layering and abstractions?
If it is a correctness issue, I want to target 0.11. If not, can you tag it
with 0.12 (fix version)?