hudi-bot opened a new issue, #15001:
URL: https://github.com/apache/hudi/issues/15001

   Currently, when an (async) compaction has been scheduled for a particular 
file group but has not yet completed, and a writer tries to append additional 
log blocks to the same file group, the following occurs:
    1. The FileSystemView (when fetched) checks whether any compaction is 
pending and, if so, injects a "phantom" (i.e. non-existent) log file into 
the existing FileSlice. This log file has the same FileGroup name, but its 
name carries the instant of the scheduled compaction commit (on the timeline) 
rather than the instant of the base file.
    2. The writer picks up this log file as the latest one.
    3. The writer writes into this "phantom" log file.
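
The three steps above can be sketched in a few lines of Java. This is only an illustrative model, not Hudi code: the `FileSlice` class, the `viewOf` and `latestLogFile` helpers, the file group ID `fg-1`, the instants `001` and `003`, and the simplified log-file naming scheme are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class PhantomLogFileSketch {

    // Simplified, hypothetical naming scheme: .<fileId>_<instant>.log.<version>
    static String logFileName(String fileId, String instant, int version) {
        return "." + fileId + "_" + instant + ".log." + version;
    }

    // Hypothetical stand-in for a file slice: a base instant plus its log files.
    static final class FileSlice {
        final String fileId;
        final String baseInstant;
        final List<String> logFiles = new ArrayList<>();

        FileSlice(String fileId, String baseInstant) {
            this.fileId = fileId;
            this.baseInstant = baseInstant;
        }
    }

    // Step 1: the view copies what is on storage and, if a compaction is
    // pending, appends a log file named after the compaction instant even
    // though no such file exists yet.
    static FileSlice viewOf(FileSlice onStorage, Optional<String> pendingCompaction) {
        FileSlice seen = new FileSlice(onStorage.fileId, onStorage.baseInstant);
        seen.logFiles.addAll(onStorage.logFiles);
        pendingCompaction.ifPresent(instant ->
                seen.logFiles.add(logFileName(onStorage.fileId, instant, 1)));
        return seen;
    }

    // Step 2: the writer treats the last log file in the slice as the latest.
    static String latestLogFile(FileSlice slice) {
        return slice.logFiles.get(slice.logFiles.size() - 1);
    }

    public static void main(String[] args) {
        FileSlice onStorage = new FileSlice("fg-1", "001");
        onStorage.logFiles.add(logFileName("fg-1", "001", 1));

        // Compaction scheduled at instant 003 but not yet completed.
        FileSlice seen = viewOf(onStorage, Optional.of("003"));

        // Step 3: the writer would append to this file, which is named after
        // the compaction instant rather than the base-file instant.
        System.out.println(latestLogFile(seen)); // prints ".fg-1_003.log.1"
    }
}
```

In this model the writer never decides where to write; it simply follows whatever the view reports as the latest log file.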
   
   [REF: 
https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java#L199]
   
    
   
   This poses the following problems: 
    * The reader now has to be aware of this handling and must therefore always 
include pending compaction instants in its timeline when fetching the 
FileSystemView; otherwise it will miss newly added log files.
    * This pushes the decision of where writes should be channeled down into 
the FileSystemView, which is clearly outside its scope of responsibilities.
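
The first problem can be illustrated with a small, self-contained sketch. It is a simplified model rather than Hudi's actual reader logic: the `LogFile` class, the `visibleLogFiles` helper, and the instants `001` and `003` are hypothetical, and the model assumes a reader only sees log files whose instant is on the timeline it was given.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ReaderVisibilitySketch {

    // Hypothetical log-file record: its name and the instant encoded in it.
    static final class LogFile {
        final String name;
        final String instant;

        LogFile(String name, String instant) {
            this.name = name;
            this.instant = instant;
        }
    }

    // Assumption: a reader only sees log files whose instant is on its timeline.
    static List<String> visibleLogFiles(List<LogFile> onStorage, Set<String> timeline) {
        return onStorage.stream()
                .filter(f -> timeline.contains(f.instant))
                .map(f -> f.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<LogFile> onStorage = Arrays.asList(
                new LogFile(".fg-1_001.log.1", "001"),   // written before scheduling
                new LogFile(".fg-1_003.log.1", "003"));  // written under pending compaction 003

        Set<String> completedOnly = new HashSet<>(Arrays.asList("001"));
        Set<String> withPending = new HashSet<>(Arrays.asList("001", "003"));

        // Without the pending compaction instant, the newer log file is missed.
        System.out.println(visibleLogFiles(onStorage, completedOnly)); // [.fg-1_001.log.1]
        System.out.println(visibleLogFiles(onStorage, withPending));   // [.fg-1_001.log.1, .fg-1_003.log.1]
    }
}
```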
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-3302
   - Type: Task
   
   
   ---
   
   
   ## Comments
   
   30/Jan/22 17:59;shivnarayan;[~alexey.kudinkin] : May I know whether there is 
a correctness issue here, or whether this is about layering and abstractions? 
If it is a correctness issue, I would like to target it for 0.11. If not, can 
you tag it with 0.12 (fix version)?
   

