guanziyue commented on code in PR #4913:
URL: https://github.com/apache/hudi/pull/4913#discussion_r1299270668


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##########
@@ -273,4 +280,31 @@ protected static Option<IndexedRecord> 
toAvroRecord(HoodieRecord record, Schema
       return Option.empty();
     }
   }
+
+  protected class AppendLogWriteCallback implements HoodieLogFileWriteCallback 
{
+    // here we distinguish log files created from log files being appended. 
Considering following scenario:
+    // An appending task write to log file.
+    // (1) append to existing file file_instant_writetoken1.log.1
+    // (2) rollover and create file file_instant_writetoken2.log.2
+    // Then this task failed and retry by a new task.
+    // (3) append to existing file file_instant_writetoken1.log.1
+    // (4) rollover and create file file_instant_writetoken3.log.2
+    // finally file_instant_writetoken2.log.2 should not be committed to hudi, 
we use marker file to delete it.
+    // keep in mind that log file is not always fail-safe unless it never roll 
over
+

Review Comment:
   > oh, I get it. in hdfs like systems, if we are using direct style markers, 
if two diff writers try to append to same log file (either sequentially or 
conrrently), we will be calling append type marker for the same log file more 
than once. and direct style markers will fail since the marker file already 
exists. is my understanding correct? did we make any fix on this end or not 
yet? I mean, I underststand we have reverted this patch in latest master. but 
for MDT purpose, I am looking to see if we can re-add this patch (per log file 
marker).So, trying to understand any gaps or failures we need to handle before 
we can add per log file marker support.
   
   Yes! you are correct. This was finally fixed by 
https://github.com/apache/hudi/pull/9003/files. 
   Unfortunately, this PR was reverted due to another failure. W/o MDT, 
FileSystem Based FileSystemView can actually 'see' some uncommitted files, like 
log files being written. And according to current FileGroup definition, an 
uncommitted log file is considered valid as long as it has a committed base 
instant time. Such an uncommitted file should be correctly addressed in reading 
because hudi can find that the instant time in log block read from this log 
file is invalid.
   However, with this PR, we may delete an invalid log file when commit is 
going to finish while a reading job may require this file existing.
   In theory, such an error should not occur with MDT because MDT will not show 
this file until it is committed. For FileSystem Based FileSystemView, I failed 
to have an idea to fix this with a short time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to