TheR1sing3un opened a new issue, #17516:
URL: https://github.com/apache/hudi/issues/17516

   ### Task Description
   
   **What needs to be done:**
   Reduce unnecessary `FSDataOutputStream#hsync` calls to improve log append performance.
   
   Before 1.x, a log file could be appended to by different write transactions. Therefore, when the append handle flushes data, each block has to be persisted as durably as possible to avoid the risk of data loss.
   For this reason, every `flush` calls `FSDataOutputStream#hsync`, which asks the DataNodes to flush the data to disk and then blocks until the request completes before any subsequent write can proceed.
   
   <img width="992" height="510" alt="Image" src="https://github.com/user-attachments/assets/d674f480-7e32-42e9-acc9-86e6a6d25e8d" />
   
   Since 1.x, however, appending to existing log files is prohibited.
   A log file is therefore opened and written by at most one write transaction, and the data of the whole log file only needs to become visible once that transaction commits.
   Calling `hsync` on every `flush` is thus unnecessary, and it is very expensive: each call is a blocking round trip to the DataNodes. Because the append handle writes on a single thread, the writer stalls on every call, and no further data can be written until the request returns.
   
   I therefore propose calling `hsync` only once, when the stream is closed; that is sufficient.
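   To make the proposal concrete, here is a minimal Java sketch that counts how many blocking `hsync` calls each policy issues. It assumes a hypothetical `LogWriter` and an in-memory `CountingSyncStream` standing in for `FSDataOutputStream`; neither is Hudi's or Hadoop's real API, and `hsync()` here only increments a counter instead of contacting DataNodes.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

public class DeferredSyncSketch {

    // Hypothetical in-memory stand-in for a syncable stream such as Hadoop's
    // FSDataOutputStream; hsync() only counts calls instead of hitting DataNodes.
    static final class CountingSyncStream extends OutputStream {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int hsyncCalls = 0;

        @Override
        public void write(int b) { buffer.write(b); }

        @Override
        public void write(byte[] b, int off, int len) { buffer.write(b, off, len); }

        @Override
        public void flush() { /* data already lives in the in-memory buffer */ }

        @Override
        public void close() { /* nothing to release */ }

        // Models the blocking durability barrier that each real hsync() imposes.
        void hsync() { hsyncCalls++; }
    }

    // Hypothetical log writer, not Hudi's actual writer class.
    static final class LogWriter implements AutoCloseable {
        private final CountingSyncStream out;
        private final boolean syncOnEveryFlush;

        LogWriter(CountingSyncStream out, boolean syncOnEveryFlush) {
            this.out = out;
            this.syncOnEveryFlush = syncOnEveryFlush;
        }

        void appendBlock(byte[] block) {
            out.write(block, 0, block.length);
            flush();
        }

        void flush() {
            out.flush();
            if (syncOnEveryFlush) {
                out.hsync(); // pre-1.x style: the single writer thread waits here for every block
            }
        }

        @Override
        public void close() {
            out.flush();
            out.hsync(); // one durability barrier before the commit makes the file visible
            out.close();
        }
    }

    // Writes `blocks` one-byte blocks and returns how many hsync() calls were issued.
    static int countHsyncs(boolean syncOnEveryFlush, int blocks) {
        CountingSyncStream stream = new CountingSyncStream();
        try (LogWriter writer = new LogWriter(stream, syncOnEveryFlush)) {
            for (int i = 0; i < blocks; i++) {
                writer.appendBlock(new byte[] {(byte) i});
            }
        }
        return stream.hsyncCalls;
    }

    public static void main(String[] args) {
        System.out.println("per-flush hsync: " + countHsyncs(true, 100));  // 101
        System.out.println("hsync on close:  " + countHsyncs(false, 100)); // 1
    }
}
```

   The sketch counts barriers rather than measuring latency: with 100 blocks, the per-flush policy issues 101 blocking calls (one per block plus one on close), while the deferred policy issues one. Deferring the sync relies on the 1.x invariant described above: a single writer per log file, with its data only needing to be visible after the commit.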
   
   **Why this task is needed:**
   
   Per-flush `hsync` calls block the single-threaded writer and have a significant impact on append performance.
   
   ### Task Type
   
   Performance optimization
   
   ### Related Issues
   
   **Parent feature issue:** (if applicable)
   **Related issues:**
   NOTE: Use `Relationships` button to add parent/blocking issues after issue 
is created.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.