TheR1sing3un opened a new issue, #17516: URL: https://github.com/apache/hudi/issues/17516
### Task Description

**What needs to be done:**

Reduce unnecessary `FSDataOutputStream#hsync` calls to improve append performance.

Before 1.x, a log file could be appended to by different write transactions. So when the append handle flushes data, it has to persist each block as durably as possible to avoid data loss. For that reason, every `flush` calls `FSDataOutputStream#hsync`. This asks the datanodes to flush the data to disk, and the writer then blocks until that request completes before continuing with subsequent writes.

<img width="992" height="510" alt="Image" src="https://github.com/user-attachments/assets/d674f480-7e32-42e9-acc9-86e6a6d25e8d" />

Since 1.x, however, appending to existing log files is prohibited. A log file can be opened and written by at most one write transaction, and the data of the entire log file should become visible after that write transaction commits. Calling `hsync` on every `flush` is therefore unnecessary, and the operation is very expensive. Moreover, our write path is single-threaded, so the writer blocks on each `hsync` and no subsequent write can proceed until the request returns.

I therefore suggest calling `hsync` only once, when the stream is closed.

**Why this task is needed:**

It has a significant impact on append performance.

### Task Type

Performance optimization

### Related Issues

**Parent feature issue:** (if applicable)

**Related issues:**
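The following is a minimal, self-contained sketch of the change proposed in the task description. It is not Hudi's actual writer.

- `SyncableSink`, `CountingSink`, `SyncPolicy` and `LogBlockWriter` are hypothetical names. They are not Hudi or Hadoop APIs.
- `SyncableSink#hsync` stands in for `FSDataOutputStream#hsync`.
- The stub only counts calls. It shows how many blocking sync round trips each policy issues for the same sequence of blocks, not real latency.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical stand-in for an HDFS output stream; a real writer would call
// FSDataOutputStream#hsync instead of SyncableSink#hsync.
interface SyncableSink {
  void write(byte[] data) throws IOException;
  void flush() throws IOException;
  void hsync() throws IOException; // expensive: waits until datanodes persist the data
  void close() throws IOException;
}

// In-memory stub that records the bytes written and counts each call.
class CountingSink implements SyncableSink {
  final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  int flushes = 0;
  int hsyncs = 0;
  boolean closed = false;

  public void write(byte[] data) { buffer.write(data, 0, data.length); }
  public void flush() { flushes++; }
  public void hsync() { hsyncs++; }
  public void close() { closed = true; }
}

enum SyncPolicy {
  SYNC_ON_EVERY_FLUSH, // pre-1.x behavior described in the issue
  SYNC_ON_CLOSE        // behavior proposed for 1.x
}

// Hypothetical single-writer log file writer (assumption: one transaction per file).
class LogBlockWriter implements AutoCloseable {
  private final SyncableSink sink;
  private final SyncPolicy policy;

  LogBlockWriter(SyncableSink sink, SyncPolicy policy) {
    this.sink = sink;
    this.policy = policy;
  }

  void appendBlock(byte[] block) throws IOException {
    sink.write(block);
    sink.flush();
    if (policy == SyncPolicy.SYNC_ON_EVERY_FLUSH) {
      // Blocking round trip to the datanodes for every block.
      sink.hsync();
    }
  }

  @Override
  public void close() throws IOException {
    if (policy == SyncPolicy.SYNC_ON_CLOSE) {
      // A single durability barrier before the file is closed and the transaction commits.
      sink.hsync();
    }
    sink.close();
  }
}
```

Suppose each policy writes five blocks and then closes the writer. `SYNC_ON_EVERY_FLUSH` issues five `hsync` calls, while `SYNC_ON_CLOSE` issues one. In both cases every block has been synced by the time the writer is closed.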
