[ 
https://issues.apache.org/jira/browse/IGNITE-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546821#comment-16546821
 ] 

Ivan Rakov commented on IGNITE-8761:
------------------------------------

[~v.pyatkov], I've looked through your changes. A few comments:
1) It seems that WalSegmentSyncer mimics the behavior of 
FileWriteAheadLogManager#flush. Can we just call the external flush from the 
fsyncer thread?
2) The current implementation stops WalSegmentSyncer through interruption. 
Please look at the shutdown mechanism of the other WAL threads.
3) A potential StorageException during fsync is not handled.
4) Why do we need to use ConcurrentLinkedHashMap here? I don't see how it's 
related to the original issue.
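
To illustrate points 2 and 3, here is a rough sketch of a background syncer 
that stops through a volatile flag instead of interruption and records an I/O 
failure instead of losing it. One reason to avoid interruption is that 
interrupting a thread during a FileChannel operation closes the channel. All 
names below (SegmentSyncerSketch, Segment, failure) are hypothetical and are 
not actual Ignite classes; IOException stands in for StorageException, and the 
50 ms poll timeout is an arbitrary assumption.

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch; not the actual Ignite WalSegmentSyncer. */
public class SegmentSyncerSketch {
    /** Hypothetical stand-in for a WAL segment that can be forced to disk. */
    public interface Segment {
        void force() throws IOException;
    }

    /** Segments waiting for fsync. */
    private final BlockingQueue<Segment> queue = new LinkedBlockingQueue<>();

    /** First fsync failure, kept so the owner can react to it. */
    private final AtomicReference<IOException> failure = new AtomicReference<>();

    /** Cooperative stop flag, used instead of Thread.interrupt(). */
    private volatile boolean stopped;

    private final Thread worker = new Thread(this::body, "segment-syncer-sketch");

    public void start() {
        worker.start();
    }

    public void submit(Segment seg) {
        queue.add(seg);
    }

    /** Requests a stop and waits until the worker has drained the queue. */
    public void shutdown() throws InterruptedException {
        stopped = true;
        worker.join();
    }

    /** Returns the first recorded fsync failure, or null if there was none. */
    public IOException failure() {
        return failure.get();
    }

    private void body() {
        try {
            while (true) {
                Segment seg = queue.poll(50, TimeUnit.MILLISECONDS);

                if (seg == null) {
                    // Exit only when a stop was requested and the queue is empty.
                    if (stopped)
                        return;

                    continue;
                }

                try {
                    seg.force();
                }
                catch (IOException e) {
                    // Record the error instead of silently dropping it.
                    failure.compareAndSet(null, e);
                }
            }
        }
        catch (InterruptedException e) {
            // Not expected in this sketch; restore the flag and exit.
            Thread.currentThread().interrupt();
        }
    }
}
```

In this sketch shutdown() lets already submitted segments finish before the 
thread exits, and the caller can check failure() afterwards. How a real 
failure should be propagated (for example, by failing the node) is left open 
here.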

> WAL fsync at rollover should be asynchronous in LOG_ONLY and BACKGROUND modes
> -----------------------------------------------------------------------------
>
>                 Key: IGNITE-8761
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8761
>             Project: Ignite
>          Issue Type: Improvement
>          Components: persistence
>            Reporter: Ivan Rakov
>            Assignee: Vladislav Pyatkov
>            Priority: Major
>             Fix For: 2.7
>
>
> Transactions may periodically hang for a few seconds in LOG_ONLY or 
> BACKGROUND WAL modes. Thread dumps show that threads hang while syncing the 
> previous WAL segment during rollover:
> {noformat}
>   java.lang.Thread.State: RUNNABLE
>    at java.nio.MappedByteBuffer.force0(MappedByteBuffer.java:-1)
>    at java.nio.MappedByteBuffer.force(MappedByteBuffer.java:203)
>    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.close(FileWriteAheadLogManager.java:2843)
>    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$600(FileWriteAheadLogManager.java:2483)
>    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.rollOver(FileWriteAheadLogManager.java:1094)
> {noformat}
> Waiting for this fsync is not necessary to ensure crash recovery 
> guarantees. Instead, we should perform fsyncs asynchronously and ensure that 
> they are completed prior to the next checkpoint start.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
