[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859759#comment-15859759 ]
Kenneth Howe commented on GEODE-2398:
-------------------------------------

This problem occurred while writing to the channel from within the method Oplog.flush(OplogFile olf, boolean doSync). A channel write is also executed from within Oplog.flush(OplogFile olf, ByteBuffer b1, ByteBuffer b2). The second form of flush calls channel.write(ByteBuffer[] bbArray) instead of channel.write(ByteBuffer bb) as in the first form. Since the write has been seen to fail in the first form, there is presumably a remote chance of a similar failure in the second form.

The fix for this problem is to add a retry loop around the channel.write calls. A write is retried when the number of bytes written, as returned by write(), is not consistent with the change in ByteBuffer positions. The number of retries is limited to a small number so that a hard failure cannot cause a thread to hang. An IOException is thrown if the retry limit is exceeded.

> Sporadic Oplog corruption due to channel.write failure
> ------------------------------------------------------
>
>                 Key: GEODE-2398
>                 URL: https://issues.apache.org/jira/browse/GEODE-2398
>             Project: Geode
>          Issue Type: Bug
>          Components: persistence
>            Reporter: Kenneth Howe
>            Assignee: Kenneth Howe
>
> There have been some occurrences of Oplog corruption during testing that have
> been traced to failures in writing oplog entries to the .crf file. When it
> fails, Oplog.flush attempts to write a ByteBuffer to the file channel. The
> call to the channel.write(bb) method returns 0 bytes written, but the source
> ByteBuffer position is moved to the ByteBuffer limit.
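
For illustration, the retry loop described in the comment might look roughly like the sketch below. This is not the actual Geode change: the class name RetryingChannelWriter, the method writeFully, and the MAX_RETRIES value of 3 are all hypothetical (the comment only says "a small number"). The sketch also assumes that the count returned by write() is the value to trust when it disagrees with the buffer position, and that the channel is blocking.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper, not part of Geode's Oplog class.
final class RetryingChannelWriter {

    // Assumed limit; the comment only specifies "a small number".
    static final int MAX_RETRIES = 3;

    /**
     * Writes the remaining bytes of bb to the channel. After each write, the
     * returned byte count is compared with the change in the buffer position.
     * On a mismatch the buffer position is reset and the write is retried,
     * up to MAX_RETRIES times, after which an IOException is thrown.
     * Assumes a blocking channel (a consistent zero-byte write is not counted).
     */
    static int writeFully(WritableByteChannel channel, ByteBuffer bb) throws IOException {
        int total = 0;
        int retries = 0;
        while (bb.hasRemaining()) {
            final int startPos = bb.position();
            final int written = channel.write(bb);
            final int moved = bb.position() - startPos;

            // Assumption: trust the returned count, clamped to a valid range.
            final int accepted = Math.max(0, Math.min(written, bb.limit() - startPos));

            if (written != moved) {
                // The write reported one amount but the buffer moved by another.
                if (retries++ >= MAX_RETRIES) {
                    throw new IOException("channel.write() returned " + written
                        + " but source buffer position changed by " + moved
                        + " after " + MAX_RETRIES + " retries");
                }
                // Rewind so the bytes that were not reported as written are sent again.
                bb.position(startPos + accepted);
            }
            total += accepted;
        }
        return total;
    }
}
```

With a channel that fails in the way described in the issue (returns 0 while moving the buffer position to its limit), the first attempt is detected as inconsistent, the position is restored, and the next attempt writes the data. A channel that fails on every attempt produces an IOException instead of a hung thread.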