[ https://issues.apache.org/jira/browse/GEODE-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859759#comment-15859759 ]

Kenneth Howe commented on GEODE-2398:
-------------------------------------

This problem occurred while writing to the channel from within the method
Oplog.flush(OplogFile olf, boolean doSync). A second channel write happens in
Oplog.flush(OplogFile olf, ByteBuffer b1, ByteBuffer b2). That form of flush
calls channel.write(ByteBuffer[] bbArray) instead of channel.write(ByteBuffer bb),
which the first form uses. Since the write has been seen to fail in the first
form, there is presumably a remote chance of a similar failure in the second
form.
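As a sketch (not Geode's actual code; the class and method names are hypothetical), the two flush forms map onto the two java.nio FileChannel write overloads. The single-buffer write returns an int, while the gathering write returns a long, and a consistency check for it has to sum the position change across every buffer in the array:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical sketch: each FileChannel write form paired with a check that
// the returned byte count matches how far the buffer positions moved.
final class FlushWriteForms {

    // Single-buffer form, as used by flush(OplogFile, boolean): returns an int.
    static boolean writeSingle(FileChannel channel, ByteBuffer bb) throws IOException {
        int before = bb.position();
        int written = channel.write(bb);
        return written == bb.position() - before;
    }

    // Gathering form, as used by flush(OplogFile, ByteBuffer, ByteBuffer):
    // returns a long, and the position change is summed over all buffers.
    static boolean writeGathering(FileChannel channel, ByteBuffer[] bbs) throws IOException {
        long before = totalPosition(bbs);
        long written = channel.write(bbs);
        return written == totalPosition(bbs) - before;
    }

    // Sum of the current positions of every buffer in the array.
    private static long totalPosition(ByteBuffer[] bbs) {
        long sum = 0;
        for (ByteBuffer bb : bbs) {
            sum += bb.position();
        }
        return sum;
    }
}
```

On a healthy channel both checks return true; the reported bug is exactly the case where writeSingle would return false.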

The fix for this problem is to add a retry loop around the channel.write calls
that repeats the write until the number of bytes written, as returned by
write(), is consistent with the change in the ByteBuffer positions. The number
of retries is limited to a small number so that a hard failure cannot cause a
thread to hang. An IOException is thrown if the retry limit is exceeded.
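A minimal sketch of such a retry loop for the single-buffer form follows. It assumes a hypothetical limit of 5 retries and assumes that, on an inconsistent result, the buffer is rewound to the position implied by the returned count before retrying; the comment above specifies neither. Zero-progress writes also count against the limit so that a channel that keeps failing cannot spin forever:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Hypothetical sketch of a bounded retry loop around channel.write(bb).
// MAX_RETRIES and the rewind-to-reported-count recovery are assumptions.
final class RetryingWrite {
    static final int MAX_RETRIES = 5; // assumed value for "a small number"

    static void writeFully(WritableByteChannel channel, ByteBuffer bb) throws IOException {
        int retries = 0;
        while (bb.hasRemaining()) {
            int before = bb.position();
            int written = channel.write(bb);
            int moved = bb.position() - before;
            boolean consistent = written == moved;
            if (!consistent) {
                if (written < 0 || written > bb.limit() - before) {
                    throw new IOException("Invalid byte count from write(): " + written);
                }
                // Trust the reported count: rewind so the unwritten bytes are offered again.
                bb.position(before + written);
            }
            if (!consistent || written == 0) {
                // Inconsistent or zero-progress writes count against the limit,
                // so a persistent failure throws instead of hanging the thread.
                if (++retries > MAX_RETRIES) {
                    throw new IOException("channel.write failed after " + MAX_RETRIES
                        + " retries (returned " + written + ", position moved " + moved + ")");
                }
            }
        }
    }
}
```

The gathering form could follow the same pattern, comparing the long returned by write(ByteBuffer[]) with the summed position change across the buffers.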

> Sporadic Oplog corruption due to channel.write failure
> ------------------------------------------------------
>
>                 Key: GEODE-2398
>                 URL: https://issues.apache.org/jira/browse/GEODE-2398
>             Project: Geode
>          Issue Type: Bug
>          Components: persistence
>            Reporter: Kenneth Howe
>            Assignee: Kenneth Howe
>
> There have been some occurrences of Oplog corruption during testing that have
> been traced to failures in writing oplog entries to the .crf file. In the
> failing case, Oplog.flush attempts to write a ByteBuffer to the file channel.
> The call to channel.write(bb) returns 0 bytes written, but the source
> ByteBuffer's position is moved to its limit.
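The hypothetical test double below mimics the reported behavior and shows why the failure is silent rather than loud: a drain loop that checks only bb.hasRemaining() exits as though the whole buffer had been written, even though the channel reports 0 bytes. MisreportingChannel and NaiveFlush are illustrations, not Geode's Oplog.flush:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Hypothetical test double reproducing the reported symptom: write() returns
// 0 bytes written, yet the source buffer's position jumps to its limit.
final class MisreportingChannel implements WritableByteChannel {
    @Override
    public int write(ByteBuffer src) {
        src.position(src.limit()); // the buffer looks fully consumed...
        return 0;                  // ...but the channel reports nothing written
    }

    @Override
    public boolean isOpen() {
        return true;
    }

    @Override
    public void close() {
    }
}

final class NaiveFlush {
    // A "write until drained" loop that trusts hasRemaining() alone.
    static int flush(WritableByteChannel channel, ByteBuffer bb) throws IOException {
        int reported = 0;
        while (bb.hasRemaining()) {
            reported += channel.write(bb);
        }
        return reported; // total bytes the channel claims to have written
    }
}
```

Flushing a 3-byte buffer through MisreportingChannel returns 0 while the buffer's position ends at 3, so the loop finishes without any error even though nothing was reported as written.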



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
