[
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457194#comment-13457194
]
Jean-Daniel Cryans commented on HBASE-6758:
-------------------------------------------
My understanding of this patch is that it reduces the race condition but it
still leaves a small window eg you can take the "fileNotInUse" snapshot, get
"false", and the moment after that the log could roll. If this is correct, I'm
not sure it's worth the added complexity.
It seems to me this is a case where we'd need to lock HLog.cacheFlushLock for
the time we read the log to be 100% sure log rolling doesn't happen. This has
multiple side effects like delaying flushes and log rolls for a few ms while
replication is reading the log. It would also require having a way to get to
the WAL from ReplicationSource.
<blue skying>While I'm thinking about this, it just occurred to me that when we
read a log that's not being written to then we don't need the open/close file
dance since the new data is already available. Possible optimization
here!</blue skying>
Anyways, one solution I can think of that doesn't involve leaking HRS into
replication would be giving the log a "second chance". Basically if you get an
EOF, flip the secondChance bit. If it's on then you don't get rid of that log
yet. Reset the bit when you loop back to read, now if there was new data added
you should get it else go to the next log.
> [replication] The replication-executor should make sure the file that it is
> replicating is closed before declaring success on that file
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
> Issue Type: Bug
> Reporter: Devaraj Das
> Assignee: Devaraj Das
> Attachments: 6758-1-0.92.patch
>
>
> I have seen cases where the replication-executor would lose data to replicate
> since the file hasn't been closed yet. Upon closing, the new data becomes
> visible. Before that happens the ZK node shouldn't be deleted in
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made
> in ReplicationSource.processEndOfFile as well (currentPath related).
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira