[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457194#comment-13457194
 ] 

Jean-Daniel Cryans commented on HBASE-6758:
-------------------------------------------

My understanding of this patch is that it narrows the race condition but 
still leaves a small window: e.g., you can take the "fileNotInUse" snapshot, 
get "false", and the log could roll the moment after that. If this is correct, 
I'm not sure it's worth the added complexity.

It seems to me this is a case where we'd need to hold HLog.cacheFlushLock for 
the whole time we read the log to be 100% sure log rolling doesn't happen in 
the meantime. This has side effects, such as delaying flushes and log rolls 
for a few ms while replication is reading the log. It would also require a way 
to get to the WAL from ReplicationSource.
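
A rough sketch of what holding the lock during a read would mean, assuming a 
plain ReentrantLock. LockedWalSketch, tryRoll and readUnderLock are 
hypothetical names made up for illustration; this is not HLog's actual API, 
and a real roll would block on the lock rather than give up.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical stand-in for the WAL; only models the lock and a roll counter. */
final class LockedWalSketch {
    // Named after HLog.cacheFlushLock, but this is an illustrative lock only.
    private final ReentrantLock cacheFlushLock = new ReentrantLock();
    private volatile int generation = 0;

    /** Attempts a log roll; returns false if another thread holds the lock. */
    boolean tryRoll() {
        if (!cacheFlushLock.tryLock()) {
            return false; // a reader (or flush) is in progress
        }
        try {
            generation++; // stands in for switching to a new log file
            return true;
        } finally {
            cacheFlushLock.unlock();
        }
    }

    /** Runs the read while holding the lock, so no roll can happen meanwhile. */
    <T> T readUnderLock(Supplier<T> readAction) {
        cacheFlushLock.lock();
        try {
            return readAction.get();
        } finally {
            cacheFlushLock.unlock();
        }
    }

    int generation() {
        return generation;
    }
}
```

The cost is visible in the sketch: every roll attempted from another thread 
during readUnderLock is refused (or, with a blocking lock, delayed) until the 
read completes.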

<blue skying>While I'm thinking about this, it just occurred to me that when we 
read a log that's not being written to, we don't need the open/close file 
dance, since the new data is already available. Possible optimization 
here!</blue skying>

Anyway, one solution I can think of that doesn't involve leaking HRS into 
replication would be to give the log a "second chance". Basically, if you get 
an EOF, flip the secondChance bit. While it's on, you don't get rid of that 
log yet. Reset the bit when you loop back to read: if new data was added you 
should get it, otherwise go to the next log.
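
A minimal sketch of the second-chance idea, under one reading of it: the first 
EOF on a log only sets the bit, any successful read clears it, and a second 
consecutive EOF moves on to the next log. SketchLog and SecondChanceReader are 
hypothetical stand-ins made up for illustration, not ReplicationSource code; 
the sketch also assumes the last log in the queue is the live one and is never 
dropped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical log: entries may become visible after a reader has hit EOF. */
final class SketchLog {
    final String name;
    private final List<String> entries = new ArrayList<>();

    SketchLog(String name) {
        this.name = name;
    }

    void append(String entry) {
        entries.add(entry);
    }

    /** Returns the entries at or after position; an empty list means EOF. */
    List<String> readFrom(int position) {
        return new ArrayList<>(entries.subList(position, entries.size()));
    }
}

/** Hypothetical reader that gives each log one extra read after an EOF. */
final class SecondChanceReader {
    private final Deque<SketchLog> queue = new ArrayDeque<>();
    private final List<String> shipped = new ArrayList<>();
    private int position = 0;
    private boolean secondChance = false;

    void enqueue(SketchLog log) {
        queue.addLast(log);
    }

    List<String> shipped() {
        return shipped;
    }

    String currentLogName() {
        return queue.isEmpty() ? null : queue.peekFirst().name;
    }

    /** One pass of the read loop over the log at the head of the queue. */
    void readOnce() {
        SketchLog current = queue.peekFirst();
        if (current == null) {
            return;
        }
        List<String> batch = current.readFrom(position);
        if (!batch.isEmpty()) {
            shipped.addAll(batch);
            position += batch.size();
            secondChance = false; // new data showed up, so reset the bit
            return;
        }
        if (queue.size() == 1) {
            return; // assumed to be the live log: keep polling it
        }
        if (!secondChance) {
            secondChance = true; // first EOF: keep the log for one more read
            return;
        }
        queue.removeFirst(); // second EOF in a row: move to the next log
        position = 0;
        secondChance = false;
    }
}
```

This doesn't close the window completely either; it only requires the late 
data to become visible before the second read rather than before the first.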
                
> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6758
>                 URL: https://issues.apache.org/jira/browse/HBASE-6758
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>         Attachments: 6758-1-0.92.patch
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).
