[jira] [Commented] (HBASE-3515) [replication] ReplicationSource can miss a log after RS comes out of GC

Jean-Daniel Cryans (Commented) (JIRA) Tue, 25 Oct 2011 16:22:57 -0700

    [ https://issues.apache.org/jira/browse/HBASE-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135545#comment-13135545 ]


Jean-Daniel Cryans commented on HBASE-3515:
-------------------------------------------

To reiterate the problem: if the ZK session has expired when a log is rolled, 
the new HLog may never get added to the list of logs to replicate. HLog 
currently doesn't get any feedback from the WALActionListeners, even when they 
fail at doing their job.

One way of fixing it would be to throw an exception and stop the log rolling, 
but if there are many listeners, some may already have processed the new log 
by the time one of them fails. We could also simply kill the region server 
when this happens.

I'm in favor of the latter.
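
To make the trade-off concrete, here is a minimal sketch of the two options. 
It is not the actual HLog or WALActionListener code: RollListener, Abortable, 
notifyOrThrow and notifyOrAbort are hypothetical names made up for 
illustration, and the failing listener only simulates a lost ZK connection.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LogRollSketch {

    /** Hypothetical stand-in for a WAL listener (not the real WALActionListener). */
    public interface RollListener {
        void logRolled(String newLogPath) throws IOException;
    }

    /** Hypothetical hook representing "kill the region server". */
    public interface Abortable {
        void abort(String why, Throwable cause);
    }

    /**
     * Option 1: let the first failure propagate so the caller can cancel the roll.
     * Listeners earlier in the list have already processed the new log.
     */
    public static void notifyOrThrow(List<RollListener> listeners, String newLogPath)
            throws IOException {
        for (RollListener listener : listeners) {
            listener.logRolled(newLogPath);
        }
    }

    /**
     * Option 2: on the first failure, abort the server and stop notifying.
     * Returns true only if every listener accepted the new log.
     */
    public static boolean notifyOrAbort(List<RollListener> listeners, String newLogPath,
                                        Abortable server) {
        for (RollListener listener : listeners) {
            try {
                listener.logRolled(newLogPath);
            } catch (IOException e) {
                server.abort("Failed to add log " + newLogPath, e);
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> recorded = new ArrayList<>();
        RollListener healthy = path -> recorded.add(path);
        RollListener expired = path -> {
            // Simulated failure; a real listener would fail inside its ZK call.
            throw new IOException("simulated ZK connection loss for " + path);
        };
        List<RollListener> listeners = Arrays.asList(healthy, expired);

        try {
            notifyOrThrow(listeners, "log-1");
        } catch (IOException e) {
            System.out.println("roll cancelled, but already recorded: " + recorded);
        }

        boolean ok = notifyOrAbort(listeners, "log-2",
                (why, cause) -> System.out.println("ABORTING: " + why));
        System.out.println("all listeners succeeded: " + ok);
    }
}
```

With the first option the healthy listener is left tracking a log whose roll 
was cancelled, which is the partial-processing problem described above. With 
the second option no listener state needs to be unwound, since the server 
goes down and its logs go through the usual recovery path.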
                
> [replication] ReplicationSource can miss a log after RS comes out of GC
> -----------------------------------------------------------------------
>
>                 Key: HBASE-3515
>                 URL: https://issues.apache.org/jira/browse/HBASE-3515
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3515.patch
>
>
> This is from Hudson build 1738. If a log is about to be rolled and the ZK 
> connection is already closed, the replication code will fail to add the new 
> log in ZK, but the log will still be rolled and it's possible that some 
> edits will make it in.
> From the log:
> {quote}
> 2011-02-08 10:21:20,618 FATAL [RegionServer:0;vesta.apache.org,46117,1297160399378.logRoller] regionserver.HRegionServer(1383): ABORTING region server serverName=vesta.apache.org,46117,1297160399378, load=(requests=1525, regions=12, usedHeap=273, maxHeap=1244): Failed add log to list
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /1/replication/rs/vesta.apache.org,46117,1297160399378/2/vesta.apache.org%3A46117.1297160480509
> ...
> 2011-02-08 10:21:22,444 DEBUG [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] wal.HLogSplitter(258): Splitting hlog 8 of 8: hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509, length=0
> 2011-02-08 10:21:22,862 DEBUG [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] wal.HLogSplitter(436): Pushed=31 entries from hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509
> {quote}
> The easiest thing to do would be to let the exception out and cancel the 
> log roll.

