[jira] Commented: (HBASE-2858) TestReplication.queueFailover fails half the time

HBase Review Board (JIRA) Wed, 21 Jul 2010 13:48:46 -0700

    [ https://issues.apache.org/jira/browse/HBASE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890878#action_12890878 ]


HBase Review Board commented on HBASE-2858:
-------------------------------------------

Message from: "Jean-Daniel Cryans" <[email protected]>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/349/
-----------------------------------------------------------

(Updated 2010-07-21 13:33:51.105842)


Review request for hbase.


Changes
-------

Updated the diff and removed the logArchived method. When a log is archived, 
we now start reading from the new location right away.


Summary
-------

This patch fixes the ZKW.listZNodes issue and cleans up the path handling in 
ReplicationSource by removing a lock and adding logic to figure out where a 
log was moved. The test now passes 100% of the time for me (up from 50%).
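
As a rough illustration of the "figure out where the log was moved" idea, here 
is a minimal sketch. It is not the code in the diff. It uses java.nio.file on 
a local directory instead of the Hadoop FileSystem API, and the class name 
LogLocator, the method locate, and the directory names used in main are all 
hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper, not part of HBase: finds a log that may have been archived. */
public class LogLocator {

    /**
     * Looks for logName in the live log directory first, then in the archive
     * directory. Returns an empty Optional when the log is in neither place,
     * so the caller decides whether to report, skip, or retry.
     */
    public static Optional<Path> locate(Path logDir, Path archiveDir, String logName) {
        Path live = logDir.resolve(logName);
        if (Files.exists(live)) {
            return Optional.of(live);
        }
        Path archived = archiveDir.resolve(logName);
        if (Files.exists(archived)) {
            return Optional.of(archived);
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Directory names are assumptions made for this demo only.
        Path root = Files.createTempDirectory("hlog-demo");
        Path logs = Files.createDirectories(root.resolve("logs"));
        Path oldLogs = Files.createDirectories(root.resolve("oldlogs"));

        Files.createFile(logs.resolve("hlog.1"));
        System.out.println("before archiving: " + locate(logs, oldLogs, "hlog.1"));

        // Simulate the log being archived while a reader still holds the old path.
        Files.move(logs.resolve("hlog.1"), oldLogs.resolve("hlog.1"));
        System.out.println("after archiving:  " + locate(logs, oldLogs, "hlog.1"));

        // A log that exists in neither directory.
        System.out.println("missing log:      " + locate(logs, oldLogs, "hlog.2"));
    }
}
```

Returning an empty Optional rather than throwing leaves the missing-log 
decision to the caller.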

There's one open issue, as outlined by the two TODOs: what happens if a log is 
missing from HDFS? When the queue is recovered, it could mean that HDFS was 
cleared but not ZK, but during normal operations it would seem to point to a 
bug. Should we report it and continue?


This addresses bug HBASE-2858.
    http://issues.apache.org/jira/browse/HBASE-2858


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java e6b365e
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e1b4077
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/LogActionsListener.java 9b18992
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java a037aae
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java 6b9dcb5
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceInterface.java 1d7ae7e
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java 2e13a0a
  src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java e8dd268
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogActionsListener.java d03809c
  src/test/java/org/apache/hadoop/hbase/replication/ReplicationSourceDummy.java ad9e8fb
  src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java 163671f
  src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java bb09bc3

Diff: http://review.hbase.org/r/349/diff


Testing
-------

Unit testing.


Thanks,

Jean-Daniel




> TestReplication.queueFailover fails half the time
> -------------------------------------------------
>
>                 Key: HBASE-2858
>                 URL: https://issues.apache.org/jira/browse/HBASE-2858
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.90.0
>
>
> TestReplication.queueFailover fails 50% of the time. This is because 
> ZooKeeperWrapper.listZnodes (introduced in HBASE-2694 and missed by 
> HBASE-2735) doesn't use the Watcher it's passed, so ReplicationSource 
> sometimes misses hlogs to replicate for the region server we kill. 
> Fixing that also uncovered a second issue: RepSource gives up on log files 
> too quickly when the master is a bit too slow to split logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
