The previous four builds of our 0.90-based variant were OK, but recent changes to replication seem to be problematic. The major difference between our version and upstream is the use of secure RPC.
We also see a possibly related problem on the 0.90 branch on Hudson (https://hudson.apache.org/hudson/job/HBase-0.90)

>>>
java.lang.AssertionError: Waited too much time for queueFailover replication
    at org.junit.Assert.fail(Assert.java:91)
    at org.apache.hadoop.hbase.replication.TestReplication.queueFailover(TestReplication.java:560)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
<<<

From our Hudson:

Changes

HBASE-3360 ReplicationLogCleaner is enabled by default in 0.90 -- causes NPE (detail)
HBASE-3363 ReplicationSink should batch delete (detail)
HBASE-3365 EOFE contacting crashed RS causes Master abort (detail)
Adding a fix for this test that was missing (detail)

(Prior to this change set previous four runs were ok.)

First result:

>>>
Running org.apache.hadoop.hbase.replication.TestReplication
killed.
[HUDSON] Recording test results
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Error while executing forked tests.; nested exception is org.apache.maven.surefire.booter.shade.org.codehaus.plexus.util.cli.CommandLineException: Error while executing external command, process killed.
Process timeout out after 900 seconds
<<<

Next:

>>>
-------------------------------------------------------------------------------
Test set: org.apache.hadoop.hbase.replication.TestReplication
-------------------------------------------------------------------------------
Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 272.887 sec <<< FAILURE!
testAddAndRemoveClusters(org.apache.hadoop.hbase.replication.TestReplication)  Time elapsed: 24.792 sec  <<< FAILURE!
java.lang.AssertionError: Waited too much time for put replication
    at org.junit.Assert.fail(Assert.java:91)
    at org.apache.hadoop.hbase.replication.TestReplication.testAddAndRemoveClusters(TestReplication.java:390)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    ...
<<<

Best regards,

- Andy