Even though I had EC2 agitating my tablet servers for me, the test actually failed because a recovery failed, and that failure was caused by a lease recovery failure. This is the same thing Keith saw with his CI test. We did make small changes to the lease recovery code (dropped Hadoop 1 support), but I don't know why that would make a difference.
The fact that we're seeing this problem in multiple tests is troubling. I'm continuing to investigate the issue.

-Eric

On Fri, May 15, 2015 at 12:09 PM, Eric Newton <eric.new...@gmail.com> wrote:

> I've been running a RW test w/out agitation on a 21 node EC2 cluster.
>
> It died in the replication test.
>
> So I restarted it with one walker, just doing replication.
>
> Half the cluster died with zookeeper timeouts. I am investigating.
>
> I'll pull the source, build and review the results by EOB today.
>
> -Eric
>
>
> On Thu, May 14, 2015 at 1:38 PM, Josh Elser <josh.el...@gmail.com> wrote:
>
>> Reminder, ~27hrs remaining on this vote. 5pm ET tmrw.
>>
>> I know some people are running tests. Please pull down the source
>> tarball, run unit tests, build the code, poke at it locally. Any and all
>> feedback is warmly welcomed. The more eyes we have looking at this the
>> better. Thanks in advance.
>>
>>
>> Josh Elser wrote:
>>
>>> You are correct. I forgot to update the SHA when I copied the contents.
>>>
>>> The correct SHA1=8cba8128fbc3238bdd9398cf5c36b7cb6dc3b61d
>>>
>>> Christopher wrote:
>>>
>>>> The SHA1 seems incorrect. The jars says they were built on
>>>> 8cba8128fbc3238bdd9398cf5c36b7cb6dc3b61d, which I can't find in the RC
>>>> branch.
>>>>
>>>> --
>>>> Christopher L Tubbs II
>>>> http://gravatar.com/ctubbsii
>>>>
>>>>
>>>> On Tue, May 12, 2015 at 3:08 PM, Josh Elser<josh.el...@gmail.com>
>>>> wrote:
>>>>
>>>>> Devs,
>>>>>
>>>>> Please consider the following candidate for Apache Accumulo 1.7.0
>>>>>
>>>>> Tag: 1.7.0-rc3
>>>>> SHA1: 76634fb2f1257abbb8ef745ea67a4f78e733a402
>>>>> Staging Repository:
>>>>>
>>>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1032
>>>>>
>>>>>
>>>>> Source tarball:
>>>>>
>>>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1032/org/apache/accumulo/accumulo/1.7.0/accumulo-1.7.0-src.tar.gz
>>>>>
>>>>> Binary tarball:
>>>>>
>>>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1032/org/apache/accumulo/accumulo/1.7.0/accumulo-1.7.0-bin.tar.gz
>>>>>
>>>>> (Append ".sha1", ".md5" or ".asc" to download the signature/hash for an
>>>>> artifact.)
>>>>>
>>>>> Signing keys available at: https://www.apache.org/dist/accumulo/KEYS
>>>>>
>>>>> 1.7.0 includes 693 resolved issues:
>>>>>
>>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312121&version=12324607
>>>>>
>>>>>
>>>>> Testing: All unit and integration tests are passing. Completed 3-day
>>>>> CI w/o
>>>>> agitation or verification. Completed 24hr RandomWalk w/o agitation on 3
>>>>> nodes. 95% through 24hr CI and RW w/ agitation.
>>>>>
>>>>> API compatibility report for 1.6.2 to 1.7.0:
>>>>>
>>>>> http://people.apache.org/~elserj/accumulo-1.7.0-rc3/1.6.2_to_1.7.0/compat_report.html
>>>>>
>>>>>
>>>>> API backwards compatibility report for 1.7.0 to 1.6.2:
>>>>>
>>>>> http://people.apache.org/~elserj/accumulo-1.7.0-rc3/1.7.0_to_1.6.2/compat_report.html
>>>>>
>>>>>
>>>>> The vote will be open for 72hrs until Friday, May 16th 4:00PM ET.
>>>>> Here's my
>>>>> +1.
>>>>>
>>>>
>