+1

- Started up a 5-node clusterdock cluster (Hadoop 2.2.0, Oracle JDK 7u79) from binary tarballs.
- Verified that the web UI works and that the HBase Version attribute matches the expected Git hash.
- Ran ITBLL with 1 billion nodes and the serverKilling monkey (`clusterdock_ssh node-3.cluster hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList -m serverKilling loop 1 16 62500000 ${RANDOM} 16`), which passed.

-Dima

On Tue, Sep 6, 2016 at 11:58 AM, Andrew Purtell <apurt...@apache.org> wrote:

> Thanks for the +1, Heng.
>
> > TestThriftServer.beforeClass:97 » IO Shutting down
>
> Looks like the minicluster failed to launch. Port binding problem, perhaps?
> It passes when rerun manually because probably no other test is executing
> concurrently. By default our build runs unit tests with some parallelism.
> FWIW this can be disabled with '-Dsurefire.firstPartForkCount=1
> -Dsurefire.secondPartForkCount=1'.
>
> Also, I use '-Dsurefire.rerunFailingTestsCount=2' to help distinguish
> between failures and flakes.
>
> On Tue, Sep 6, 2016 at 1:57 AM, Heng Chen <heng.chen.1...@gmail.com> wrote:
>
> > +1
> >
> > - Unpacked source and binary tarballs: layout looks good
> >
> > - Started up a 3-node cluster (Hadoop 2.7.2, Oracle JDK 8u20, 2 master,
> >   3 rs) from binary tarballs.
> >
> > - Verified that the web UI works and shell works
> >
> > - build from source and run test case (JDK 8u20), passed. (There is some
> >   failed test case about thrift server, but could pass when rerun manually,
> >   list the failed test case below)
> >
> >   TestThriftServer.beforeClass:97 » IO Shutting down
> >
> >   TestThriftServerCmdLine.setUpBeforeClass:119 » IO Shutting down
> >
> >   TestThriftHBaseServiceHandler.beforeClass:135 » IO Shutting down
> >
> >   TestThriftHBaseServiceHandlerWithLabels.beforeClass:135 » IO Shutting down
> >
> > - Run LTT with 1M rows (100 writers, 30 readers (100%), 10 updaters
> >   (20%)) all keys verified, no warns, no errors, no failed, latencies lgtm
> >
> > - Run ITBLL with 2M rows (slowDeterministic), passed.
> >
> > - Run ITBLL with 2.5M rows (serverKilling), passed.
> >
> > Some notes: because 0.98 compiled with hadoop 2.2.0, so when i run ITBLL
> > on hadoop 2.7.2, it failed due to compatibility issue, see HBASE-16564, so
> > i replace hadoop-2.2.0 jar with hadoop 2.5.1, and pass the ITBLL.
> > Still give +1 because it is MapReduce issue not HBase
> >
> > 2016-09-05 13:41 GMT+08:00 Dima Spivak <dimaspi...@apache.org>:
> >
> > > Ugh, sorry guys, I'm dumb. I was running 1 mapper per RS before, but
> > > switched to a d2.4xlarge instance today and, after noticing cores sitting
> > > idly, decided to try setting the number of mappers and reducers to the
> > > number of cores to speed testing up (RAM is still grossly underutilized
> > > with less than 16 GB/122 GB in use at any one time). This definitely made
> > > runs go faster (generation took less than 3 hours, verification took about
> > > 1 hour), but I just realized that the number of nodes I picked (62500000)
> > > isn't a multiple of 25,000,000 and so the list won't wrap properly. I'll
> > > rerun and confirm, but I'm guessing this is a false alarm.
> > >
> > > Sorry again. :(
> > >
> > > -Dima
> > >
> > > On Sun, Sep 4, 2016 at 9:56 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > >
> > > > I will also try your incantation (and JRE version) on this RC and 0.98.21
> > > > next week to answer those same questions.
> > > >
> > > > Looks like you are using a multiple of RSes (16) as numMappers? Is that
> > > > 4x? On what kind of instance type? I am (also, I think) using a 5 node
> > > > "cluster" with 4 RS nodes but numMappers 4 and numNodes 250000000. Since
> > > > with clusterdock everything is contending for one instance's resources I
> > > > didn't want to overdo and so have started at 1 mapper per RS. Since you
> > > > appear to be using a higher value, I'm curious if you've found that you
> > > > will get stable results with that, if more mappers in this configuration
> > > > does a better job finding problems in your experience, and what instance
> > > > type are you using? I've been using a d2.4xlarge.
> > > >
> > > > > On Sep 4, 2016, at 9:04 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > > > >
> > > > > I've been running 1B tests with slowDeterministic. 0.98.21 and this
> > > > > 0.98.22 RC. I get 1B referenced, all ok.
> > > > >
> > > > > Did you run serverKilling with 0.98.21? And did it pass? Or does 0.98.21
> > > > > pass for you now? If so then we have a regression. If not then it's
> > > > > something to look at for 0.98.23 I'd say.
> > > > >
> > > > >> On Sep 4, 2016, at 8:44 PM, Dima Spivak <dimaspi...@apache.org> wrote:
> > > > >>
> > > > >> Anyone else running ITBLL seeing issues? I just ran a 5-node clusterdock
> > > > >> cluster with JDK 7u79 of this RC and tried out ITBLL with 1 billion rows
> > > > >> and the serverKilling monkey (`hbase
> > > > >> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList -m serverKilling
> > > > >> loop 1 16 62500000 ${RANDOM} 16`). This failed for me because of
> > > > >> unreferenced list nodes:
> > > > >>
> > > > >> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$Counts
> > > > >> REFERENCED=732006926
> > > > >> UNREFERENCED=12003580
> > > > >>
> > > > >> Perhaps this is similar to what Mikhail saw a while back with later
> > > > >> releases?
> > > > >>
> > > > >> -Dima
> > > > >>
> > > > >>> On Sat, Sep 3, 2016 at 8:34 AM, Andrew Purtell <apurt...@apache.org> wrote:
> > > > >>>
> > > > >>> The 1st HBase 0.98.22 release candidate (RC0) is available for download at
> > > > >>> https://dist.apache.org/repos/dist/dev/hbase/hbase-0.98.22RC0 and Maven
> > > > >>> artifacts are also available in the temporary repository
> > > > >>> https://repository.apache.org/content/repositories/orgapachehbase-1151 .
> > > > >>>
> > > > >>> The detailed source and binary compatibility report for this release with
> > > > >>> respect to the previous is available for your review at
> > > > >>> https://dist.apache.org/repos/dist/dev/hbase/hbase-0.98.22RC0/0.98.21_0.98.22RC0_compat_report.html
> > > > >>> . There are no reported compatibility issues.
> > > > >>>
> > > > >>> The 25 issues resolved in this release can be found at
> > > > >>> https://s.apache.org/C7SV .
> > > > >>>
> > > > >>> I have made the following assessments of this candidate:
> > > > >>> - Release audit check: pass
> > > > >>> - Unit test suite: pass 10/10 (7u79)
> > > > >>> - Loaded 1M keys with LTT (10 readers, 10 writers, 10 updaters (20%)): all
> > > > >>>   keys verified, no unusual messages or errors, latencies in the ballpark
> > > > >>> - IntegrationTestBigLinkedList 1B rows: 100% referenced, no errors (8u91)
> > > > >>> - Built head of Apache Phoenix 4.x-HBase-0.98 branch: no errors (7u79)
> > > > >>>
> > > > >>> Signed with my code signing key D5365CCD.
> > > > >>>
> > > > >>> Please try out the candidate and vote +1/0/-1. This vote will be open for
> > > > >>> at least 72 hours. Unless objection I will try to close it
> > > > >>> Friday September 9, 2016 if we have sufficient votes.
> > > > >>>
> > > > >>> --
> > > > >>> Best regards,
> > > > >>>
> > > > >>>    - Andy
> > > > >>>
> > > > >>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > > >>> (via Tom White)
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
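
The node-count arithmetic discussed in this thread can be checked with a few lines of shell. This is only a sketch: it reads the numbers after `loop` in the ITBLL command as iterations, mappers, and nodes per mapper (which is how the thread talks about them), takes the 25,000,000 wrap figure from Dima's message, and uses variable names that are purely illustrative.

```shell
#!/usr/bin/env bash
# Values copied from the thread; the variable names are hypothetical.
num_mappers=16
nodes_per_mapper=62500000
wrap_size=25000000   # the multiple Dima mentions for proper list wrapping

# Total nodes generated in one iteration: mappers x nodes per mapper.
total_nodes=$(( num_mappers * nodes_per_mapper ))

# A non-zero remainder means nodes_per_mapper is not a multiple of wrap_size.
remainder=$(( nodes_per_mapper % wrap_size ))

echo "total nodes: ${total_nodes}"
if [ "${remainder}" -eq 0 ]; then
  echo "nodes per mapper is a multiple of ${wrap_size}"
else
  echo "nodes per mapper is NOT a multiple of ${wrap_size} (remainder ${remainder})"
fi
```

Substituting Andrew's settings (4 mappers, 250000000 nodes per mapper) gives the same 1 billion total with a remainder of 0.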