Ted, My question really boils down this this -- can we have a data loss if we don't take in HBASE-10142 or not? It will take me a little more time reverse engineer and convince myself one way or another. If we can lose data, then I'll roll take the port and do another rc. If we don't (we are just less performant) then since we have 4 binding +1's other than me I'll release.
I either case, I want to understand how it fixes the problem. I can see the changes e.g. the re-entrant lock, and the shifting out of checkLowReplication call, but I need to figure out why the changes were necessary. The main substantive change is that the lock is taken around the checkLowReplication call but I haven't put together why this fixes the problem. Can you shed more light (and ideally lay out why the changes fix the problem in jira? -- maybe a bad multithread trace that is fixed by the new lock?) Thanks, Jon. On Thu, Dec 19, 2013 at 9:01 AM, Ted Yu <[email protected]> wrote: > Jon: > If Stack gives the greenlight, I can certainly port it to 0.96 branch. > > Cheers > > > On Thu, Dec 19, 2013 at 8:39 AM, Jonathan Hsieh <[email protected]> wrote: > > > When I run the test standalone, and didn't have an failure in > 0.96.1.1rc0 > > or 0.96.1. When I ran the whole suite, I ran into exactly the same > failure > > on 0.96.1.1 (currnetly testing full suite from 0.96.1 src tar ball) > > > > I've spent some time reviewing HBASE-10142, There are some non-test code > > modifications still trying to determine if it is a serious problem or not > > on that side. Ted, is there a reason why this wasn't ported to the 0.96 > > branch? > > > > Jon. > > > > > > On Wed, Dec 18, 2013 at 9:00 PM, Jean-Marc Spaggiari < > > [email protected]> wrote: > > > > > Typo is because I have no done a cut&past ;) > > > > > > With cut&past: mvn test -PrunAllTests > -Dsurefire.secondPartThreadCount=8 > > > > > > On the last run, error is: > > > Failed tests: > > > > > > > > > testLogRollOnDatanodeDeath(org.apache.hadoop.hbase.regionserver.wal.TestLogRolling): > > > LowReplication Roller should've been disabled, current replication=1 > > > > > > I did not keep the errors from the previous runs... > > > > > > > > > 2013/12/18 Ted Yu <[email protected]> > > > > > > > What error(s) did you see ? > > > > > > > > There was a typo in this def ('s' between D and u): > > > > -Dsurefire.secondPartThreadCount > > > > > > > > You can lower the thread count: in trunk build, value of 2 is used. > > > > > > > > Cheers > > > > > > > > > > > > On Wed, Dec 18, 2013 at 8:45 PM, Jean-Marc Spaggiari < > > > > [email protected]> wrote: > > > > > > > > > I tried multiple times over many hours to run: > > > > > mvn test -PrunallTests -Dusefire.secondPartThreadCount=8 > > > > > > > > > > On a local machine using the src jar, with no success. I might be > > > missing > > > > > something... I will investigate so I will be able to provide better > > > > > feedback for 0.96.2... > > > > > > > > > > Sorry about that. > > > > > > > > > > > > > > > 2013/12/18 Enis Söztutar <[email protected]> > > > > > > > > > > > +1. > > > > > > > > > > > > - downloaded the artifacts > > > > > > - checked checksums > > > > > > - checked sigs > > > > > > - checked hadoop libs in h1 / h2 > > > > > > - checked directory layouts > > > > > > - run local cluster > > > > > > - run smoke tests with shell on the artifacts > > > > > > - run tests locally: > > > > > > -- bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write > > > > 10:10:100 > > > > > > -num_keys 1000000 -read 100:30 > > > > > > -- bin/hbase > > > > "org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList > > > > > > Loop 1 1 3000000 /tmp/biglinkedlist 1" > > > > > > > > > > > > > > > > > > On Wed, Dec 18, 2013 at 12:45 PM, Stack <[email protected]> > wrote: > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > Downloaded, unbundled, checked layout, and ran it. Browsed the > > mvn > > > > > > > artifacts. > > > > > > > > > > > > > > St.Ack > > > > > > > > > > > > > > > > > > > > > On Tue, Dec 17, 2013 at 2:23 PM, Jonathan Hsieh < > > [email protected]> > > > > > > wrote: > > > > > > > > > > > > > > > This is a quick-fix release directly off of the 0.96.1 > release. > > > > It > > > > > > can > > > > > > > be > > > > > > > > downloaded here: > > > > > > > > > > > > > > > > http://people.apache.org/~jmhsieh/hbase-0.96.1.1rc0/ > > > > > > > > > > > > > > > > The maven staging repo is here: > > > > > > > > > > > > > > > > > > > > > > > https://repository.apache.org/content/repositories/orgapachehbase-056/ > > > > > > > > > > > > > > > > There is only one jira'ed patch in this release, HBASE-10188 > > [1]. > > > > > This > > > > > > > > fixes an API incompatibility introduced between 0.96.0 and > > > 0.96.1. > > > > > > Other > > > > > > > > changes include updates to CHANGES.txt (to include that > jira), > > > and > > > > > > > pom.xml > > > > > > > > (naming the release 0.96.1.1). > > > > > > > > > > > > > > > > As such, we'll have an abridged testing and voting period. > > > Please > > > > > > have > > > > > > > a > > > > > > > > quick look and vote +1/-1 by 12/18/13 23:59 pacific time. If > > this > > > > > > passes > > > > > > > > we'll take down 0.96.1. > > > > > > > > > > > > > > > > Please check the release mechanics -- this is my first > attempt > > at > > > > an > > > > > > > hbase > > > > > > > > release. > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/HBASE-10188 > > > > > > > > > > > > > > > > -- > > > > > > > > // Jonathan Hsieh (shay) > > > > > > > > // Software Engineer, Cloudera > > > > > > > > // [email protected] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > // Jonathan Hsieh (shay) > > // Software Engineer, Cloudera > > // [email protected] > > > -- // Jonathan Hsieh (shay) // Software Engineer, Cloudera // [email protected]
