Just to wrap this up -- HBASE-4570 was filed and more details are there. There is also a small patch that addresses the symptom described in this scenario. It ended up being a race with row-name caching in the KV store.
This isn't related to flushes, which have become the focus of HBASE-4485 and HBASE-2856's subtasks. It should, however, be part of making HBASE-2856 (TestAcidGuarantees) pass consistently. I've tested it on trunk and on a version of 0.90.x.

Jon.

On Mon, Oct 10, 2011 at 5:03 PM, Stack <[email protected]> wrote:
> Keep digging Jon.
> St.Ack
>
> On Mon, Oct 10, 2011 at 3:55 PM, Jonathan Hsieh <[email protected]> wrote:
> > Yup, I'm pretty convinced this is a consistency and isolation problem.
> > Here's the output of my program (expected 59 rows) and some output from
> > problem rows.
> >
> > The row has 10 column families ("f0", "f1", ... "f9"), each with two
> > columns. There are actually two 'row0000021332' rows -- the 48 and the
> > 49 versions. Each came from a separate call to next(). Notice how they
> > are split between column families! Over many runs this happens many
> > times, though the split falls between different column families.
> >
> > scan items counted: 59 with 0 null ColumnFamilyMaps and 0 nullRows
> > scan items counted: 59 with 0 null ColumnFamilyMaps and 0 nullRows
> > scan items counted: 59 with 0 null ColumnFamilyMaps and 0 nullRows
> > scan items counted: 60 with 0 null ColumnFamilyMaps and 0 nullRows
> > Row row0000021332 had time stamps: [48:
> > keyvalues={row0000021332/f0:data/1318200437341/Put/vlen=1000,
> > row0000021332/f0:qual/1318200437341/Put/vlen=10,
> > row0000021332/f1:data/1318200437341/Put/vlen=1000,
> > row0000021332/f1:qual/1318200437341/Put/vlen=10,
> > row0000021332/f2:data/1318200437341/Put/vlen=1000,
> > row0000021332/f2:qual/1318200437341/Put/vlen=10,
> > row0000021332/f3:data/1318200437341/Put/vlen=1000,
> > row0000021332/f3:qual/1318200437341/Put/vlen=10,
> > row0000021332/f4:data/1318200437341/Put/vlen=1000,
> > row0000021332/f4:qual/1318200437341/Put/vlen=10,
> > row0000021332/f5:data/1318200437341/Put/vlen=1000,
> > row0000021332/f5:qual/1318200437341/Put/vlen=10}, 49:
> > keyvalues={row0000021332/f6:data/1318200437341/Put/vlen=1000,
> > row0000021332/f6:qual/1318200437341/Put/vlen=10,
> > row0000021332/f7:data/1318200437341/Put/vlen=1000,
> > row0000021332/f7:qual/1318200437341/Put/vlen=10,
> > row0000021332/f8:data/1318200437341/Put/vlen=1000,
> > row0000021332/f8:qual/1318200437341/Put/vlen=10,
> > row0000021332/f9:data/1318200437341/Put/vlen=1000,
> > row0000021332/f9:qual/1318200437341/Put/vlen=10}]
> >
> > Jon.
> >
> > On Mon, Oct 10, 2011 at 1:50 PM, Jonathan Hsieh <[email protected]> wrote:
> >
> >> I've modified scenario 2's filter to setFilterIfMissing(true), which I
> >> missed in scenario 1, and now have an undercount like in that scenario.
> >> In another experiment, I found that getting the latest version with
> >> Result.getColumnLatest(..) would return null.
> >>
> >> This seems like a consistency+isolation violation of the expected
> >> row-ACID properties.
> >>
> >> ----
> >> Scan s = new Scan();
> >> ...
> >> SingleColumnValueFilter filter = new SingleColumnValueFilter(
> >>     Bytes.toBytes("f1"), Bytes.toBytes("qual"), CompareOp.EQUAL,
> >>     value);
> >> filter.setFilterIfMissing(true); // this is what drops events.
> >> s.setFilter(filter);
> >> ----
> >>
> >> Jon.
> >>
> >> On Mon, Oct 10, 2011 at 12:52 PM, Jonathan Hsieh <[email protected]> wrote:
> >>
> >>> I've been working on a problem related to the ACID rules for scans as
> >>> defined at http://hbase.apache.org/acid-semantics.html.
> >>>
> >>> In the two scenarios below, no new rowkeys are written -- all writes
> >>> are overwrites of already-existing rows. So let's say the table has 3
> >>> rows, A1, B1, C1, and a job overwrites those rows with values A2, B2,
> >>> C2 respectively. I'd expect every scan to return the same number of
> >>> elements (3), but with potentially any combination of A1|A2, B1|B2,
> >>> C1|C2 (since there are no ACID guarantees across rows).
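The drop caused by the setFilterIfMissing(true) snippet quoted above can be emulated without a cluster. This is a sketch of the filter's decision logic only, not the real HBase Filter API: a row is modeled as a plain map from "family:qualifier" to value, and all names are illustrative.

```java
import java.util.Map;

// Sketch of SingleColumnValueFilter semantics with setFilterIfMissing.
class FilterSketch {
    // Returns true if the row should be KEPT by the filter.
    static boolean keepRow(Map<String, String> row,
                           String column, String expected,
                           boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            // Column absent: with setFilterIfMissing(true) the row is
            // dropped; with the default (false) it would be kept.
            return !filterIfMissing;
        }
        return expected.equals(actual); // CompareOp.EQUAL
    }
}
```

Under this model, a torn read that returns only families f6..f9 carries no f1:qual at all, so keepRow(..., true) drops it -- consistent with the undercount described above.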
> >>>
> >>> Scenario 1: I have an MR job that does a filtered scan (I've confirmed
> >>> the problem happens without the filter as well) from a table, takes
> >>> each row, and writes it back to the same row in the same table. I then
> >>> run the job 2-3x concurrently on the same table with the same filter.
> >>> I believe all runs should return the same number of elements read. It
> >>> seems that with a multiple-column-family table this is not the case --
> >>> in particular, the MR counters sometimes report *fewer* than the
> >>> expected number of input records (but never more). This seems wrong.
> >>> Agree?
> >>>
> >>> Scenario 2: To duplicate the problem without the MR portions, I wrote
> >>> two programs -- one that does a filtered scan and overwrites the
> >>> existing rows, and one that does the same filtered scan and just
> >>> counts the number of rows read. I've also used the
> >>> TableRecordReaderImpl code that the MR TableInputFormat uses. In this
> >>> case, the scan/counting job sometimes returns *too many* entries --
> >>> for some rowkeys it returns two records (but never fewer). This
> >>> probably should not happen either. Agree?
> >>>
> >>> I've observed this on 0.90.1+patches and 0.90.3+patches. It was also
> >>> claimed that this issue was not seen in a 0.89.x-based hbase. Thus far
> >>> I've only been able to reproduce it on multiple-column-family tables.
> >>>
> >>> I believe scenario 1's problem is related to these. Concur?
> >>> https://issues.apache.org/jira/browse/HBASE-2856
> >>> https://issues.apache.org/jira/browse/HBASE-3498
> >>>
> >>> I think scenario 2 is probably related, but I'm not sure it is the
> >>> same issue.
> >>>
> >>> Are there other related JIRAs?
> >>>
> >>> Any hints on where to hunt this down? (I'm starting to go into the
> >>> scan and write paths on the RegionServer, but this is a fairly large
> >>> area...)
> >>>
> >>> Thanks,
> >>> Jon.
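The invariant both scenarios probe can be stated as a small check: with overwrite-only writes, every scan should count exactly the expected number of rows; a smaller count is scenario 1's symptom and a larger count is scenario 2's. A minimal cluster-free sketch of that classification (counts would come from repeated scans; names are illustrative):

```java
import java.util.List;

// Classifies a run of observed scan counts against the expected row count.
class ScanCountCheck {
    // Returns "ok", "undercount" (scenario 1's symptom), or "overcount"
    // (scenario 2's symptom, e.g. one rowkey returned twice).
    static String classify(List<Integer> observedCounts, int expected) {
        for (int c : observedCounts) {
            if (c < expected) return "undercount";
            if (c > expected) return "overcount";
        }
        return "ok";
    }
}
```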
> >>>
> >>> --
> >>> // Jonathan Hsieh (shay)
> >>> // Software Engineer, Cloudera
> >>> // [email protected]

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [email protected]
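The row-splitting symptom in the quoted output -- one logical row returned as two Results, split between column families (f0..f5 in one, f6..f9 in the next) -- can be detected with a small cluster-free sketch. Here a Result is modeled as a rowkey plus the set of families it carried; the names are illustrative, not the real client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Flags rows returned by next() that violate per-row atomicity.
class RowSplitCheck {
    // A torn row shows up as a Result missing some expected families,
    // and/or the same rowkey appearing in two consecutive Results.
    static List<String> findSplitRows(
            List<Map.Entry<String, Set<String>>> results,
            Set<String> expectedFamilies) {
        List<String> bad = new ArrayList<>();
        String prevRow = null;
        for (Map.Entry<String, Set<String>> r : results) {
            if (!r.getValue().equals(expectedFamilies)
                    || r.getKey().equals(prevRow)) {
                bad.add(r.getKey());
            }
            prevRow = r.getKey();
        }
        return bad;
    }
}
```

On the quoted run, both the 48 and 49 Results for row0000021332 would be flagged: each carries only part of the family set, and they share a rowkey.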
