This possibly belongs in one of the issues opened over the past few
days:
Insert 1000 rows with random row keys, and induce a split (see test.rb
attached to HBASE-1500). I would expect a row count to return no more
than 1000 rows. However, here are the row counts from 5 runs of the
test, with a total reinitialization between runs:
1516
1492
1497
1509
1501
The shell output also provides an additional clue:
Current count: 1000, row: ffdcee2a75742697b375edef62fa4b75
1516 row(s) in 2.9530 seconds
Every count is roughly 1500, i.e. the 1000 inserted rows plus about half
again. It looks like the parent region is fully iterated first, and then
one of the daughters in addition?
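To illustrate why that hypothesis would fit the numbers, here is a minimal
sketch in plain Java with no HBase dependency. It only models the suspected
behavior; it is not the actual scanner code. The class and method names are
made up for illustration, and it assumes the split key is the median row
key, so that one daughter holds exactly half of the rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;

// Hypothetical model of the suspected bug, not HBase code.
class DoubleScanModel {
    /** Generates n distinct random 32-character hex row keys in sorted order. */
    public static TreeSet<String> randomKeys(int n, long seed) {
        Random rnd = new Random(seed);
        TreeSet<String> keys = new TreeSet<>();
        while (keys.size() < n) {
            keys.add(String.format("%016x%016x", rnd.nextLong(), rnd.nextLong()));
        }
        return keys;
    }

    /**
     * Rows returned if a scan covers the whole parent region and then
     * also the upper daughter. Assumption: the split key is the median key.
     */
    public static int suspectedScanCount(TreeSet<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        String splitKey = sorted.get(sorted.size() / 2);
        int parentRows = keys.size();
        int daughterRows = keys.tailSet(splitKey, true).size();
        return parentRows + daughterRows;
    }

    public static void main(String[] args) {
        // 1000 parent rows + 500 daughter rows
        System.out.println(suspectedScanCount(randomKeys(1000, 42L))); // prints 1500
    }
}
```

A real split point would only be near the middle rather than exactly at the
median, which would account for counts a little above or below 1500.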
Also, as these issues come up, kindly consider adding test cases to the
test suite to catch these regressions. It seems the current coverage for
scanners is letting big issues pass unnoticed.
One thing we could do right away is commit my 'test.rb', reimplemented
as Java/JUnit, into the suite, with some additional logic to test that
the scanners return exactly as many rows as unique row keys were
inserted. If there is no -1, I will go ahead and do that.
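The additional logic could be roughly along these lines. This is only a
sketch of the check itself, under the assumption that the test collects the
row keys it inserted and the row keys a scanner returned into plain Java
collections. UniqueRowCheck and its methods are hypothetical names; the
HBase client calls and the JUnit wiring are left out.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for the proposed test; no HBase API is used here.
class UniqueRowCheck {
    /** Counts how many scanned rows repeat a row key already seen. */
    public static int duplicateRows(List<String> scannedKeys) {
        Set<String> seen = new HashSet<>();
        int duplicates = 0;
        for (String key : scannedKeys) {
            if (!seen.add(key)) {
                duplicates++;
            }
        }
        return duplicates;
    }

    /** True only if the scan returned every inserted key exactly once. */
    public static boolean scanMatchesInserted(Set<String> inserted,
                                              List<String> scanned) {
        return scanned.size() == inserted.size()
            && new HashSet<>(scanned).equals(inserted);
    }
}
```

With a check like this, a scan that returns 1516 rows for 1000 inserted keys
fails on the size comparison, and duplicateRows reports how many rows came
back twice.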
- Andy