Can you define 'come due'? The NPE occurs at the first isMajorCompaction() test in the main loop of MajorCompactionChecker. That cycle is executed every 2.78 hours. Yet I know that I've kept healthy QA test data up and running for much longer than that.
James Kennedy
Project Manager
Troove Inc.

On 2011-02-10, at 10:46 PM, Ryan Rawson wrote:

> I am speaking off the hip here, but the major compaction algorithm
> attempts to keep the number of major compactions to a minimum by
> checking the timestamp of the file. So it's possible that the other
> regions just 'didn't come due' yet.
>
> -ryan
>
> On Thu, Feb 10, 2011 at 10:42 PM, James Kennedy
> <[email protected]> wrote:
>> I've tested HBase 0.90 + HBase-trx 0.90.0 and I've run it over old data from
>> 0.89x using a variety of seeded unit test/QA data and cluster configurations.
>>
>> But when it came time to upgrade some production data I got snagged on
>> HBASE-3524. The gist of it is in Ryan's last points:
>>
>> * compaction is "optional", meaning if it fails no data is lost, so you
>> should probably be fine.
>>
>> * Older versions of the code did not write out time tracker data and
>> that is why your older files were giving you NPEs.
>>
>> Makes sense. But why did I not encounter this with my initial data upgrades
>> on very similar data packages?
>>
>> So I applied Ryan's patch, which simply assigns a default value
>> (Long.MIN_VALUE) when a StoreFile lacks a timeRangeTracker, and I "fixed" the
>> data by forcing major compactions on the affected regions. Preliminary
>> poking has not shown any instability in the data since.
>>
>> But I confess that I just don't have the time right now to really dig into
>> the code and validate that there are no more gotchas or data corruption
>> that could have resulted.
>>
>> I guess the questions that I have for the team are:
>>
>> * What state would 9 out of 50 tables be in to miss the new 0.90.0
>> timeRangeTracker injection before the first major compaction check?
>> * Where else is the new TimeRangeTracker used? Could a StoreFile with a
>> null timeRangeTracker have corrupted the data in other subtler ways?
>> * What other upgrade-related data changes might not have completed elsewhere?
>>
>> Thanks,
>>
>> James Kennedy
>> Project Manager
>> Troove Inc.
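For readers following the thread, the workaround described in the quoted message (defaulting to Long.MIN_VALUE when a StoreFile lacks a timeRangeTracker) can be illustrated with a minimal, self-contained sketch. This is not the actual HBASE-3524 patch or HBase code: the class names `TimeRangeGuardSketch`, `TimeRange`, and `FileInfo` and both methods are hypothetical stand-ins, and how HBase consumes the value is not shown here.

```java
import java.util.Arrays;
import java.util.List;

public class TimeRangeGuardSketch {

    // Hypothetical stand-in for the time-range metadata written by newer code.
    static final class TimeRange {
        final long maxTimestamp;

        TimeRange(long maxTimestamp) {
            this.maxTimestamp = maxTimestamp;
        }
    }

    // Hypothetical stand-in for a store file. timeRange is null for files
    // written by older code that never recorded this metadata.
    static final class FileInfo {
        final String name;
        final TimeRange timeRange;

        FileInfo(String name, TimeRange timeRange) {
            this.name = name;
            this.timeRange = timeRange;
        }
    }

    // Null-safe accessor: fall back to Long.MIN_VALUE instead of
    // dereferencing a missing tracker (the source of the NPE).
    static long maxTimestampOrDefault(FileInfo file) {
        return file.timeRange == null ? Long.MIN_VALUE : file.timeRange.maxTimestamp;
    }

    // Newest timestamp across a set of files. Files without metadata
    // contribute Long.MIN_VALUE, so they never win the comparison.
    static long newestTimestamp(List<FileInfo> files) {
        long newest = Long.MIN_VALUE;
        for (FileInfo file : files) {
            newest = Math.max(newest, maxTimestampOrDefault(file));
        }
        return newest;
    }

    public static void main(String[] args) {
        List<FileInfo> files = Arrays.asList(
                new FileInfo("legacy-file", null),              // pre-upgrade file, no metadata
                new FileInfo("new-file", new TimeRange(1000L))); // post-upgrade file
        System.out.println("newest timestamp: " + newestTimestamp(files));
    }
}
```

The point of the sketch is only that a file with no recorded time range is treated as "as old as possible" rather than causing a NullPointerException; whether that default is safe everywhere the tracker is read is exactly the open question raised above.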
