Right, but the 2.78-hour check [threadWakeFrequency (10000 ms) x multiplier (1000)] comes before a full compaction, and from the code it looks like the NPE would have blocked any Store that had no timeRangeTracker from ever getting a major compaction... unless one had been triggered in some other way.
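
To make the arithmetic and the guard concrete, here's a rough sketch of how I read the checker interval and Ryan's fallback (the class and method names below are mine, for illustration, not the actual 0.90 identifiers):

public class CompactionCheckerSketch {
    // hbase.server.thread.wakefrequency default: 10000 ms
    static final long THREAD_WAKE_FREQUENCY_MS = 10000L;
    // MajorCompactionChecker multiplier (dimensionless)
    static final long MULTIPLIER = 1000L;

    public static void main(String[] args) {
        long intervalMs = THREAD_WAKE_FREQUENCY_MS * MULTIPLIER;   // 10,000,000 ms
        double hours = intervalMs / (1000.0 * 60 * 60);
        System.out.printf("check interval: %d ms (~%.2f hours)%n", intervalMs, hours);
        // prints: check interval: 10000000 ms (~2.78 hours)
    }

    // Ryan's patch, roughly: old 0.89x StoreFiles carry no tracker, so fall
    // back to Long.MIN_VALUE instead of throwing an NPE and silently
    // starving the Store of major compactions.
    static long minimumTimestamp(TimeRangeTracker tracker) {
        return (tracker == null) ? Long.MIN_VALUE : tracker.getMinimumTimestamp();
    }

    // Stand-in for the real org.apache.hadoop.hbase.regionserver.TimeRangeTracker.
    interface TimeRangeTracker {
        long getMinimumTimestamp();
    }
}

With the guard in place, an old tracker-less file just looks "infinitely old" to the checker rather than killing the whole loop with an NPE.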
James Kennedy
Project Manager
Troove Inc.

On 2011-02-10, at 11:23 PM, Ryan Rawson wrote:

> we only major compact a region every 24 hours, therefore if it was JUST
> compacted within the last 24 hours we skip it.
>
> this is how it used to work, and how it should still work, not really
> looking at code right now, busy elsewhere :-)
>
> -ryan
>
> On Thu, Feb 10, 2011 at 11:17 PM, James Kennedy
> <[email protected]> wrote:
>> Can you define 'come due'?
>>
>> The NPE occurs at the first isMajorCompaction() test in the main loop of
>> MajorCompactionChecker.
>> That cycle is executed every 2.78 hours.
>> Yet I know that I've kept healthy QA test data up and running for much
>> longer than that.
>>
>> James Kennedy
>> Project Manager
>> Troove Inc.
>>
>> On 2011-02-10, at 10:46 PM, Ryan Rawson wrote:
>>
>>> I am speaking off the hip here, but the major compaction algorithm
>>> attempts to keep the number of major compactions to a minimum by
>>> checking the timestamp of the file. So it's possible that the other
>>> regions just 'didn't come due' yet.
>>>
>>> -ryan
>>>
>>> On Thu, Feb 10, 2011 at 10:42 PM, James Kennedy
>>> <[email protected]> wrote:
>>>> I've tested HBase 0.90 + HBase-trx 0.90.0, and I've run it over old data
>>>> from 0.89x using a variety of seeded unit test/QA data and cluster
>>>> configurations.
>>>>
>>>> But when it came time to upgrade some production data I got snagged on
>>>> HBASE-3524. The gist of it is in Ryan's last points:
>>>>
>>>> * compaction is "optional", meaning if it fails no data is lost, so you
>>>> should probably be fine.
>>>>
>>>> * Older versions of the code did not write out time tracker data, and
>>>> that is why your older files were giving you NPEs.
>>>>
>>>> Makes sense. But why did I not encounter this with my initial data
>>>> upgrades on very similar data packages?
>>>>
>>>> So I applied Ryan's patch, which simply assigns a default value
>>>> (Long.MIN_VALUE) when a StoreFile lacks a timeRangeTracker, and I "fixed"
>>>> the data by forcing major compactions on the affected regions.
>>>> Preliminary poking has not shown any instability in the data since.
>>>>
>>>> But I confess that I just don't have the time right now to really dig into
>>>> the code and validate that there are no more gotchas or data corruption
>>>> that could have resulted.
>>>>
>>>> I guess the questions that I have for the team are:
>>>>
>>>> * What state would 9 out of 50 tables be in to miss the new 0.90.0
>>>> timeRangeTracker injection before the first major compaction check?
>>>> * Where else is the new TimeRangeTracker used? Could a StoreFile with a
>>>> null timeRangeTracker have corrupted the data in other subtler ways?
>>>> * What other upgrade-related data changes might not have completed
>>>> elsewhere?
>>>>
>>>> Thanks,
>>>>
>>>> James Kennedy
>>>> Project Manager
>>>> Troove Inc.
