We only major compact a region every 24 hours, so if it was JUST
compacted within the last 24 hours we skip it.
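Very roughly, the check amounts to something like this (a sketch from
memory; the class, names and constant are illustrative, not the actual
HBase source):

  // Illustrative sketch only -- not the real HBase code; names are made up.
  class MajorCompactionCheck {
    // the "once a day" interval, in milliseconds
    static final long MAJOR_COMPACTION_PERIOD_MS = 24L * 60 * 60 * 1000;

    // If the newest store file in the region was written within the period,
    // the region was just major compacted, so it gets skipped this cycle.
    static boolean isDue(long nowMs, long newestStoreFileTsMs) {
      return nowMs - newestStoreFileTsMs >= MAJOR_COMPACTION_PERIOD_MS;
    }
  }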

This is how it used to work, and how it should still work. I'm not
really looking at the code right now, busy elsewhere :-)

-ryan

On Thu, Feb 10, 2011 at 11:17 PM, James Kennedy
<james.kenn...@troove.net> wrote:
> Can you define 'come due'?
>
> The NPE occurs at the first isMajorCompaction() test in the main loop of 
> MajorCompactionChecker.
> That cycle is executed every 2.78 hours.
> Yet I know that I've kept healthy QA test data up and running for much longer 
> than that.
>
>
> James Kennedy
> Project Manager
> Troove Inc.
>
> On 2011-02-10, at 10:46 PM, Ryan Rawson wrote:
>
>> I'm speaking off the cuff here, but the major compaction algorithm
>> tries to keep the number of major compactions to a minimum by
>> checking the timestamps of the store files. So it's possible that the other
>> regions just 'didn't come due' yet.
>>
>> -ryan
>>
>> On Thu, Feb 10, 2011 at 10:42 PM, James Kennedy
>> <james.kenn...@troove.net> wrote:
>>> I've tested HBase 0.90 + HBase-trx 0.90.0, and I've run it over old data
>>> from 0.89.x using a variety of seeded unit test/QA data and cluster
>>> configurations.
>>>
>>> But when it came time to upgrade some production data I got snagged on 
>>> HBASE-3524. The gist of it is in Ryan's last points:
>>>
>>> * compaction is "optional", meaning if it fails no data is lost, so you
>>> should probably be fine.
>>>
>>> * Older versions of the code did not write out time tracker data and
>>> that is why your older files were giving you NPEs.
>>>
>>> Makes sense. But why didn't I encounter this with my initial data
>>> upgrades on very similar data packages?
>>>
>>> So I applied Ryan's patch, which simply assigns a default value
>>> (Long.MIN_VALUE) when a StoreFile lacks a timeRangeTracker, and I "fixed"
>>> the data by forcing major compactions on the affected regions. Preliminary
>>> poking has not shown any instability in the data since.
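>>> In essence the patch just adds a null check, roughly like this (my own
>>> paraphrase, not the exact diff, and the method names are approximate):
>>>
>>>   // Sketch: guard against store files written by older versions that
>>>   // never persisted time-range metadata. Names here are illustrative.
>>>   long getMinimumTimestamp() {
>>>     if (this.timeRangeTracker == null) {
>>>       // pre-0.90 files have no tracker, so fall back to a default
>>>       // instead of throwing an NPE
>>>       return Long.MIN_VALUE;
>>>     }
>>>     return this.timeRangeTracker.getMinimumTimestamp();
>>>   }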
>>>
>>> But I confess that I just don't have the time right now to really dig into
>>> the code and validate that there are no more gotchas or data corruption
>>> that could have resulted.
>>>
>>> I guess the questions that I have for the team are:
>>>
>>> * What state would 9 out of 50 tables have to be in to miss the new 0.90.0
>>> timeRangeTracker injection before the first major compaction check?
>>> * Where else is the new TimeRangeTracker used? Could a StoreFile with a
>>> null timeRangeTracker have corrupted the data in other, subtler ways?
>>> * What other upgrade-related data changes might not have completed 
>>> elsewhere?
>>>
>>> Thanks,
>>>
>>> James Kennedy
>>> Project Manager
>>> Troove Inc.
>>>
>>>
>
>
