Thanks, Josh. This issue should be squared away by the time you get back.
Have a great vacation!
- Doug
On Wed, Sep 17, 2008 at 4:47 PM, Joshua Taylor <[EMAIL PROTECTED]> wrote:
> (sorry premature send)
>
> Our current strategy is:
> 1. Store things in Hypertable until something crashes
> 2. Wipe out all the commit logs (losing some data)
> 3. Restart Hypertable
>
> This is ok for our purposes at the moment. We can treat Hypertable as a
> somewhat lossy large data store. :)
>
> Josh
>
>
>
> On Wed, Sep 17, 2008 at 4:46 PM, Joshua Taylor <[EMAIL PROTECTED]> wrote:
>
>> Hi Doug, thanks for the reply. I'm glad to hear of your upcoming update.
>> I'll put my investigation on hold for a while then (well, I have to anyway
>> since I'm going on vacation).
>>
>> Our current strategy is:
>> 1. Store things in Hypertable until something crashes
>> 2. Wipe out all the commit logs (losing some data)
>>
>>
>>
>> On Mon, Sep 15, 2008 at 9:32 PM, Doug Judd <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Josh,
>>>
>>> Thanks for investigating! Comments inline ...
>>>
>>> On Mon, Sep 15, 2008 at 4:41 PM, Joshua Taylor <[EMAIL PROTECTED]> wrote:
>>>
>>>> I've managed to load about 2-3 TB into a Hypertable cluster with 15
>>>> RangeServers. I'm still having problems with the RangeServers failing to
>>>> clean up their old logs.
>>>>
>>>> Server               User commit log (MB)
>>>> 10.10.1.194_38060    51512.8
>>>> 10.10.1.200_38060    48050.6
>>>> 10.10.1.199_38060    12464.7
>>>> 10.10.1.190_38060       90.8244
>>>> 10.10.1.193_38060       86.9587
>>>> 10.10.1.188_38060       70.4194
>>>> 10.10.1.191_38060       58.9111
>>>> 10.10.1.198_38060       48.7823
>>>> 10.10.1.192_38060       41.6324
>>>> 10.10.1.202_38060       34.0847
>>>> 10.10.1.189_38060       28.5852
>>>> 10.10.1.197_38060       22.3202
>>>> 10.10.1.196_38060       17.3054
>>>> 10.10.1.201_38060       14.2905
>>>> 10.10.1.195_38060        0
>>>>
>>>> No new updates have been added to this cluster in about 12 hours. Most
>>>> of the log directories look fine but a few have gigabytes of logs that
>>>> should be cleaned up.
>>>>
>>>> I've added some debug logging to try and isolate what's happening. For
>>>> each call to CommitLog::purge(), I'm logging the cutoff timestamp used.
>>>> I've also added messages to print out each of the remaining log fragments
>>>> after the purge breaks off:
>>>>
>>>> diff --git a/src/cc/Hypertable/Lib/CommitLog.cc b/src/cc/Hypertable/Lib/CommitLog.cc
>>>> index a6b8920..ea0f894 100644
>>>> --- a/src/cc/Hypertable/Lib/CommitLog.cc
>>>> +++ b/src/cc/Hypertable/Lib/CommitLog.cc
>>>> @@ -232,6 +232,8 @@ int CommitLog::purge(uint64_t timestamp) {
>>>>    CommitLogFileInfo file_info;
>>>>    String fname;
>>>>
>>>> +  HT_DEBUGF("purge: %llu", timestamp);
>>>> +
>>>>    try {
>>>>
>>>>      while (!m_fragment_queue.empty()) {
>>>> @@ -243,7 +245,16 @@ int CommitLog::purge(uint64_t timestamp) {
>>>>          HT_INFOF("Removed log fragment file='%s' timestamp=%lld", fname.c_str(), file_info.timestamp);
>>>>        }
>>>>        else {
>>>> -        //HT_INFOF("LOG FRAGMENT PURGE breaking because %lld >= %lld", file_info.timestamp, timestamp);
>>>> +        HT_DEBUGF("LOG FRAGMENT PURGE breaking because %lld >= %lld, queue size = %llu",
>>>> +                  file_info.timestamp, timestamp, m_fragment_queue.size());
>>>> +        size_t i = 0;
>>>> +        foreach(CommitLogFileInfo const & info, m_fragment_queue)
>>>> +        {
>>>> +          ++i;
>>>> +          HT_DEBUGF("  Fragment %llu: time=%lld size=%llu name=%s%u",
>>>> +                    i, info.timestamp, info.size, info.log_dir.c_str(),
>>>> +                    info.num);
>>>> +        }
>>>>          break;
>>>>        }
>>>>      }
>>>>
>>>> What I find is that the purge() function is still being called
>>>> regularly, but that the timestamp that gets passed in becomes stuck before
>>>> the oldest fragment and never advances in later calls.
>>>>
>>>> Sometimes the timestamp threshold is just a few minutes earlier than the
>>>> oldest log's timestamp. Sometimes the passed-in threshold value is 1, which
>>>> seems like a suspicious number. Here are the first few lines of a recent
>>>> purge from the 52 GB server:
>>>>
>>>> 1221520163 DEBUG Hypertable.RangeServer : (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:235) purge: 1
>>>> 1221520163 DEBUG Hypertable.RangeServer : (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:249) LOG FRAGMENT PURGE breaking because 1221271063903383000 >= 1, queue size = 831
>>>> 1221520163 DEBUG Hypertable.RangeServer : (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:256)   Fragment 1: time=1221271063903383000 size=67436344 name=/hypertable/servers/10.10.1.194_38060/log/user/3
>>>> 1221520163 DEBUG Hypertable.RangeServer : (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:256)   Fragment 2: time=1221271046265865000 size=896929 name=/hypertable/servers/10.10.1.193_38060/log/9F4D8D5249B24A2C52F06D2B/0
>>>> 1221520163 DEBUG Hypertable.RangeServer : (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:256)   Fragment 3: time=1221271374609078000 size=100133731 name=/hypertable/servers/10.10.1.194_38060/log/user/4
>>>>
>>>> Yes, the log is currently holding 831 fragments. Also I see compactions
>>>> are still happening on this server, but they start and finish in the same
>>>> second.
>>>>
>>>> Here's a purge from the 48 GB server:
>>>>
>>>> 1221520459 DEBUG Hypertable.RangeServer : (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:235) purge: 1221327270724907001
>>>> 1221520459 DEBUG Hypertable.RangeServer : (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:249) LOG FRAGMENT PURGE breaking because 1221327332648604000 >= 1221327270724907001, queue size = 622
>>>> 1221520459 DEBUG Hypertable.RangeServer : (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:256)   Fragment 1: time=1221327332648604000 size=100006176 name=/hypertable/servers/10.10.1.200_38060/log/user/916
>>>> 1221520459 DEBUG Hypertable.RangeServer : (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:256)   Fragment 2: time=1221327395726285000 size=100147244 name=/hypertable/servers/10.10.1.200_38060/log/user/917
>>>> 1221520459 DEBUG Hypertable.RangeServer : (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:256)   Fragment 3: time=1221327503701781000 size=100179860 name=/hypertable/servers/10.10.1.200_38060/log/user/918
>>>>
>>>> This one has 622 fragments. The last logged compaction on this server
>>>> was 25 hours ago, though it apparently did some non-trivial work when it ran;
>>>> at least, the begin and end compaction log messages are some time apart.
>>>>
>>>
>>> This definitely sheds light on the problem. I think I've figured out how
>>> the purge timestamp can get set to 1. It's set by the following line in
>>> AccessGroup::run_compaction():
>>>
>>>   m_oldest_cached_timestamp = (m_cell_cache_ptr->size() > 0) ? timestamp.real + 1 : 0;
>>>
>>> If, after a split, the split-off portion has a few updates in the split
>>> log but never receives any updates thereafter, then the range could get
>>> compacted with timestamp.real == 0, causing the above line to set
>>> "m_oldest_cached_timestamp" to 1. I'll need to investigate a little further
>>> to figure out the other case you point out.
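>>>
>>> To make that failure mode concrete, here is a tiny self-contained sketch
>>> (not the actual Hypertable source; the helper name is invented and the
>>> fragment timestamp is just copied from your log above) of how a compaction
>>> whose real timestamp is 0 produces a purge cutoff of 1 that then keeps
>>> every fragment forever:
>>>
>>>   #include <cstddef>
>>>   #include <cstdint>
>>>   #include <cstdio>
>>>
>>>   // Illustrative stand-in for the post-compaction bookkeeping quoted above:
>>>   // if the cell cache still holds entries but the compaction's real
>>>   // timestamp is 0 (the freshly split-off, never-updated range case),
>>>   // the "oldest cached timestamp" becomes 0 + 1 = 1.
>>>   static uint64_t oldest_after_compaction(std::size_t cache_size, uint64_t real_ts) {
>>>     return (cache_size > 0) ? real_ts + 1 : 0;
>>>   }
>>>
>>>   int main() {
>>>     uint64_t cutoff = oldest_after_compaction(42, 0);   // split-off range, no later updates
>>>     uint64_t oldest_fragment = 1221271063903383000ULL;  // Fragment 1 from the 52 GB server
>>>     // purge() only removes fragments whose timestamp is < the cutoff, so a
>>>     // cutoff of 1 keeps every fragment forever.
>>>     std::printf("cutoff=%llu, purge oldest fragment? %s\n",
>>>                 (unsigned long long)cutoff,
>>>                 oldest_fragment < cutoff ? "yes" : "no");
>>>     return 0;
>>>   }
>>>
>>> Run, that prints cutoff=1 and reports that the oldest fragment is never purged.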
>>>
>>>
>>>> The log fragments appear to be in increasing time order, though a lot of
>>>> the fragments seem to be mostly empty. For example, on the 70 MB server,
>>>> the first fragment contains 70 MB of data followed by 353 fragments with
>>>> size=101.
>>>>
>>>
>>> The issue here is that when a range splits, the first thing that happens
>>> is a split log gets installed. The split log is for capturing any updates
>>> to the portion of the range that is getting split off. When the split
>>> finally completes, the RangeServer that receives the newly split off portion
>>> will be notified with the name of the split log and will re-play it after it
>>> loads the range. After it finishes replaying the split log, it will link
>>> the split log into the range's primary commit log. This is just a commit
>>> log entry that contains the name of the log that is to be linked in. When
>>> it does this, it rolls the log. So, in this "degenerate" primary-key order
>>> insert, you have a bunch of split logs that basically contain nothing being
>>> linked into commit logs that are not receiving any commits. That's why you
>>> have a bunch of tiny log fragments that just contain a single link entry.
>>> There's a program called 'dumplog' that you can use to dump out a commit
>>> log. Point it to the log directory and it should print out a summary of the
>>> entries.
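>>>
>>> If it helps to picture it, here is a toy model (invented types, not
>>> Hypertable's CommitLog class) of that link-then-roll sequence, showing why
>>> a range that receives split links but no ordinary updates ends up with a
>>> string of fragments that each hold a single link entry:
>>>
>>>   #include <cstddef>
>>>   #include <cstdio>
>>>   #include <string>
>>>   #include <vector>
>>>
>>>   // Toy model of a commit log as a list of fragments; the last element is
>>>   // the fragment currently being written.
>>>   struct Fragment { std::vector<std::string> entries; };
>>>
>>>   int main() {
>>>     std::vector<Fragment> log(1);
>>>     // Each completed split appends one "link" entry naming the split log,
>>>     // then rolls to a new fragment file.  With no other commits arriving,
>>>     // every rolled fragment contains just that single entry.
>>>     for (int split = 0; split < 5; ++split) {
>>>       log.back().entries.push_back("LINK split-log-" + std::to_string(split));
>>>       log.emplace_back();
>>>     }
>>>     for (std::size_t i = 0; i + 1 < log.size(); ++i)
>>>       std::printf("fragment %zu: %zu entries\n", i, log[i].entries.size());
>>>     return 0;
>>>   }
>>>
>>> That is the shape of the long run of near-empty fragments you listed.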
>>>
>>>
>>>> I've looked around a bit at where these purge thresholds come from.
>>>> When the purge starts, the code tries to get the oldest cached timestamp
>>>> from each access group, then initiates a purge of logs older than that.
>>>> This timestamp seems to be the thing getting stuck (or set to 1), so I tried
>>>> to find out where that gets set. I think it's set in each access group
>>>> after a compaction to the timestamp of the compaction request (plus 1).
>>>> Where the compaction timestamp comes from is a bit of a mystery to me. The
>>>> range seems to schedule the compactions either from m_state.timestamp or
>>>> m_timestamp, depending on whether the compaction is initiated in
>>>> Range::run_compaction or Range::split_compact_and_shrink. Also, I don't
>>>> understand what's going on with Timestamp.real vs. Timestamp.logical. The
>>>> flow of this code is just a bit hard for me to track. ;)
>>>>
>>>
>>> This code is definitely questionable and obviously broken. I've spent
>>> the past 1 1/2 weeks overhauling all of this logic based on two of your
>>> suggestions:
>>>
>>> 1. Freeze and copy the CellCache during a compaction
>>> 2. Separate the timestamp from the revision number
>>>
>>> I'm nearly finished with this overhaul, but have a few more system tests
>>> that I need to get passing. I hope to have this checked in within two or
>>> three days. These changes have cleaned up the code tremendously. Now, the
>>> m_oldest_cached_timestamp variable gets set with each call to
>>> AccessGroup::add() and does not get set with the line above. Plus, I've
>>> combined add() and replay_add() into just add().
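>>>
>>> Roughly, the new shape looks something like the following. This is only a
>>> hedged sketch (the overhauled code isn't checked in yet, so the class and
>>> method names here are illustrative, not the real ones):
>>>
>>>   #include <cstddef>
>>>   #include <cstdint>
>>>
>>>   // Sketch: the oldest cached timestamp now tracks the data actually
>>>   // sitting in the cell cache, updated on every add(), instead of being
>>>   // derived from the compaction timestamp after the fact.
>>>   class AccessGroupSketch {
>>>    public:
>>>     void add(uint64_t real_timestamp /*, key, value, ... */) {
>>>       if (m_cache_entries == 0 || real_timestamp < m_oldest_cached_timestamp)
>>>         m_oldest_cached_timestamp = real_timestamp;
>>>       ++m_cache_entries;
>>>     }
>>>     // After a compaction drains the (frozen, copied) cache, the value is
>>>     // simply reset rather than reconstructed from timestamp.real + 1.
>>>     void clear_cache() { m_cache_entries = 0; m_oldest_cached_timestamp = 0; }
>>>     uint64_t oldest_cached_timestamp() const { return m_oldest_cached_timestamp; }
>>>    private:
>>>     uint64_t m_oldest_cached_timestamp = 0;
>>>     std::size_t m_cache_entries = 0;
>>>   };
>>>
>>>   int main() {
>>>     AccessGroupSketch ag;
>>>     ag.add(1221327270724907001ULL);
>>>     return ag.oldest_cached_timestamp() == 1221327270724907001ULL ? 0 : 1;
>>>   }
>>>
>>> The point is just that the oldest cached timestamp follows the data in the
>>> cell cache, so it can no longer get pinned to 1 by compaction timing.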
>>>
>>>> Anyway, that's as far as I've managed to go on this problem so far. I
>>>> was wondering if anyone on the list who understands this code better might
>>>> be able to look at the problem description and say, "A-ha! I know what's
>>>> going wrong here!" (And hopefully issue a patch.) Otherwise, I'll
>>>> continue
>>>> unravelling a bit at a time.
>>>>
>>>
>>> I should have the new code checked in within a few days, which should
>>> take care of these problems, so I wouldn't bother spending too much time on
>>> this. Thanks for all of your help.
>>>
>>> - Doug
>>>
>>>
>>>
>>>
>>
>
>