Hi Sanjit,
there is some code in the patch:
diff --git a/src/cc/Hypertable/RangeServer/MaintenanceScheduler.cc b/src/cc/Hypertable/RangeServer/MaintenanceScheduler.cc
index 4c408e3..6c55cde 100644
--- a/src/cc/Hypertable/RangeServer/MaintenanceScheduler.cc
+++ b/src/cc/Hypertable/RangeServer/MaintenanceScheduler.cc
@@ -86,9 +86,16 @@ void MaintenanceScheduler::schedule() {
* Purge commit log fragments
*/
{
-    int64_t revision_root = TIMESTAMP_MAX;
-    int64_t revision_metadata = TIMESTAMP_MAX;
-    int64_t revision_user = TIMESTAMP_MAX;
+    int64_t revision_user;
+    int64_t revision_metadata;
+    int64_t revision_root;
+
+    (Global::user_log != 0) ?
+      revision_user = Global::user_log->get_latest_revision() : TIMESTAMP_MIN;
+    (Global::metadata_log != 0) ?
+      revision_metadata = Global::metadata_log->get_latest_revision() : TIMESTAMP_MIN;
+    (Global::root_log != 0) ?
+      revision_root = Global::root_log->get_latest_revision() : TIMESTAMP_MIN;
I think they are wrong. They should be:
+    revision_user = (Global::user_log != 0) ?
+      Global::user_log->get_latest_revision() : TIMESTAMP_MIN;
+    revision_metadata = (Global::metadata_log != 0) ?
+      Global::metadata_log->get_latest_revision() : TIMESTAMP_MIN;
+    revision_root = (Global::root_log != 0) ?
+      Global::root_log->get_latest_revision() : TIMESTAMP_MIN;
Is that right???
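
To illustrate the difference, here is a minimal standalone sketch (not
Hypertable code; TIMESTAMP_MIN is just a placeholder constant here):

  #include <cstdint>

  const int64_t TIMESTAMP_MIN = 0;  // placeholder for the sketch

  // Broken form from the patch: the assignment sits inside the true
  // branch, so when log == 0 the ternary evaluates TIMESTAMP_MIN,
  // discards the result, and `revision` is left uninitialized.
  int64_t broken(const int64_t *log) {
    int64_t revision;
    (log != 0) ? revision = *log : TIMESTAMP_MIN;
    return revision;  // undefined behavior when log == 0
  }

  // Fixed form: the ternary selects the value, and the result is
  // always assigned, so both paths initialize the variable.
  int64_t fixed(const int64_t *log) {
    return (log != 0) ? *log : TIMESTAMP_MIN;
  }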
-- kuer
On Jul 23, 10:18 PM, Sanjit Jhala <[email protected]> wrote:
> Hi Kuer,
>
> As I mentioned before, the real issue here is why the logging access
> group in the METADATA table is being compacted. That seems to be the
> root of any corruption. Coming to the issue of commit log replay, we
> have a couple of log replay fixes in the soon-to-be-released 0.9.2.5.
> You can apply the attached patches to see if that solves the problem.
>
> -Sanjit
>
> 0001-Fixed-bug-in-CommitLogReader-caused-by-fragment-queu.patch (6K)
>
> 0002-Fixed-bug-in-CommitLog-that-was-causing-some-fragmen.patch (5K)
>
> On Jul 22, 2009, at 6:20 PM, kuer wrote:
>
>
>
> > Hi, Sanjit,
>
> > I have patched the HT_INFOF() in AccessGroup.cc.
>
> > But I modified CellStoreV1.cc to make sure m_trailer.num_filter_items > 0
> > when creating the BloomFilter. I think this value is something like the
> > capacity of the bloom filter, so enlarging it should not cause much trouble.
>
> > After applying the change and relaunching, the RangeServer cannot finish
> > log replay. It seems the modified RangeServer has corrupted something;
> > this time the error is "RANGE SERVER range not found".
>
> > I posted it in another thread:
> > http://groups.google.com/group/hypertable-dev/browse_thread/thread/fd...
>
> > Thanks
>
> > -- kuer
>
> > On Jul 23, 7:38 AM, Sanjit Jhala <[email protected]> wrote:
> >> Hi Kuer,
>
> >> I suspect the BloomFilter code is fine. Taking a look at your logs,
> >> it looks like the METADATA table is going through a split and the
> >> AccessGroup "logging" is undergoing a major compaction; however, since
> >> it is empty, nothing gets inserted into BloomFilterItems and hence the
> >> assert gets hit.
> >> (From the log:
> >> 2009-07-22 09:39:01,415 1351514432 Hypertable.RangeServer [INFO]
> >> (RangeServer/AccessGroup.cc:379) Starting Major Compaction of
> >> METADATA[0:<FF> <FF>..<FF><FF>](logging))
>
> >> This AccessGroup is currently not used by the system, so it should
> >> be empty and should not undergo a compaction. Can you make the change
> >> (below) to AccessGroup.cc so we can get a better idea of why it's
> >> compacting? I suspect it might be a memory corruption issue (that we
> >> have a fix for in the upcoming release).
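>
> >> As a toy illustration of the failure mode (a made-up stand-in, not
> >> the real BloomFilter class):
>
> >>   #include <cassert>
> >>   #include <cstddef>
> >>
> >>   struct ToyBloomFilter {
> >>     // The filter is sized from the number of items collected so far;
> >>     // compacting an empty access group passes num_items == 0,
> >>     // and the assertion fires.
> >>     explicit ToyBloomFilter(size_t num_items) {
> >>       assert(num_items > 0);
> >>     }
> >>   };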
>
> >> It would be great if you ran the RangeServer with valgrind turned on
> >> to see if the valgrind log reveals anything further. To do this, add
> >> the --valgrind-rangeserver option when calling the start-all-servers.sh
> >> script, e.g.:
> >>
> >>   $HYPERTABLE_INSTALL_DIR/bin/start-all-servers.sh --valgrind-rangeserver
>
> >> -Sanjit
>
> >> --- a/src/cc/Hypertable/RangeServer/AccessGroup.cc
> >> +++ b/src/cc/Hypertable/RangeServer/AccessGroup.cc
> >> @@ -375,8 +375,8 @@ void AccessGroup::run_compaction(bool major) {
> >>        if (m_immutable_cache->memory_used()==0 && m_stores.size() <= (size_t)1)
> >>          HT_THROW(Error::OK, "");
> >>        tableidx = 0;
> >> -      HT_INFOF("Starting Major Compaction of %s(%s)",
> >> -               m_range_name.c_str(), m_name.c_str());
> >> +      HT_INFOF("Starting Major Compaction of %s(%s) immutable cache mem=%llu, num cell stores=%d",
> >> +               m_range_name.c_str(), m_name.c_str(), m_immutable_cache->memory_used(), m_stores.size());
> >>      }
> >>      else {
> >>        if (m_stores.size() > (size_t)Global::access_group_max_files) {
>
> >> On Jul 22, 2009, at 4:11 AM, kuer wrote:
>
> >>> Hi, all,
>
> >>> I found something interesting in cc/Hypertable/RangeServer/CellStoreV1.cc:
>
> >>> 168   if (m_bloom_filter_mode != BLOOM_FILTER_DISABLED) {
> >>> 169     m_bloom_filter_items = new BloomFilterItems(); // aproximator items
> >>> 170   }
>
> >>> 367
> >>> 368   // if bloom_items haven't been spilled to create a bloom filter yet, do it
> >>> 369   if (m_bloom_filter_mode != BLOOM_FILTER_DISABLED) {
> >>> 370     if (m_bloom_filter_items) {
> >>>         ^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>         I think this cannot guarantee m_bloom_filter_items->size() > 0
>
> >>> 371       m_trailer.num_filter_items = m_bloom_filter_items->size();
> >>>           ^^^^^ How about adding the following lines (see also the
> >>>           sketch after this excerpt)?
> >>> +
> >>> +         if (m_trailer.num_filter_items < 1) {
> >>> +           m_trailer.num_filter_items = m_max_entries;
> >>> +         }
> >>> +         if (m_trailer.num_filter_items < 1) {
> >>> +           m_trailer.num_filter_items = 1;
> >>> +         }
> >>> +
> >>> 372       create_bloom_filter();
> >>> 373     }
> >>> 374     assert(!m_bloom_filter_items && m_bloom_filter);
> >>> 375
> >>> 376     m_bloom_filter->serialize(send_buf);
> >>> 377     m_filesys->append(m_fd, send_buf, 0, &m_sync_handler);
> >>> 378
> >>> 379     m_outstanding_appends++;
> >>> 380     m_offset += m_bloom_filter->size();
> >>> 381   }
> >>> 382
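>
> >>> As a standalone sketch of the guard I have in mind (made-up names;
> >>> max_entries stands in for the real CellStoreV1 member m_max_entries):
>
> >>>   #include <cstdint>
> >>>
> >>>   // Clamp the bloom filter item count to at least 1 so that an
> >>>   // empty cell store cannot produce a zero-capacity filter.
> >>>   int64_t clamp_filter_items(int64_t num_items, int64_t max_entries) {
> >>>     if (num_items < 1)
> >>>       num_items = max_entries;  // fall back to the expected capacity
> >>>     if (num_items < 1)
> >>>       num_items = 1;            // last resort: at least one item
> >>>     return num_items;
> >>>   }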
>
> >>> thanks
>
> >>> -- kuer
>
> >>>> On Jul 22, 5:05 PM, kuer <[email protected]> wrote:
> >>>> Hi, all,
>
> >>>> here is the content of the file that causes the BloomFilter
> >>>> assertion failure:
>
> >>>> /hypertable/tables/METADATA/logging/AB2A0D28DE6B77FFDD6C72AF/cs0
>
> >>>> $ hexdump -C cs0
> >>>> 00000000  49 64 78 46 69 78 2d 2d 2d 2d 1a 00 ff ff ff ff  |IdxFix----......|
> >>>> 00000010  00 00 00 00 00 00 00 00 7d 9f 49 64 78 56 61 72  |........}.IdxVar|
> >>>> 00000020  2d 2d 2d 2d 1a 00 ff ff ff ff 00 00 00 00 00 00  |----............|
> >>>> 00000030  00 00 87 97                                      |....|
> >>>> 00000034
>
> >>>> FYI
>
> >>>> -- kuer
>
> >>>> On Jul 22, 1:03 PM, Sanjit Jhala <[email protected]> wrote:
>
> >>>>> Recovering ranges from crashed RangeServers is one of the high
> >>>>> priority items Doug is working on.
>
> >>>>> -Sanjit
> >