Hi all,

Here is the content of the file that causes the BloomFilter assertion failure:

/hypertable/tables/METADATA/logging/AB2A0D28DE6B77FFDD6C72AF/cs0

$ hexdump -C cs0
00000000  49 64 78 46 69 78 2d 2d  2d 2d 1a 00 ff ff ff ff  |IdxFix----......|
00000010  00 00 00 00 00 00 00 00  7d 9f 49 64 78 56 61 72  |........}.IdxVar|
00000020  2d 2d 2d 2d 1a 00 ff ff  ff ff 00 00 00 00 00 00  |----............|
00000030  00 00 87 97                                       |....|
00000034
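A quick way to confirm what cs0 contains is to look for the two 10-byte block magics that appear in the dump. This C++ sketch assumes only those magic strings ("IdxFix----" and "IdxVar----"). It does not interpret the bytes that follow each magic, and `MagicHit` / `find_block_magics` are hypothetical helpers, not Hypertable APIs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One occurrence of a block magic inside a CellStore file image.
struct MagicHit {
  std::string magic;
  std::size_t offset;
};

// Hypothetical helper: report every offset at which one of the two magics
// seen in the hexdump occurs. Bytes following each magic are not decoded.
std::vector<MagicHit> find_block_magics(const std::string &data) {
  static const char *const kMagics[] = {"IdxFix----", "IdxVar----"};
  std::vector<MagicHit> hits;
  for (const char *magic : kMagics) {
    for (std::size_t pos = data.find(magic); pos != std::string::npos;
         pos = data.find(magic, pos + 1)) {
      hits.push_back({magic, pos});
    }
  }
  return hits;
}
```

To use it, load cs0 into a std::string through a std::ifstream opened with std::ios::binary and pass the string in. For the 52-byte (0x34) file above, it should report IdxFix---- at offset 0 and IdxVar---- at offset 26 (0x1a). Those two hits account for the whole file as two 26-byte headers. That suggests, but does not prove, that the cell store holds no data blocks at all.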

 FYI

   -- kuer
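If cs0 is effectively empty, that would match the failed expectation m_num_bits != 0 reported further down this thread. Textbook Bloom filter sizing gives m = ceil(-n * ln(p) / (ln 2)^2) bits and k = round((m / n) * ln 2) hash functions for n items at false-positive rate p. An estimate of n = 0 items therefore gives m = 0. This C++ sketch assumes that BasicBloomFilter sizes itself from an item estimate in a similar way. The thread does not show its actual formula, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Textbook sizing: number of bits for `items_estimate` items at the given
// false-positive probability. Returns 0 when items_estimate is 0.
std::size_t bloom_num_bits(std::size_t items_estimate, double false_positive_prob) {
  const double ln2 = std::log(2.0);
  double bits = -static_cast<double>(items_estimate) * std::log(false_positive_prob) / (ln2 * ln2);
  return static_cast<std::size_t>(std::ceil(bits));
}

// Optimal number of hash functions for the chosen bit count.
std::size_t bloom_num_hashes(std::size_t num_bits, std::size_t items_estimate) {
  assert(items_estimate > 0);  // k is undefined for an empty set
  double k = static_cast<double>(num_bits) / static_cast<double>(items_estimate) * std::log(2.0);
  long rounded = std::lround(k);
  return rounded < 1 ? 1 : static_cast<std::size_t>(rounded);
}
```

For example, bloom_num_bits(1000, 0.01) is 9586 and bloom_num_hashes(9586, 1000) is 7. By contrast, bloom_num_bits(0, 0.01) is 0, which is exactly the value the assertion rejects.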


On Jul 22, 1:03 PM, Sanjit Jhala <[email protected]> wrote:
> Recovering ranges from crashed RangeServers is one of the high-priority
> items Doug is working on.
>
> -Sanjit
>
> On Jul 21, 2009, at 7:59 PM, kuer wrote:
>
>
>
> > Hi, all,
>
> > Another question: since one of the RangeServers dumps core when
> > replaying its commit log, I stopped restarting it. But now the whole
> > HT system seems to have stopped working, too.
>
> > The client program fails with socket.timeout.
>
> > The hypertable shell hangs:
> > hypertable> show tables;
> > METADATA
> > kvcache
> > storage_se
>
> >  Elapsed time:  0.00 s
> > hypertable> show create table storage_se;
> > ^^^^^ hangs here, waiting for a response that never comes
>
> > Log messages from Hypertable.Master:
>
> > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR] (AsyncComm/Comm.cc:212) No connection for 221.194.134.173:31060
> > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [WARN] (Lib/RangeServerClient.cc:312) Comm::send_request to 221.194.134.173:31060 failed - COMM not connected
> > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR] find_range_and_start_scan (Lib/IntervalScanner.cc:408): Hypertable::Exception: Comm::send_request to 221.194.134.173:31060 failed - COMM not connected
> >    at void Hypertable::RangeServerClient::send_message(const sockaddr_in&, Hypertable::CommBufPtr&, Hypertable::DispatchHandler*) (Lib/RangeServerClient.cc:314)
> > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR] (Master/MasterGc.cc:239) Error: caught exception while gc'ing: Problem creating scanner on METADATA[..0: ]
>
> > NOTE: 221.194.134.173 is the IP address of the box where the RangeServer failed.
>
> > My question is: since all the information is shared by all
> > RangeServers, why doesn't Hypertable.Master reassign the ranges to
> > other RangeServers when some of the RangeServers stop working?
>
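The reassignment kuer asks about can be illustrated with a hypothetical C++ sketch. This is not Hypertable.Master code. The ranges owned by the failed server are handed to the surviving servers in round-robin order. Server and range names are plain strings, and `Assignment` / `reassign_ranges` are invented for illustration. A real master would also have to replay the failed server's commit log (the files under /hypertable/servers/221.194.134.173_31060/log/ in the RangeServer log further down) before the new owners could serve those ranges. That replay is the same work that crashes in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: server name -> names of the ranges it serves.
using Assignment = std::map<std::string, std::vector<std::string>>;

// Move every range of `failed_server` to the remaining servers, round-robin.
void reassign_ranges(Assignment &assignment, const std::string &failed_server) {
  auto failed = assignment.find(failed_server);
  if (failed == assignment.end()) return;  // unknown server: nothing to do
  std::vector<std::string> orphans = std::move(failed->second);
  assignment.erase(failed);

  std::vector<std::string> survivors;
  for (const auto &entry : assignment) survivors.push_back(entry.first);
  assert(!survivors.empty() || orphans.empty());  // need at least one live server

  std::size_t next = 0;
  for (auto &range : orphans) {
    assignment[survivors[next]].push_back(std::move(range));
    next = (next + 1) % survivors.size();
  }
}
```

Round-robin ignores load and locality. It is only the simplest policy that makes the idea concrete.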
> > thanks
>
> >   -- kuer
>
> > On Jul 22, 10:43 AM, kuer <[email protected]> wrote:
> >> Hi, Sanjit,
>
> >> I just uploaded the second part of range.log, range.20090722.log.2.gz.
>
> >> The first part, range.20090722.log.1.gz, is about 18 MB, which exceeds
> >> the upload file size limit.
>
> >> http://hypertable-dev.googlegroups.com/web/range.20090722.log.2.gz?gd...
>
> >> If necessary, I will split the first log file and upload the pieces.
>
> >> Thanks
>
> >>   -- kuer
>
> >> On Jul 22, 10:15 AM, Sanjit Jhala <[email protected]> wrote:
>
> >>> Hi Kuer,
>
> >>> You can gzip the RangeServer logs and post them to the File Upload
> >>> page. Thanks for reporting this issue.
>
> >>> -Sanjit
>
> >>> On Jul 21, 2009, at 6:44 PM, kuer wrote:
>
> >>>> Hi, Sanjit,
>
> >>>> With the --debug option I get more log messages, but the file is big.
> >>>> How should I share it with you?
>
> >>>> Here is the gdb backtrace from the core file:
>
> >>>> (gdb) bt
> >>>> #0  0x0000000000538272 in Hypertable::BasicBloomFilter<Hypertable::MurmurHash2>::BasicBloomFilter ()
> >>>> #1  0x000000000053d3be in Hypertable::CellStoreV1::create_bloom_filter ()
> >>>> #2  0x000000000053e10e in Hypertable::CellStoreV1::finalize ()
> >>>> #3  0x000000000051f112 in Hypertable::AccessGroup::run_compaction ()
> >>>> #4  0x0000000000504e45 in Hypertable::Range::split_compact_and_shrink ()
> >>>> #5  0x0000000000509310 in Hypertable::Range::split ()
> >>>> #6  0x00000000004ec693 in Hypertable::MaintenanceQueue::Worker::operator() ()
> >>>> #7  0x00000000006a5c40 in thread_proxy ()
> >>>> #8  0x00000038ae406367 in start_thread () from /lib64/libpthread.so.0
> >>>> #9  0x00000038ad8d2f7d in clone () from /lib64/libc.so.6
>
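The backtrace puts the crash in the BasicBloomFilter constructor. That constructor is reached through CellStoreV1::create_bloom_filter, called from CellStoreV1::finalize during a split compaction. One defensive option is to skip building the filter when the new store has no cells. The sketch below uses hypothetical stand-in classes rather than Hypertable's real CellStoreV1. Its 10-bits-per-cell default is an arbitrary assumption, and the thread does not say how the actual bug was fixed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a Bloom filter that, like the failing code,
// requires a non-zero bit count.
class SketchBloomFilter {
 public:
  explicit SketchBloomFilter(std::size_t num_bits) : m_num_bits(num_bits) {
    assert(m_num_bits != 0);  // mirrors "failed expectation: m_num_bits != 0"
  }
  std::size_t num_bits() const { return m_num_bits; }

 private:
  std::size_t m_num_bits;
};

// Hypothetical cell store writer: counts appended cells and builds a Bloom
// filter in finalize() only when at least one cell was written.
class SketchCellStoreWriter {
 public:
  void add_cell() { ++m_num_cells; }

  void finalize(std::size_t bits_per_cell = 10) {  // 10 is an assumed default
    if (m_num_cells == 0) return;  // empty store: skip the filter entirely
    m_bloom = std::make_unique<SketchBloomFilter>(m_num_cells * bits_per_cell);
  }

  bool has_bloom_filter() const { return m_bloom != nullptr; }

 private:
  std::size_t m_num_cells = 0;
  std::unique_ptr<SketchBloomFilter> m_bloom;
};
```

A reader then has to treat "no Bloom filter" as "may contain anything" and fall through to the index. For an empty store the lookup finds nothing anyway.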
> >>>> -- kuer
>
> >>>> On Jul 22, 9:07 AM, Sanjit Jhala <[email protected]> wrote:
> >>>>> Hi Kuer,
>
> >>>>> This looks like a bug in the RangeServer code. The RangeServer is
> >>>>> trying to create a CellStore file and, while creating the CellStore's
> >>>>> BloomFilter, it's hitting an error condition.
>
> >>>>> Can you try a couple of things to help debug this issue?
>
> >>>>> First, turn on RangeServer debug logging and send us the RangeServer
> >>>>> logs. You can do this by adding the global option --debug to your
> >>>>> start-all-servers.sh command line. Example:
> >>>>> <$HYPERTABLE_INSTALL_DIR>/bin/start-all-servers.sh kfs --debug
>
> >>>>> Second, if you could compile a debug build and send the stack trace,
> >>>>> that would be helpful. To do this, run ccmake <$HYPERTABLE_SRC_DIR>
> >>>>> from your hypertable build directory, make sure CMAKE_BUILD_TYPE is
> >>>>> set to Debug, and install the new build. After you bring up the
> >>>>> RangeServer and it dumps core, you can load the core file in gdb
> >>>>> (e.g. gdb <$HYPERTABLE_INSTALL_DIR>/bin/Hypertable.RangeServer <$CORE_FILE>).
> >>>>> Then run bt (backtrace) in gdb to get the stack trace.
>
> >>>>> -Sanjit
>
> >>>>> On Jul 21, 2009, at 5:36 PM, kuer wrote:
>
> >>>>>> Hi, all,
>
> >>>>>> One of the RangeServers hangs after dumping core and being restarted.
> >>>>>> Here are the messages in the RangeServer's log:
>
> >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN] (Lib/CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because 1246607682171649001 >= 1246607682128108001 (file='/hypertable/servers/221.194.134.173_31060/log/root/0')
> >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN] (Lib/CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because 1248187695757932563 >= 1247819802453791364 (file='/hypertable/servers/221.194.134.173_31060/log/metadata/2')
> >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN] (Lib/CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because 1248193806824860161 >= 1248189458336849002 (file='/hypertable/servers/221.194.134.173_31060/log/user/401')
> >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [INFO] (RangeServer/MaintenancePrioritizerLogCleanup.cc:103) Adding maintenance for range METADATA[0: .. ] because mid-split(1)
> >>>>>> 2009-07-22 08:23:41,449 1295067456 Hypertable.RangeServer [INFO] (RangeServer/RangeServer.cc:2032) Memory Usage: 312320288 bytes
> >>>>>> 2009-07-22 08:23:41,449 1378986304 Hypertable.RangeServer [INFO] (RangeServer/AccessGroup.cc:379) Starting Major Compaction of METADATA[0: .. ](default)
> >>>>>> 2009-07-22 08:23:41,529 1378986304 Hypertable.RangeServer [INFO] (RangeServer/AccessGroup.cc:533) Finished Compaction of METADATA[0: .. ](default)
> >>>>>> 2009-07-22 08:23:41,530 1378986304 Hypertable.RangeServer [INFO] (RangeServer/AccessGroup.cc:372) Starting InMemory Compaction of METADATA[0: .. ](location)
> >>>>>> 2009-07-22 08:23:41,549 1378986304 Hypertable.RangeServer [INFO] (RangeServer/AccessGroup.cc:533) Finished Compaction of METADATA[0: .. ](location)
> >>>>>> 2009-07-22 08:23:41,549 1378986304 Hypertable.RangeServer [INFO] (RangeServer/AccessGroup.cc:379) Starting Major Compaction of METADATA[0: .. ](logging)
> >>>>>> 2009-07-22 08:23:41,552 1378986304 Hypertable.RangeServer [FATAL] (Common/BloomFilter.h:47) failed expectation: m_num_bits != 0
>
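The three WARN lines from CommitLog.cc:250 appear to be routine log cleanup rather than the cause of the crash. The fatal line is the BloomFilter one, and it fires during the compaction of the logging access group, the same access group as the cs0 file above. The thread does not define the two numbers in each WARN line. If the first is the newest revision stored in a log fragment and the second is the oldest revision still held only in memory, the purge could work like this hypothetical C++ sketch: fragments are dropped oldest-first until one is still needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical model of a commit-log fragment and the newest revision in it.
struct LogFragment {
  std::string path;
  std::int64_t revision;
};

// Drop fragments from the front of an oldest-first queue while all of their
// updates are older than the earliest revision still only in memory. Stops
// at the first fragment that is still needed and returns how many were purged.
std::size_t purge_fragments(std::deque<LogFragment> &fragments,
                            std::int64_t earliest_cached_revision) {
  std::size_t purged = 0;
  while (!fragments.empty()) {
    if (fragments.front().revision >= earliest_cached_revision)
      break;  // analogous to "LOG FRAGMENT PURGE breaking because X >= Y"
    fragments.pop_front();
    ++purged;
  }
  return purged;
}
```

Under that reading, the root log entry (1246607682171649001 >= 1246607682128108001) simply means fragment 0 still contains updates that have not been compacted to disk, so nothing is purged yet.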
> >>>>>> It seems that the RangeServer cannot recover by replaying its commit log.
>
> >>>>>> What's the problem, and how can I fix it?
>
> >>>>>> Thanks
>
> >>>>>>   -- kuer
You received this message because you are subscribed to the Google Groups 
"Hypertable Development" group.