Hi Kuer,

Can you grep through your logs (on all machines) for "ERROR" and "Exception"? Post the grep output and we'll take a look.
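The search requested above can also be approximated in Python, which may be handy when logs copied from several machines end up in one directory. This is only a sketch: the function name `scan_logs`, the `*.log` file pattern, and the directory argument are assumptions for illustration, not part of Hypertable.

```python
import re
from pathlib import Path

# The two keywords asked for in the message; case-sensitive, like plain grep.
PATTERN = re.compile(r"ERROR|Exception")


def scan_logs(log_dir):
    """Return (file name, line number, line text) for every matching line.

    log_dir is a hypothetical directory holding the collected *.log files.
    """
    hits = []
    for path in sorted(Path(log_dir).rglob("*.log")):
        # errors="replace" keeps the scan going if a log holds odd bytes.
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if PATTERN.search(line):
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits
```

Calling `scan_logs` on the directory that holds the logs and printing each tuple gives roughly what `grep -n` would report.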
- Doug

On Wed, Jul 22, 2009 at 2:05 AM, kuer wrote:
>
> Hi, all,
>
> Here is the content of the file that causes the BloomFilter assertion failure:
>
> /hypertable/tables/METADATA/logging/AB2A0D28DE6B77FFDD6C72AF/cs0
>
> $ hexdump -C cs0
> 00000000  49 64 78 46 69 78 2d 2d  2d 2d 1a 00 ff ff ff ff  |IdxFix----......|
> 00000010  00 00 00 00 00 00 00 00  7d 9f 49 64 78 56 61 72  |........}.IdxVar|
> 00000020  2d 2d 2d 2d 1a 00 ff ff  ff ff 00 00 00 00 00 00  |----............|
> 00000030  00 00 87 97                                       |....|
> 00000034
>
> FYI
>
> -- kuer
>
>
> On Jul 22, 1:03 PM, Sanjit Jhala wrote:
> > Recovering ranges from crashed RangeServers is one of the high
> > priority items Doug is working on.
> >
> > -Sanjit
> >
> > On Jul 21, 2009, at 7:59 PM, kuer wrote:
> >
> > > Hi, all,
> >
> > > Another question: one of the RangeServers core dumps when
> > > replaying the commit log, so I stopped restarting it. But now the
> > > whole HT system seems to have stopped working, too.
> >
> > > The client program complains of socket.timeout,
> >
> > > hyperspace shell hangs:
> > > hypertable> show tables;
> > > METADATA
> > > kvcache
> > > storage_se
> >
> > > Elapsed time: 0.00 s
> > > hypertable> show create table storage_se;
> > > ^^^^^ waiting for .... ????
> >
> > > Logging messages from Hypertable.Master:
> >
> > > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR]
> > > (AsyncComm/Comm.cc:212) No connection for 221.194.134.173:31060
> > > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [WARN]
> > > (Lib/RangeServerClient.cc:312) Comm::send_request to
> > > 221.194.134.173:31060 failed - COMM not connected
> > > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR]
> > > find_range_and_start_scan (Lib/IntervalScanner.cc:408):
> > > Hypertable::Exception: Comm::send_request to 221.194.134.173:31060
> > > failed - COMM not connected
> > > at void Hypertable::RangeServerClient::send_message(const
> > > sockaddr_in&, Hypertable::CommBufPtr&, Hypertable::DispatchHandler*)
> > > (Lib/RangeServerClient.cc:314)
> > > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR]
> > > (Master/MasterGc.cc:239) Error: caught exception while gc'ing:
> > > Problem creating scanner on METADATA[..0: ]
> >
> > > NOTE: 221.194.134.173 is the IP of the box where the RangeServer went wrong.
> >
> > > My question is:
> > > since all information is shared by all RangeServers, why doesn't
> > > Hypertable.Master reassign the ranges to other RangeServers when some
> > > of the RangeServers go out of service?
> >
> > > thanks
> >
> > > -- kuer
> >
> > > On Jul 22, 10:43 AM, kuer wrote:
> > >> Hi, Sanjit,
> >
> > >> I just uploaded the second part of range.log, range.20090722.log.2.gz.
> >
> > >> The first part, range.20090722.log.1.gz, is about 18 MB, which exceeds
> > >> the upload size limit.
> >
> > >> http://hypertable-dev.googlegroups.com/web/range.20090722.log.2.gz?gd...
> >
> > >> If necessary, I will split the first log file and upload the pieces.
> >
> > >> Thanks
> >
> > >> -- kuer
> >
> > >> On Jul 22, 10:15 AM, Sanjit Jhala wrote:
> >
> > >>> Hi Kuer,
> >
> > >>> You can gzip the RangeServer log and post it to the File Upload
> > >>> Page.
> > >>> Thanks for reporting this issue.
> >
> > >>> -Sanjit
> >
> > >>> On Jul 21, 2009, at 6:44 PM, kuer wrote:
> >
> > >>>> Hi, Sanjit,
> >
> > >>>> With the --debug option I get some logging messages, but the file is
> > >>>> big. How can I share it with you?
> >
> > >>>> gdb backtrace of core files
> >
> > >>>> (gdb) bt
> > >>>> #0  0x0000000000538272 in Hypertable::BasicBloomFilter<Hypertable::MurmurHash2>::BasicBloomFilter ()
> > >>>> #1  0x000000000053d3be in Hypertable::CellStoreV1::create_bloom_filter ()
> > >>>> #2  0x000000000053e10e in Hypertable::CellStoreV1::finalize ()
> > >>>> #3  0x000000000051f112 in Hypertable::AccessGroup::run_compaction ()
> > >>>> #4  0x0000000000504e45 in Hypertable::Range::split_compact_and_shrink ()
> > >>>> #5  0x0000000000509310 in Hypertable::Range::split ()
> > >>>> #6  0x00000000004ec693 in Hypertable::MaintenanceQueue::Worker::operator() ()
> > >>>> #7  0x00000000006a5c40 in thread_proxy ()
> > >>>> #8  0x00000038ae406367 in start_thread () from /lib64/libpthread.so.0
> > >>>> #9  0x00000038ad8d2f7d in clone () from /lib64/libc.so.6
> >
> > >>>> -- kuer
> >
> > >>>> On Jul 22, 9:07 AM, Sanjit Jhala wrote:
> > >>>>> Hi Kuer,
> >
> > >>>>> This looks like a bug in the RangeServer code. The RangeServer is
> > >>>>> trying to create a CellStore file, and while creating the
> > >>>>> CellStore's BloomFilter it's hitting an error condition.
> >
> > >>>>> Can you try a couple of things to help debug this issue?
> >
> > >>>>> First, turn on RangeServer debug logging and report the RangeServer
> > >>>>> logs. You can do this by adding the global option --debug to your
> > >>>>> start-all-servers.sh command line.
> > >>>>> Example:
> > >>>>> <$HYPERTABLE_INSTALL_DIR>/bin/start-all-servers.sh kfs --debug
> >
> > >>>>> Second, if you could compile a debug build and send the stack
> > >>>>> trace, that would be helpful. To do this, run
> > >>>>> ccmake <$HYPERTABLE_SRC_DIR> from your hypertable build directory,
> > >>>>> make sure CMAKE_BUILD_TYPE is set to Debug, and install the new
> > >>>>> build. After you try to bring up the RangeServer and it dumps core,
> > >>>>> you can load the core file in gdb (e.g.
> > >>>>> gdb <$HYPERTABLE_INSTALL_DIR>/bin/Hypertable.RangeServer <$CORE_FILE>).
> > >>>>> You can run bt (backtrace) in gdb to get the stack trace.
> >
> > >>>>> -Sanjit
> >
> > >>>>> On Jul 21, 2009, at 5:36 PM, kuer wrote:
> >
> > >>>>>> Hi, all,
> >
> > >>>>>> One of the RangeServers hangs after a core dump and restart. Here
> > >>>>>> are the messages in the RangeServer's log:
> >
> > >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN]
> > >>>>>> (Lib/CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because
> > >>>>>> 1246607682171649001 >= 1246607682128108001
> > >>>>>> (file='/hypertable/servers/221.194.134.173_31060/log/root/0')
> > >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN]
> > >>>>>> (Lib/CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because
> > >>>>>> 1248187695757932563 >= 1247819802453791364
> > >>>>>> (file='/hypertable/servers/221.194.134.173_31060/log/metadata/2')
> > >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN]
> > >>>>>> (Lib/CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because
> > >>>>>> 1248193806824860161 >= 1248189458336849002
> > >>>>>> (file='/hypertable/servers/221.194.134.173_31060/log/user/401')
> > >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [INFO]
> > >>>>>> (RangeServer/MaintenancePrioritizerLogCleanup.cc:103) Adding
> > >>>>>> maintenance for range METADATA[0: .. ] because mid-split(1)
> > >>>>>> 2009-07-22 08:23:41,449 1295067456 Hypertable.RangeServer [INFO]
> > >>>>>> (RangeServer/RangeServer.cc:2032) Memory Usage: 312320288 bytes
> > >>>>>> 2009-07-22 08:23:41,449 1378986304 Hypertable.RangeServer [INFO]
> > >>>>>> (RangeServer/AccessGroup.cc:379) Starting Major Compaction of
> > >>>>>> METADATA[0: .. ](default)
> > >>>>>> 2009-07-22 08:23:41,529 1378986304 Hypertable.RangeServer [INFO]
> > >>>>>> (RangeServer/AccessGroup.cc:533) Finished Compaction of
> > >>>>>> METADATA[0: .. ](default)
> > >>>>>> 2009-07-22 08:23:41,530 1378986304 Hypertable.RangeServer [INFO]
> > >>>>>> (RangeServer/AccessGroup.cc:372) Starting InMemory Compaction of
> > >>>>>> METADATA[0: .. ](location)
> > >>>>>> 2009-07-22 08:23:41,549 1378986304 Hypertable.RangeServer [INFO]
> > >>>>>> (RangeServer/AccessGroup.cc:533) Finished Compaction of
> > >>>>>> METADATA[0: .. ](location)
> > >>>>>> 2009-07-22 08:23:41,549 1378986304 Hypertable.RangeServer [INFO]
> > >>>>>> (RangeServer/AccessGroup.cc:379) Starting Major Compaction of
> > >>>>>> METADATA[0: .. ](logging)
> > >>>>>> 2009-07-22 08:23:41,552 1378986304 Hypertable.RangeServer [FATAL]
> > >>>>>> (Common/BloomFilter.h:47) failed expectation: m_num_bits != 0
> >
> > >>>>>> It seems that the RangeServer cannot recover by replaying the log.
> >
> > >>>>>> What's the problem? How can I fix it?
> >
> > >>>>>> Thanks
> >
> > >>>>>> -- kuer

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Hypertable Development" group.
For more options, visit this group at http://groups.google.com/group/hypertable-dev?hl=en
-~----------~----~----~----~------~----~------~--~---
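The 52 bytes of cs0 shown in the hexdump earlier in this thread can be split into records with a few lines of Python. This is a sketch under a stated assumption: the two bytes following each 10-byte magic string ("IdxFix----", "IdxVar----") are treated as a little-endian record length. That reading is a guess from the bytes themselves, not something the thread or the CellStore format documentation confirms, and `split_blocks` is a hypothetical helper name.

```python
# Bytes copied from the hexdump of cs0 posted in this thread.
CS0 = bytes.fromhex(
    "49 64 78 46 69 78 2d 2d 2d 2d 1a 00 ff ff ff ff "
    "00 00 00 00 00 00 00 00 7d 9f 49 64 78 56 61 72 "
    "2d 2d 2d 2d 1a 00 ff ff ff ff 00 00 00 00 00 00 "
    "00 00 87 97"
)

MAGIC_LEN = 10  # both magic strings in the dump are 10 bytes long


def split_blocks(data):
    """Walk the data record by record and return (offset, magic, length).

    Assumption: the 16-bit little-endian value right after the magic is
    the total length of the record, magic included.
    """
    blocks = []
    offset = 0
    while offset < len(data):
        magic = data[offset:offset + MAGIC_LEN].decode("ascii")
        length_field = data[offset + MAGIC_LEN:offset + MAGIC_LEN + 2]
        length = int.from_bytes(length_field, "little")
        if length == 0:
            raise ValueError("zero length field, cannot advance")
        blocks.append((offset, magic, length))
        offset += length
    return blocks


for offset, magic, length in split_blocks(CS0):
    print(f"offset {offset:2d}: {magic} length {length}")
```

Under that assumption the two records are each 26 (0x1a) bytes long and together account for all 52 bytes, so the file would hold two header-sized index records and nothing else. Whether that relates to the `m_num_bits != 0` failure is not established in this thread.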
