Hi, all,
Another question: one of the RangeServers dumps core whenever it replays
its commit log, so I have stopped restarting it. But now the whole HT
system seems to have stopped working, too.
The client program complains of socket.timeout, and the
hypertable shell hangs:
hypertable> show tables;
METADATA
kvcache
storage_se
Elapsed time: 0.00 s
hypertable> show create table storage_se;
^^^^^ hangs here; what is it waiting for?
Log messages from Hypertable.Master:
2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR]
(AsyncComm/Comm.cc:212) No connection for 221.194.134.173:31060
2009-07-22 10:45:45,276 1350199616 Hypertable.Master [WARN] (Lib/
RangeServerClient.cc:312) Comm::send_request to 221.194.134.173:31060
failed - COMM not connected
2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR]
find_range_and_start_scan (Lib/IntervalScanner.cc:408):
Hypertable::Exception: Comm::send_request to 221.194.134.173:31060
failed - COMM not connected
at void Hypertable::RangeServerClient::send_message(const
sockaddr_in&, Hypertable::CommBufPtr&, Hypertable::DispatchHandler*)
(Lib/RangeServerClient.cc:314)
2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR] (Master/
MasterGc.cc:239) Error: caught exception while gc'ing: Problem
creating scanner on METADATA[..0:\xff\xff]
Note: 221.194.134.173 is the IP of the machine where the RangeServer failed.
My question is:
since all the data is shared by all RangeServers, why doesn't
Hypertable.Master reassign the ranges to other RangeServers when some
of the RangeServers go down?
Thanks
-- kuer
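
The reassignment asked about above can be sketched as follows. This is a minimal illustration in Python, not Hypertable code and not how Hypertable.Master works: the function name, the dictionary layout, and the server names are all hypothetical, and it ignores the commit-log replay that a new owner would presumably need to perform before serving a range.

```python
from itertools import cycle


def reassign_ranges(assignments, dead_server, live_servers):
    """Return a new range -> server map in which every range owned by
    dead_server is handed out round-robin to the remaining live servers.

    Hypothetical helper for illustration only; not a Hypertable API.
    """
    candidates = [s for s in live_servers if s != dead_server]
    if not candidates:
        raise RuntimeError("no live range servers left")
    targets = cycle(candidates)
    new_assignments = {}
    # Iterate in sorted order so the result is deterministic.
    for range_name in sorted(assignments):
        owner = assignments[range_name]
        if owner == dead_server:
            owner = next(targets)  # move the orphaned range
        new_assignments[range_name] = owner
    return new_assignments


# Hypothetical cluster state: rs1 has failed, rs2 and rs3 are still up.
assignments = {
    "METADATA[0:..]": "rs1:31060",
    "kvcache[..m]": "rs2:31060",
    "storage_se[..]": "rs1:31060",
}
new_assignments = reassign_ranges(
    assignments, "rs1:31060", ["rs2:31060", "rs3:31060"]
)
```

Ranges that were already on a live server keep their owner; only the ranges of the failed server move.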
On Jul 22, 10:43 AM, kuer <[email protected]> wrote:
> Hi, Sanjit,
>
> I just uploaded the second part of the range log, range.20090722.log.2.gz.
>
> The first part, range.20090722.log.1.gz, is about 18MB, which exceeds the
> file upload limit.
>
> http://hypertable-dev.googlegroups.com/web/range.20090722.log.2.gz?gd...
>
> If necessary, I will split the first log file and upload the pieces.
>
> Thanks
>
> -- kuer
>
> On Jul 22, 10:15 AM, Sanjit Jhala <[email protected]> wrote:
>
> > Hi Kuer,
>
> > You can gzip the RangeServer log and post it to the File Upload
> > Page. Thanks for reporting this issue.
>
> > -Sanjit
>
> > On Jul 21, 2009, at 6:44 PM, kuer wrote:
>
> > > Hi, Sanjit,
>
> > > With the --debug option I got some log messages, but the file is big.
> > > How can I share it with you?
>
> > > Here is the gdb backtrace from the core file:
>
> > > (gdb) bt
> > > #0 0x0000000000538272 in Hypertable::BasicBloomFilter<Hypertable::MurmurHash2>::BasicBloomFilter ()
> > > #1 0x000000000053d3be in Hypertable::CellStoreV1::create_bloom_filter ()
> > > #2 0x000000000053e10e in Hypertable::CellStoreV1::finalize ()
> > > #3 0x000000000051f112 in Hypertable::AccessGroup::run_compaction ()
> > > #4 0x0000000000504e45 in Hypertable::Range::split_compact_and_shrink ()
> > > #5 0x0000000000509310 in Hypertable::Range::split ()
> > > #6 0x00000000004ec693 in Hypertable::MaintenanceQueue::Worker::operator() ()
> > > #7 0x00000000006a5c40 in thread_proxy ()
> > > #8 0x00000038ae406367 in start_thread () from /lib64/libpthread.so.0
> > > #9 0x00000038ad8d2f7d in clone () from /lib64/libc.so.6
>
> > > -- kuer
>
> > > On Jul 22, 9:07 AM, Sanjit Jhala <[email protected]> wrote:
> > >> Hi Kuer,
>
> > > >> This looks like a bug in the RangeServer code. The RangeServer is
> > > >> trying to create a CellStore file, and while creating the CellStore's
> > > >> BloomFilter it's hitting an error condition.
>
> > >> Can you try a couple of things to help debug this issue?
>
> > > >> First, turn on RangeServer debug logging and send the RangeServer
> > > >> logs. You can do this by adding the global option --debug to your
> > > >> start-all-servers.sh command line. Example:
> > > >> <$HYPERTABLE_INSTALL_DIR>/bin/start-all-servers.sh kfs --debug
>
> > > >> Secondly, if you could compile a debug build and send the stack trace,
> > > >> that would be helpful. To do this, run
> > > >> ccmake <$HYPERTABLE_SRC_DIR> from your hypertable build directory,
> > > >> make sure CMAKE_BUILD_TYPE is set to Debug, and install the new build.
> > > >> After you try to bring up the RangeServer and it dumps core, you can
> > > >> load the core file in gdb (e.g.
> > > >> gdb <$HYPERTABLE_INSTALL_DIR>/bin/Hypertable.RangeServer <$CORE_FILE>).
> > > >> You can run bt (backtrace) in gdb to get the stack trace.
>
> > >> -Sanjit
>
> > >> On Jul 21, 2009, at 5:36 PM, kuer wrote:
>
> > >>> Hi, all,
>
> > > >>> One of the RangeServers hangs after a core dump and restart. Here
> > > >>> are the messages from the RangeServer's log:
>
> > >>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN]
> > >>> (Lib/
> > >>> CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because
> > >>> 1246607682171649001 >= 1246607682128108001 (file='/hypertable/
> > >>> servers/
> > >>> 221.194.134.173_31060/log/root/0')
> > >>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN]
> > >>> (Lib/
> > >>> CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because
> > >>> 1248187695757932563 >= 1247819802453791364 (file='/hypertable/
> > >>> servers/
> > >>> 221.194.134.173_31060/log/metadata/2')
> > >>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN]
> > >>> (Lib/
> > >>> CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because
> > >>> 1248193806824860161 >= 1248189458336849002 (file='/hypertable/
> > >>> servers/
> > >>> 221.194.134.173_31060/log/user/401')
> > >>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [INFO]
> > >>> (RangeServer/MaintenancePrioritizerLogCleanup.cc:103) Adding
> > >>> maintenance for range METADATA[0: .. ] because mid-split(1)
> > >>> 2009-07-22 08:23:41,449 1295067456 Hypertable.RangeServer [INFO]
> > >>> (RangeServer/RangeServer.cc:2032) Memory Usage: 312320288 bytes
> > >>> 2009-07-22 08:23:41,449 1378986304 Hypertable.RangeServer [INFO]
> > >>> (RangeServer/AccessGroup.cc:379) Starting Major Compaction of
> > >>> METADATA
> > >>> [0: .. ](default)
> > >>> 2009-07-22 08:23:41,529 1378986304 Hypertable.RangeServer [INFO]
> > >>> (RangeServer/AccessGroup.cc:533) Finished Compaction of METADATA
> > >>> [0: .. ](default)
> > >>> 2009-07-22 08:23:41,530 1378986304 Hypertable.RangeServer [INFO]
> > >>> (RangeServer/AccessGroup.cc:372) Starting InMemory Compaction of
> > >>> METADATA[0: .. ](location)
> > >>> 2009-07-22 08:23:41,549 1378986304 Hypertable.RangeServer [INFO]
> > >>> (RangeServer/AccessGroup.cc:533) Finished Compaction of METADATA
> > >>> [0: .. ](location)
> > >>> 2009-07-22 08:23:41,549 1378986304 Hypertable.RangeServer [INFO]
> > >>> (RangeServer/AccessGroup.cc:379) Starting Major Compaction of
> > >>> METADATA
> > >>> [0: .. ](logging)
> > >>> 2009-07-22 08:23:41,552 1378986304 Hypertable.RangeServer [FATAL]
> > >>> (Common/BloomFilter.h:47) failed expectation: m_num_bits != 0
>
> > > >>> It seems that the RangeServer cannot recover by replaying the log.
>
> > > >>> What's the problem, and how can I fix it?
>
> > >>> Thanks
>
> > >>> -- kuer