On Thu, Jan 19, 2023 at 9:07 PM Lo Re Giuseppe <giuseppe.l...@cscs.ch> wrote:
>
> Dear all,
>
> We have started to use CephFS more intensively for some WLCG-related workloads.
> We have 3 active MDS instances spread across 3 servers, with
> mds_cache_memory_limit=12G; most of the other configs are defaults.
> One of them crashed last night, leaving the log below.
> Do you have any hint on what the cause could be and how to avoid it?

Not at the moment. Telemetry has reported similar crashes:

        https://tracker.ceph.com/issues/54959 (cephfs)
        https://tracker.ceph.com/issues/54685 (mgr)
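
If you want to check whether your crash matches those, the crash module (assuming
it is enabled, which is the default on recent releases) should have captured a
structured report; a minimal sketch, with <crash-id> as a placeholder:

        ceph crash ls               # list recent daemon crashes and their IDs
        ceph crash info <crash-id>  # full metadata and backtrace for one crash

If the signature turns out to be different, the output of `ceph crash info` is
worth attaching to a new tracker issue.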

The backtrace indicates tcmalloc involvement, but it's not yet clear what's going on.
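
In the meantime, to rule out plain memory pressure on the MDS, it may be worth
comparing the daemon's tcmalloc heap usage against the configured cache limit;
a rough sketch (substitute your own MDS daemon name, e.g. the one from the log):

        ceph config get mds mds_cache_memory_limit              # confirm the 12G limit is applied
        ceph tell mds.cephfs.naret-monitor03.lqppte heap stats   # tcmalloc heap usage summary

That won't explain the abort itself, but it shows whether the MDS was near its
cache limit when the Migrator import path hit tcmalloc.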

>
> Regards,
>
> Giuseppe
>
> [root@naret-monitor03 ~]# journalctl -u 
> ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service
> ...
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific >
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   1: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   2: abort()
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   3: /lib64/libstdc++.so.6(+0x987ba) [0x7fe2912567ba]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   4: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   5: /lib64/libstdc++.so.6(+0x95559) [0x7fe291253559]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   6: __gxx_personality_v0()
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   7: /lib64/libgcc_s.so.1(+0x10b03) [0x7fe290c34b03]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   8: _Unwind_Resume()
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   9: /usr/bin/ceph-mds(+0x18c104) [0x5638351e7104]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   10: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   11: gsignal()
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   12: abort()
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   13: /lib64/libstdc++.so.6(+0x9009b) [0x7fe29124e09b]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   14: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   15: /lib64/libstdc++.so.6(+0x96597) [0x7fe291254597]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   16: /lib64/libstdc++.so.6(+0x967f8) [0x7fe2912547f8]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   17: /lib64/libtcmalloc.so.4(+0x19fa4) [0x7fe29bae6fa4]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   18: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, vo>
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   19: (std::shared_ptr<inode_t<mempool::mds_co::pool_allocator> > InodeSt>
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   20: (CInode::_decode_base(ceph::buffer::v15_2_0::list::iterator_impl<tr>
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   21: (CInode::decode_import(ceph::buffer::v15_2_0::list::iterator_impl<t>
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   22: (Migrator::decode_import_inode(CDentry*, ceph::buffer::v15_2_0::lis>
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   23: (Migrator::decode_import_dir(ceph::buffer::v15_2_0::list::iterator_>
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   24: (Migrator::handle_export_dir(boost::intrusive_ptr<MExportDir const>>
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   25: (Migrator::dispatch(boost::intrusive_ptr<Message const> const&)+0x1>
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   26: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&>
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   27: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, boo>
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   28: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const>>
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   29: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x10>
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   30: (DispatchQueue::entry()+0x126a) [0x7fe2930a5aba]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   31: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fe2931575d1]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   32: /lib64/libpthread.so.0(+0x81cf) [0x7fe291e451cf]
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   33: clone()
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is neede>
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>  --- begin dump of recent events ---
> Jan 19 04:49:40 naret-monitor03 
> ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
>  terminate called recursively
> Jan 19 04:49:43 naret-monitor03 systemd[1]: 
> ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service:
>  Main process exited, code=exited, status=127/n/a
> Jan 19 04:49:43 naret-monitor03 systemd[1]: 
> ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service:
>  Failed with result 'exit-code'.


--
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
