My clusters are self-rolled, not cephadm-managed.  My start command is as follows:

podman run -it --privileged --pid=host --cpuset-cpus 0,1 --memory 2g \
    --name ceph_osd0 --hostname ceph_osd0 \
    -v /dev:/dev \
    -v /etc/localtime:/etc/localtime:ro \
    -v /etc/ceph:/etc/ceph/ \
    -v /var/lib/ceph/osd/ceph-0:/var/lib/ceph/osd/ceph-0 \
    -v /var/log/ceph:/var/log/ceph \
    -v /run/udev/:/run/udev/ \
    ceph/ceph:v16.2.7-20220201 \
    ceph-osd --id 0 -c /etc/ceph/ceph.conf --cluster ceph -f
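
With -f the daemon stays in the foreground; the regular OSD log still
lands under /var/log/ceph on the host via the bind mount, and whatever
the container writes to stdout/stderr can be pulled afterwards with,
e.g.:

podman logs ceph_osd0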


I jumped from the Octopus image to the 16.2.7 image a while back and have
been running fine since, no issues.  The cluster was clean, no backfills
in progress, etc.  Then this latest zypper up and reboot, and now I have
OSDs that don't start.

podman image ls
REPOSITORY          TAG               IMAGE ID      CREATED     SIZE
quay.io/ceph/ceph   v16.2.7           231fd40524c4  9 days ago  1.39 GB
quay.io/ceph/ceph   v16.2.7-20220201  231fd40524c4  9 days ago  1.39 GB
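
Both tags resolve to the same image ID, which can also be double-checked
with something like:

podman image inspect --format '{{.Id}}' \
    quay.io/ceph/ceph:v16.2.7 quay.io/ceph/ceph:v16.2.7-20220201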


BlueFS fails to mount, I guess?  The device labels are still readable
via ceph-bluestore-tool:

ceph-bluestore-tool show-label --dev /dev/mapper/ceph-0block
{
    "/dev/mapper/ceph-0block": {
        "osd_uuid": "1234abcd-1234-abcd-1234-1234 abcd1234",
        "size": 6001171365888,
        "btime": "2019-04-11T08:46:36.013428-0700",
        "description": "main",
        "bfm_blocks": "1465129728",
        "bfm_blocks_per_key": "128",
        "bfm_bytes_per_block": "4096",
        "bfm_size": "6001171365888",
        "bluefs": "1",
        "ceph_fsid": "1234abcd-1234-abcd-1234-1234 abcd1234",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "ready": "ready",
        "require_osd_release": "16",
        "whoami": "0"
    }
}
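
That's just the main device; the block.wal label can be checked the same
way, or all of the labels for this OSD can be dumped in one go with the
--path form, run from wherever the ceph tools are available (e.g. a
throwaway container on the same image):

ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0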


On Fri, Feb 11, 2022 at 1:06 AM Eugen Block <[email protected]> wrote:

> Can you share some more information about how exactly you upgraded? It
> looks like a cephadm-managed cluster. Did you install OS updates on all
> nodes without waiting for the first one to recover? Maybe I'm misreading,
> so please clarify what your update process looked like.
>
>
> Quoting Mazzystr <[email protected]>:
>
> > I applied the latest OS updates and rebooted my hosts.  Now all my OSDs
> > fail to start.
> >
> > # cat /etc/os-release
> > NAME="openSUSE Tumbleweed"
> > # VERSION="20220207"
> > ID="opensuse-tumbleweed"
> > ID_LIKE="opensuse suse"
> > VERSION_ID="20220207"
> >
> > # uname -a
> > Linux cube 5.16.5-1-default #1 SMP PREEMPT Thu Feb 3 05:26:48 UTC 2022
> > (1af4009) x86_64 x86_64 x86_64 GNU/Linux
> >
> > container image: v16.2.7 / v16.2.7-20220201
> >
> > The OSD debug log shows the following:
> >   -11> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 50 GiB
> >   -10> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option max_total_wal_size = 1073741824
> >    -9> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option compaction_readahead_size = 2097152
> >    -8> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option max_write_buffer_number = 4
> >    -7> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option max_background_compactions = 2
> >    -6> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option compression = kNoCompression
> >    -5> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option writable_file_max_buffer_size = 0
> >    -4> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option min_write_buffer_number_to_merge = 1
> >    -3> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option recycle_log_file_num = 4
> >    -2> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option write_buffer_size = 268435456
> >    -1> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 bluefs mount
> >     0> 2022-02-10T19:14:48.387-0800 7ff1be4c3080 -1 *** Caught signal (Aborted) **
> >  in thread 7ff1be4c3080 thread_name:ceph-osd
> >
> >  ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
> >  1: /lib64/libpthread.so.0(+0x12c20) [0x7ff1bc465c20]
> >  2: gsignal()
> >  3: abort()
> >  4: /lib64/libstdc++.so.6(+0x9009b) [0x7ff1bba7c09b]
> >  5: /lib64/libstdc++.so.6(+0x9653c) [0x7ff1bba8253c]
> >  6: /lib64/libstdc++.so.6(+0x96597) [0x7ff1bba82597]
> >  7: /lib64/libstdc++.so.6(+0x967f8) [0x7ff1bba827f8]
> >  8: ceph-osd(+0x56301f) [0x559ff6d6301f]
> >  9: (BlueFS::_open_super()+0x18c) [0x559ff745f08c]
> >  10: (BlueFS::mount()+0xeb) [0x559ff748085b]
> >  11: (BlueStore::_open_bluefs(bool, bool)+0x94) [0x559ff735e464]
> >  12: (BlueStore::_prepare_db_environment(bool, bool,
> > std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> >*, std::__cxx11::basic_string<char,
> > std::char_traits<char>, std::allocator<char> >*)+0x6d9) [0x559ff735f5b9]
> >  13: (BlueStore::_open_db(bool, bool, bool)+0x155) [0x559ff73608b5]
> >  14: (BlueStore::_open_db_and_around(bool, bool)+0x273) [0x559ff73cba33]
> >  15: (BlueStore::_mount()+0x204) [0x559ff73ce974]
> >  16: (OSD::init()+0x380) [0x559ff6ea2400]
> >  17: main()
> >  18: __libc_start_main()
> >  19: _start()
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >
> >
> > The process log shows the following
> > 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> > 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> > 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> > terminate called after throwing an instance of 'ceph::buffer::v15_2_0::malformed_input'
> >   what():  void bluefs_super_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) no longer understand old encoding version 2 < 143: Malformed input
> > *** Caught signal (Aborted) **
> >  in thread 7f22869e8080 thread_name:ceph-osd
> >  ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
> >  1: /lib64/libpthread.so.0(+0x12c20) [0x7f228498ac20]
> >  2: gsignal()
> >  3: abort()
> >  4: /lib64/libstdc++.so.6(+0x9009b) [0x7f2283fa109b]
> >  5: /lib64/libstdc++.so.6(+0x9653c) [0x7f2283fa753c]
> >  6: /lib64/libstdc++.so.6(+0x96597) [0x7f2283fa7597]
> >  7: /lib64/libstdc++.so.6(+0x967f8) [0x7f2283fa77f8]
> >  8: ceph-osd(+0x56301f) [0x55e6faf6301f]
> >  9: (BlueFS::_open_super()+0x18c) [0x55e6fb65f08c]
> >  10: (BlueFS::mount()+0xeb) [0x55e6fb68085b]
> >  11: (BlueStore::_open_bluefs(bool, bool)+0x94) [0x55e6fb55e464]
> >  12: (BlueStore::_prepare_db_environment(bool, bool,
> > std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> >*, std::__cxx11::basic_string<char,
> > std::char_traits<char>, std::allocator<char> >*)+0x6d9) [0x55e6fb55f5b9]
> >  13: (BlueStore::_open_db(bool, bool, bool)+0x155) [0x55e6fb5608b5]
> >  14: (BlueStore::_open_db_and_around(bool, bool)+0x273) [0x55e6fb5cba33]
> >  15: (BlueStore::_mount()+0x204) [0x55e6fb5ce974]
> >  16: (OSD::init()+0x380) [0x55e6fb0a2400]
> >  17: main()
> >  18: __libc_start_main()
> >  19: _start()
> > 2022-02-10T19:33:34.620-0800 7f22869e8080 -1 *** Caught signal (Aborted) **
> >  in thread 7f22869e8080 thread_name:ceph-osd
> >
> >
> > Does anyone have any ideas about what could be going on here?
> >
> > Thanks,
> > /Chris
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
