On Wed, Jun 11, 2014 at 9:29 AM, Markus Goldberg
<[email protected]> wrote:
> Hi,
> ceph-deploy 1.5.3 can cause trouble if a reboot is done between the
> preparation and activation of an OSD:
>
> The OSD disk was /dev/sdb at this time: the OSD data should go to sdb1 and
> the journal to sdb2, with btrfs as the filesystem type.
> I prepared an OSD:
>
> root@bd-a:/etc/ceph# ceph-deploy -v --overwrite-conf osd --fs-type btrfs
> prepare bd-1:/dev/sdb1:/dev/sdb2
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /root/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v
> --overwrite-conf osd --fs-type btrfs prepare bd-1:/dev/sdb1:/dev/sdb2
> [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks
> bd-1:/dev/sdb1:/dev/sdb2
> [bd-1][DEBUG ] connected to host: bd-1
> [bd-1][DEBUG ] detect platform information from remote host
> [bd-1][DEBUG ] detect machine type
> [ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
> [ceph_deploy.osd][DEBUG ] Deploying osd to bd-1
> [bd-1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
> [bd-1][INFO  ] Running command: udevadm trigger --subsystem-match=block
> --action=add
> [ceph_deploy.osd][DEBUG ] Preparing host bd-1 disk /dev/sdb1 journal
> /dev/sdb2 activate False
> [bd-1][INFO  ] Running command: ceph-disk-prepare --fs-type btrfs --cluster
> ceph -- /dev/sdb1 /dev/sdb2
> [bd-1][DEBUG ]
> [bd-1][DEBUG ] WARNING! - Btrfs v3.12 IS EXPERIMENTAL
> [bd-1][DEBUG ] WARNING! - see http://btrfs.wiki.kernel.org before using
> [bd-1][DEBUG ]
> [bd-1][DEBUG ] fs created label (null) on /dev/sdb1
> [bd-1][DEBUG ]  nodesize 32768 leafsize 32768 sectorsize 4096 size 19.99TiB
> [bd-1][DEBUG ] Btrfs v3.12
> [bd-1][WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if journal is
> not the same device as the osd data
> [bd-1][WARNIN] Turning ON incompat feature 'extref': increased hardlink
> limit per file to 65536
> [bd-1][WARNIN] Error: Partition(s) 1 on /dev/sdb1 have been written, but we
> have been unable to inform the kernel of the change, probably because
> it/they are in use.  As a result, the old partition(s) will remain in use.
> You should reboot now before making further changes.
> [bd-1][INFO  ] checking OSD status...
> [bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
> [ceph_deploy.osd][DEBUG ] Host bd-1 is now ready for osd use.
> Unhandled exception in thread started by
> sys.excepthook is missing
> lost sys.stderr
>
> ceph-deploy told me to do a reboot, so I did.

This is actually not ceph-deploy asking you for a reboot, but the stderr
captured from the remote node (bd-1 in your case).

ceph-deploy logs the output of commands run on remote nodes and prefixes each
line with the hostname of the node that produced it. Remote stderr is logged
at WARNING level and remote stdout at DEBUG.
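For example, a line that the remote command writes to stderr shows up in your
output as

  [bd-1][WARNIN] <text from the remote command's stderr>

and its stdout as [bd-1][DEBUG ] lines.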

So in your case this message is output from ceph-disk-prepare/btrfs on the
remote host:

> [bd-1][WARNIN] Error: Partition(s) 1 on /dev/sdb1 have been written, but we
> have been unable to inform the kernel of the change, probably because
> it/they are in use.  As a result, the old partition(s) will remain in use.
> You should reboot now before making further changes.
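
If you want to avoid the full reboot that message asks for, it might be enough
to get the kernel to re-read the partition table on bd-1 before activating,
for example (untested in your setup, and only safe while nothing on that disk
is mounted or otherwise in use):

  partprobe /dev/sdb
  # or, with util-linux:
  partx -u /dev/sdb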

Have you tried 'create' instead of 'prepare' and 'activate'?
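
Something along these lines, mirroring your prepare invocation (untested on my
side, adjust the device paths to whatever the disk is called at that point):

  ceph-deploy -v --overwrite-conf osd --fs-type btrfs create bd-1:/dev/sdb1:/dev/sdb2

create runs the prepare and activate steps in one go, so there is no window in
between in which a reboot could rename the device.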

> After the reboot the OSD disk had changed from sdb to sda. This is a known
> problem on Linux (Ubuntu).
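
As a side note, when the sdX names move around between reboots, the persistent
symlinks udev creates can help you check which kernel name a given disk ended
up with, for example:

  ls -l /dev/disk/by-id/ /dev/disk/by-partuuid/

That only helps to identify the disk; it does not change how ceph-deploy was
invoked.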
>
> root@bd-a:/etc/ceph# ceph-deploy -v osd activate bd-1:/dev/sda1:/dev/sda2
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /root/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v osd
> activate bd-1:/dev/sda1:/dev/sda2
> [ceph_deploy.osd][DEBUG ] Activating cluster ceph disks
> bd-1:/dev/sda1:/dev/sda2
> [bd-1][DEBUG ] connected to host: bd-1
> [bd-1][DEBUG ] detect platform information from remote host
> [bd-1][DEBUG ] detect machine type
> [ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
> [ceph_deploy.osd][DEBUG ] activating host bd-1 disk /dev/sda1
> [ceph_deploy.osd][DEBUG ] will use init type: upstart
> [bd-1][INFO  ] Running command: ceph-disk-activate --mark-init upstart
> --mount /dev/sda1
> [bd-1][WARNIN] got monmap epoch 1
> [bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
> [bd-1][WARNIN] 2014-06-10 11:45:07.222697 7f5c111af800 -1 journal check:
> ondisk fsid c8ce6ee2-f21b-4ba3-a20e-649224244b9a doesn't match expected
> fcaaf66f-b7b7-4702-83a4-54832b7131fa, invalid (someone else's?) journal
> [bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
> [bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
> [bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
> [bd-1][WARNIN] 2014-06-10 11:45:08.125384 7f5c111af800 -1
> filestore(/var/lib/ceph/tmp/mnt.LryOxo) could not find
> 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
> [bd-1][WARNIN] 2014-06-10 11:45:08.320327 7f5c111af800 -1 created object
> store /var/lib/ceph/tmp/mnt.LryOxo journal
> /var/lib/ceph/tmp/mnt.LryOxo/journal for osd.4 fsid
> 08066b4a-3f36-4e3f-bd1e-15c006a09057
> [bd-1][WARNIN] 2014-06-10 11:45:08.320367 7f5c111af800 -1 auth: error
> reading file: /var/lib/ceph/tmp/mnt.LryOxo/keyring: can't open
> /var/lib/ceph/tmp/mnt.LryOxo/keyring: (2) No such file or directory
> [bd-1][WARNIN] 2014-06-10 11:45:08.320419 7f5c111af800 -1 created new key in
> keyring /var/lib/ceph/tmp/mnt.LryOxo/keyring
> [bd-1][WARNIN] added key for osd.4
> [bd-1][INFO  ] checking OSD status...
> [bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
> [bd-1][WARNIN] there are 2 OSDs down
> [bd-1][WARNIN] there are 2 OSDs out
> root@bd-a:/etc/ceph# ceph -s
>     cluster 08066b4a-3f36-4e3f-bd1e-15c006a09057
>      health HEALTH_WARN 679 pgs degraded; 992 pgs stuck unclean; recovery
> 19/60 objects degraded (31.667%); clock skew detected on mon.bd-1
>      monmap e1: 3 mons at
> {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0},
> election epoch 4034, quorum 0,1,2 bd-0,bd-1,bd-2
>      mdsmap e2815: 1/1/1 up {0=bd-2=up:active}, 2 up:standby
>      osdmap e1717: 6 osds: 4 up, 4 in
>       pgmap v46008: 992 pgs, 11 pools, 544 kB data, 20 objects
>             10324 MB used, 125 TB / 125 TB avail
>             19/60 objects degraded (31.667%)
>                    2 active
>                  679 active+degraded
>                  311 active+remapped
> root@bd-a:/etc/ceph# ceph osd tree
> # id    weight  type name       up/down reweight
> -1      189.1   root default
> -2      63.63           host bd-0
> 0       43.64                   osd.0   up      1
> 3       19.99                   osd.3   up      1
> -3      63.63           host bd-1
> 1       43.64                   osd.1   down    0
> 4       19.99                   osd.4   down    0
> -4      61.81           host bd-2
> 2       43.64                   osd.2   up      1
> 5       18.17                   osd.5   up      1
>
> At this point I rebooted bd-1 once more, and the OSD disk was /dev/sdb again.
> So I tried once more to activate the OSD:
>
>
> root@bd-a:/etc/ceph# ceph-deploy -v osd activate bd-1:/dev/sdb1:/dev/sdb2
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /root/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v osd
> activate bd-1:/dev/sdb1:/dev/sdb2
> [ceph_deploy.osd][DEBUG ] Activating cluster ceph disks
> bd-1:/dev/sdb1:/dev/sdb2
> [bd-1][DEBUG ] connected to host: bd-1
> [bd-1][DEBUG ] detect platform information from remote host
> [bd-1][DEBUG ] detect machine type
> [ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
> [ceph_deploy.osd][DEBUG ] activating host bd-1 disk /dev/sdb1
> [ceph_deploy.osd][DEBUG ] will use init type: upstart
> [bd-1][INFO  ] Running command: ceph-disk-activate --mark-init upstart
> --mount /dev/sdb1
> [bd-1][INFO  ] checking OSD status...
> [bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
> [bd-1][WARNIN] there are 2 OSDs down
> [bd-1][WARNIN] there are 2 OSDs out
> root@bd-a:/etc/ceph# ceph osd tree
> # id    weight  type name       up/down reweight
> -1      189.1   root default
> -2      63.63           host bd-0
> 0       43.64                   osd.0   up      1
> 3       19.99                   osd.3   up      1
> -3      63.63           host bd-1
> 1       43.64                   osd.1   down    0
> 4       19.99                   osd.4   down    0
> -4      61.81           host bd-2
> 2       43.64                   osd.2   up      1
> 5       18.17                   osd.5   up      1
> root@bd-a:/etc/ceph# ceph -s
>     cluster 08066b4a-3f36-4e3f-bd1e-15c006a09057
>      health HEALTH_WARN 679 pgs degraded; 992 pgs stuck unclean; recovery
> 10/60 objects degraded (16.667%); clock skew detected on mon.bd-1
>      monmap e1: 3 mons at
> {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0},
> election epoch 4060, quorum 0,1,2 bd-0,bd-1,bd-2
>      mdsmap e2823: 1/1/1 up {0=bd-2=up:active}, 2 up:standby
>      osdmap e1759: 6 osds: 4 up, 4 in
>       pgmap v46110: 992 pgs, 11 pools, 544 kB data, 20 objects
>             10320 MB used, 125 TB / 125 TB avail
>             10/60 objects degraded (16.667%)
>                  679 active+degraded
>                  313 active+remapped
> root@bd-a:/etc/ceph#
>
> After another reboot all was ok:
>
> ceph -s
>     cluster 08066b4a-3f36-4e3f-bd1e-15c006a09057
>      health HEALTH_OK
>      monmap e1: 3 mons at
> {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0},
> election epoch 4220, quorum 0,1,2 bd-0,bd-1,bd-2
>      mdsmap e2895: 1/1/1 up {0=bd-2=up:active}, 2 up:standby
>      osdmap e1939: 6 osds: 6 up, 6 in
>       pgmap v47099: 992 pgs, 11 pools, 551 kB data, 20 objects
>             117 MB used, 189 TB / 189 TB avail
>                  992 active+clean
> root@bd-a:~#
>
>
> Is it possible for the author of ceph-deploy to make the reboot unnecessary
> between these two steps?
> Then it would also be possible to use 'create' instead of 'prepare' + 'activate'.
>
> Thank you,
>   Markus
>
> --
> Best regards,
>   Markus Goldberg
>
> --------------------------------------------------------------------------
> Markus Goldberg       Universität Hildesheim
>                       Rechenzentrum
> Tel +49 5121 88392822 Marienburger Platz 22, D-31141 Hildesheim, Germany
> Fax +49 5121 88392823 email [email protected]
> --------------------------------------------------------------------------
>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
