I've been building a new cluster with cephadm. The OS is Ubuntu 24.04 and
I'm using the Ubuntu-provided host packages. Docker is 29.1.2, containerd
is 2.2.0 and the Ceph release is Squid 19.2.3.

Everything seems to work perfectly except for scrape-health-metrics, which
records this result for all of the OSDs on the 4 hosts in the cluster:
    "20251212-074916": {
        "dev": "/dev/nvme5n1",
        "error": "smartctl failed",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 1",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "samsung",
        "smartctl_error_code": -22,
        "smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n"
    },
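Since the same failure is recorded for every OSD, the stored samples can be checked in bulk. This is a minimal Python sketch, assuming the per-device history has been dumped to JSON (for example with ceph device get-health-metrics <devid>, which, as far as I know, returns samples keyed by timestamp like the one above). The helper name failed_samples is hypothetical and not part of Ceph.

```python
import json

# Example record copied from the scrape-health-metrics result above.
SAMPLE = """{
  "20251212-074916": {
    "dev": "/dev/nvme5n1",
    "error": "smartctl failed",
    "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 1",
    "nvme_smart_health_information_add_log_error_code": -22,
    "nvme_vendor": "samsung",
    "smartctl_error_code": -22,
    "smartctl_output": "smartctl returned an error (1): stderr:\\nsudo: exit status: 1\\nstdout:\\n"
  }
}"""


def failed_samples(metrics: dict) -> list[tuple[str, str, int, str]]:
    """Return (timestamp, device, smartctl error code, first stderr line) per failed sample.

    Assumption: a sample without an "error" key is treated as successful.
    """
    failures = []
    for timestamp, sample in sorted(metrics.items()):
        if "error" not in sample:
            continue
        output = sample.get("smartctl_output", "")
        # Keep only the first line after "stderr:", e.g. "sudo: exit status: 1".
        stderr = output.split("stderr:\n", 1)[-1].split("\n", 1)[0]
        failures.append(
            (timestamp, sample.get("dev", "?"), sample.get("smartctl_error_code", 0), stderr)
        )
    return failures


if __name__ == "__main__":
    for row in failed_samples(json.loads(SAMPLE)):
        print(row)
    # prints ('20251212-074916', '/dev/nvme5n1', -22, 'sudo: exit status: 1')
```

If every sample reports the same sudo stderr line, the problem is in the container's sudo setup rather than in smartctl or the NVMe devices themselves.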

Digging through the source, I found that it's the OSD container that runs
the command:

    sudo /usr/sbin/smartctl -x --json=o /dev/nvme5n1

I can run the same command manually inside the container with:

    docker exec -it -u ceph ceph-eac6da30-d72a-11f0-88be-7cc255639332-osd-3 sudo /usr/sbin/smartctl -x --json=o /dev/nvme5n1

The result of the manual command is:

    sudo: PAM account management error: Authentication service cannot retrieve authentication info
    sudo: a password is required

I have enabled sudo debugging in the container and verified that the OSD
runs the same command and gets the same error. Running ceph device
scrape-health-metrics produces this sudo debug output:

    Dec 12 10:30:35 sudo[845] user command "/usr/sbin/smartctl -x --json=o /dev/nvme5n1" matches sudoers command "/usr/sbin/smartctl -x --json=o /dev/*": true @ command_matches() ./match_command.c:667
    Dec 12 10:30:35 sudo[845] userspec matched @ /etc/sudoers.d/ceph-smartctl:4:57: allowed @ sudoers_lookup_check() ./parse.c:167
    Dec 12 10:30:35 sudo[845] sudo_putenv: SUDO_COMMAND=/usr/sbin/smartctl -x --json=o /dev/nvme5n1
    Dec 12 10:30:35 sudo[845] <- new_logline @ ./eventlog.c:218 := PAM account management error: Authentication service cannot retrieve authentication info ; TTY=pts/8 ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -x --json=o /dev/nvme5n1


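So the sudoers rule matches and the command is allowed; it is the PAM
account check that fails afterwards. My only guess is that I have chosen an
Ubuntu version that is too new for Ceph and should just bite the bullet and
reinstall with Ubuntu 22.04, but if anyone has a better idea, please let me
know.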

-- 
Flemming Frandsen - YAPH - http://osaa.dk - http://dren.dk/
_______________________________________________
ceph-users mailing list -- [email protected]