> On Aug 21, 2025, at 4:07 AM, Miles Goodhew <c...@m0les.com> wrote:
> 
> Hi Robert,
>  I'm not an expert on the low-level details and "modern" Ceph, so I hope I 
> don't lead you on any wild goose chases, but I might at least give some leads.
>  It seems odd that the metrics mention NVMe - I'm guessing that it's just a 
> cross-product test and tries all tools on all devices.

Recent releases of smartctl pass through stats for NVMe devices via the 
nvme-cli command "nvme".  Whether it invokes that for all devices, in what 
order, etc., I don't know.


> SMART test failure is more of an issue. It's a pity the error message is so 
> nondescript. Some things I can think of from simplest to most complicated are:
> * Are smartmontools installed on the drive host?

Does it happen with other drives on the same host?

If you have availability through your chassis vendor, look for a firmware 
update.

> * Does the monitoring UID have sudo access?
> * Does a manual "sudo smartctl -a /dev/sdc" give the same or similar result?
> * Is the drive managed by a hardware RAID controller or concentrator (like 
> Dell PERC or a USB adapter or something)?
> * (This is a stretch) Is there an OSD for the drive that's given the "NVME" 
> class?
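
To make the manual smartctl check above a bit more systematic, here is a 
minimal Python sketch. It runs the same "sudo smartctl -a" command and decodes 
the exit status using the bit meanings from the smartctl(8) man page. The 
helper names (decode_smartctl_status, run_smartctl) are made up for 
illustration, and "sudo -n" is my own addition so that a missing sudoers rule 
fails immediately instead of prompting for a password. Note that an exit 
status of 1 is ambiguous here: sudo itself also exits with 1 when it cannot 
authenticate or run the command.

```python
import subprocess
import sys

# Bit meanings of smartctl's exit status, per the smartctl(8) man page.
SMARTCTL_STATUS_BITS = {
    0: "command line did not parse",
    1: "device open failed or device did not return identify data",
    2: "a SMART or other ATA command failed, or a checksum error was found",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes are at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_smartctl_status(code):
    """Return the descriptions of all bits set in a smartctl exit status."""
    return [
        text
        for bit, text in sorted(SMARTCTL_STATUS_BITS.items())
        if code & (1 << bit)
    ]


def run_smartctl(device):
    """Run 'sudo -n smartctl -a <device>' and return (status, stdout, stderr)."""
    proc = subprocess.run(
        ["sudo", "-n", "smartctl", "-a", device],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # /dev/sdc is the device from the original report; pass another as argv[1].
    device = sys.argv[1] if len(sys.argv) > 1 else "/dev/sdc"
    status, out, err = run_smartctl(device)
    print(f"exit status: {status}")
    for reason in decode_smartctl_status(status):
        print(f"  - {reason}")
    if err.strip():
        # sudo complaints (no sudoers rule, no tty, ...) show up on stderr.
        print(f"stderr: {err.strip()}")
```

Running this as the same user that the monitoring runs as should show whether 
the failure comes from sudo or from smartctl itself.
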
> 
> Hope that gives you something.
> 
> M0les.
> 
> 
> On Thu, 21 Aug 2025, at 17:15, Robert Sander wrote:
>> Hi,
>> 
>> On a new cluster with version 19.2.3 the device health metrics only show a 
>> smartctl error:
>> 
>> {
>>     "20250821-000313": {
>>         "dev": "/dev/sdc",
>>         "error": "smartctl failed",
>>         "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 1",
>>         "nvme_smart_health_information_add_log_error_code": -22,
>>         "nvme_vendor": "ata",
>>         "smartctl_error_code": -22,
>>         "smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n"
>>     }
>> }
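
If you want to check whether every scrape on every device fails the same way, 
the JSON above is easy to filter. Below is a small Python sketch that lists 
the entries carrying an "error" key. SAMPLE is just the record from this 
thread with the wrapped strings rejoined, and failed_scrapes is a hypothetical 
helper name, not anything shipped with Ceph.

```python
import json

# The record quoted in this thread, with mail line wraps removed.
SAMPLE = json.loads("""
{
    "20250821-000313": {
        "dev": "/dev/sdc",
        "error": "smartctl failed",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 1",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "ata",
        "smartctl_error_code": -22,
        "smartctl_output": "smartctl returned an error (1): stderr:\\nsudo: exit status: 1\\nstdout:\\n"
    }
}
""")


def failed_scrapes(metrics):
    """Return (timestamp, device, error) for every scrape that reports an error."""
    return [
        (stamp, entry.get("dev"), entry["error"])
        for stamp, entry in sorted(metrics.items())
        if "error" in entry
    ]


if __name__ == "__main__":
    for stamp, dev, error in failed_scrapes(SAMPLE):
        print(f"{stamp} {dev}: {error}")
```

To use it on live data, replace SAMPLE with the parsed output of 
"ceph device get-health-metrics <devid>" for each device ID.
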
>> 
>> The device in question (like all the other in the cluster) is a Samsung 
>> MZ7L37T6 SATA SSD.
>> 
>> What is happening here?
>> 
>> Regards
>> -- 
>> Robert Sander
>> Linux Consultant
>> 
>> Heinlein Consulting GmbH
>> Schwedter Str. 8/9b, 10119 Berlin
>> 
>> https://www.heinlein-support.de
>> 
>> Tel: +49 30 405051 - 0
>> Fax: +49 30 405051 - 19
>> 
>> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
>> Geschäftsführer: Peer Heinlein - Sitz: Berlin
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 