I've seen similar issues in the past with 4U Supermicro servers populated with 
spinning disks. In my case it turned out to be a specific firmware+BIOS 
combination on the disk controller card that was buggy. I fixed it by updating 
the firmware and BIOS on the card to the latest versions.

I saw this on several servers, and it took a while to track down as you can 
imagine. Same symptoms you're reporting.

There was a data corruption problem a while back with the Linux kernel and 
Samsung 850 Pro drives, but your problem doesn't sound like data corruption. 
Still, I'd check to make sure the kernel version you're running has the fix.
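To make that version check concrete, here's a minimal sketch for comparing the running kernel against the release that carries a fix. The "4.19.0" threshold below is a placeholder, not the real fixed version from that bug; look the actual version up in your distro's changelog.

```python
# Sketch: compare the running kernel release against a fixed version.
# NOTE: "4.19.0" is a placeholder threshold -- substitute the actual
# version that carries the fix, from your distro's changelog.
import platform
import re

def kernel_tuple(release):
    """Parse a release string like '4.4.0-78-generic' into (4, 4, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))

def has_fix(running, fixed):
    """True if the running kernel is at or above the fixed version."""
    return kernel_tuple(running) >= kernel_tuple(fixed)

print(has_fix(platform.release(), "4.19.0"))
```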


________________________________



Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation <https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799





On Thu, 2017-06-01 at 13:40 +0100, Oliver Humpage wrote:


On 1 Jun 2017, at 11:55, Matthew Vernon <[email protected]> wrote:

You don't say what's in kern.log - we've had (rotating) disks that were 
throwing read errors but still saying they were OK on SMART.



Fair point. There was nothing correlating with the time Ceph logged an error 
this morning, which is why I didn’t mention it, but looking harder I see that 
yesterday there was:

May 31 07:20:13 osd1 kernel: sd 0:0:8:0: [sdi] tag#0 FAILED Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 31 07:20:13 osd1 kernel: sd 0:0:8:0: [sdi] tag#0 Sense Key : Hardware Error 
[current]
May 31 07:20:13 osd1 kernel: sd 0:0:8:0: [sdi] tag#0 Add. Sense: Internal 
target failure
May 31 07:20:13 osd1 kernel: sd 0:0:8:0: [sdi] tag#0 CDB: Read(10) 28 00 77 51 
42 d8 00 02 00 00
May 31 07:20:13 osd1 kernel: blk_update_request: critical target error, dev 
sdi, sector 2001814232
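As a sanity check, the two log lines describe the same failed read: in a Read(10) CDB, bytes 2-5 hold the starting LBA (big-endian) and bytes 7-8 the transfer length, which can be decoded directly:

```python
# Decode the Read(10) CDB from the kernel log above.
# Bytes 2-5 hold the starting LBA (big-endian); bytes 7-8 the block count.
cdb = bytes.fromhex("28 00 77 51 42 d8 00 02 00 00")

lba = int.from_bytes(cdb[2:6], "big")     # starting logical block address
blocks = int.from_bytes(cdb[7:9], "big")  # number of blocks requested

print(lba, blocks)  # 2001814232 512
```

The decoded LBA matches the `sector 2001814232` that blk_update_request reports.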

sdi was the disk with the OSD affected today. Guess it’s flaky SSDs then.

Weird that just re-reading the file makes everything OK though - wondering how 
much it’s worth worrying about that, or if there’s a way of making ceph retry 
reads automatically?
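I don't know of a knob that makes Ceph retry client reads on its own (letting a deep scrub flag the inconsistency and then running `ceph pg repair` is the usual avenue). As an application-side stopgap, a read can be retried explicitly; a minimal sketch, with `read_fn` standing in for whatever actually performs the read (a hypothetical placeholder, not a Ceph API):

```python
import time

def read_with_retry(read_fn, attempts=3, delay=0.1):
    """Call read_fn(); on OSError, retry a few times with a pause.

    read_fn is a placeholder for whatever performs the actual read --
    this is an application-side workaround, not a Ceph feature.
    """
    for attempt in range(attempts):
        try:
            return read_fn()
        except OSError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

# Example: a reader that fails once, then succeeds -- mimicking the
# "re-reading the file makes everything OK" behaviour described above.
state = {"calls": 0}
def flaky_read():
    state["calls"] += 1
    if state["calls"] == 1:
        raise OSError("critical target error")
    return b"data"

print(read_with_retry(flaky_read))  # b'data'
```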

Oliver.

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
