On 12/30/2014 01:40 PM, Vijayendra Shamanna wrote:
> Hi,
>
> There is a sync thread (sync_entry in FileStore.cc) which triggers
> periodically and executes sync_filesystem() to ensure that the data is
> consistent. The journal entries are trimmed only after a successful
> sync_filesystem() call.
sync_filesystem() always returns zero, so the journal will be trimmed
regardless of whether the sync actually succeeded. When the underlying disk
is failing, calling sync()/syncfs() with dirty data still in the page cache
results in silent data loss ("lost page write due to I/O error").
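
To illustrate, here is a minimal sketch (not the actual FileStore code; the
function name and shape are my own) of what propagating the syncfs() result
would look like, so the caller could refuse to trim the journal when the
sync failed. Note that whether syncfs() reports asynchronous writeback
errors at all also depends on the kernel version:

/*
 * Hypothetical sketch, NOT the actual Ceph sync_filesystem() code.
 * It only shows propagating the syncfs() result instead of always
 * returning zero, so the caller can skip trimming the journal on failure.
 */
#define _GNU_SOURCE          /* for syncfs() with glibc (g++ defines it) */
#include <unistd.h>
#include <cerrno>

int sync_filesystem_checked(int fd)      // fd open on the data partition
{
#if defined(__linux__)
  if (::syncfs(fd) < 0)                  // sync only the fs backing fd
    return -errno;                       // caller must NOT trim the journal
#else
  ::sync();                              // sync() returns void; assume success
#endif
  return 0;
}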
I was doing some experiments simulating disk errors using the Device Mapper
"error" target. In this setup the OSD kept writing to the broken disk
without crashing; every 5 seconds (filestore_max_sync_interval) the kernel
logged that some data had been discarded due to an I/O error.
> Thanks
> Viju
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Pawel Sadowski
>> Sent: Tuesday, December 30, 2014 1:52 PM
>> To: [email protected]
>> Subject: Ceph data consistency
>>
>> Hi,
>>
>> On our Ceph cluster we see some inconsistent PGs from time to time (after
>> deep-scrub). We have occasional issues with disks/SATA cables/the LSI
>> controller causing I/O errors (but that's not the point in this case).
>>
>> When an I/O error occurs on the OSD journal partition everything works as
>> it should -> the OSD crashes, and that's OK - Ceph will handle that.
>>
>> But when an I/O error occurs on the OSD data partition during journal
>> flush, the OSD continues to work. After calling *writev* (in
>> buffer::list::write_fd) the OSD does check the return code from this call
>> but does NOT verify that the write actually reached the disk (the data is
>> still only in memory and there is no fsync). That way the OSD thinks the
>> data has been stored on disk, but it may be discarded (during sync the
>> dirty page is reclaimed and you'll see "lost page write due to I/O error"
>> in dmesg).
>>
>> Since the data is not checksummed, I just wanted to make sure that this
>> is by design. Is there a way to tell the OSD to call fsync after the
>> write so the data stays consistent?
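
For what it's worth, here is a minimal sketch of what "check writev() and
then verify via fsync" from the quoted paragraph could look like. This is
not the actual buffer::list::write_fd code and the helper name is made up;
it only illustrates that without an fsync/fdatasync the writev() return
code alone does not prove the data is on disk:

/*
 * Hypothetical helper, NOT the actual buffer::list::write_fd() code.
 * It checks both the writev() return value and a subsequent fsync();
 * a real implementation would also retry short writes.
 */
#include <sys/uio.h>
#include <unistd.h>
#include <cerrno>

int write_and_sync(int fd, const struct iovec *iov, int iovcnt)
{
  ssize_t r = ::writev(fd, iov, iovcnt);  // data lands in the page cache only
  if (r < 0)
    return -errno;                        // failed write detected here

  if (::fsync(fd) < 0)                    // force writeback; this is where a
    return -errno;                        // "lost page write" becomes visible

  return 0;                               // 0 = data (should be) on disk
}

Calling fsync() after every write would obviously be expensive; the point is
only that the writev() return code by itself says nothing about whether the
data survived writeback.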
--
PS