2014-06-18 22:44 GMT+08:00 Gregory Farnum <[email protected]>:
> On Tue, Jun 17, 2014 at 9:46 PM, Ke-fei Lin <[email protected]> wrote:
>> 2014-06-18 1:28 GMT+08:00 Gregory Farnum <[email protected]>:
>>> On Tue, Jun 17, 2014 at 3:22 AM, Ke-fei Lin <[email protected]> wrote:
>>>> Hi list,
>>>>
>>>> How does RADOS check that an object and its replica are consistent? Is
>>>> there a checksum in the object's metadata or some other mechanism? Does
>>>> the mechanism depend on the OSD's underlying file system?
>>>
>>> It does not check consistency on read. On scrub it compares the local
>>> FS metadata (size et al) and RADOS metadata (object versions and
>>> things); on deep scrub it computes a checksum of each replica and
>>> compares them.
>> Thank you Greg.
>> Let's say there is an object A and its replica B. On deep scrub, RADOS
>> finds that the two copies have different checksums. How does RADOS
>> determine which one is corrupted and repair it?
>
> You have to explicitly trigger a scrub "repair". Right now, whatever
> the primary has wins; that's obviously suboptimal. (So generally you
> should try and get manually involved with repairs.)

If I choose XFS as the underlying file system, then as I understand it a
corrupted object will be detected only when a deep scrub runs. So it is
possible for an inconsistent object (on the primary) to be read by accident,
without any error, before the next deep scrub, right?
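
To make the scenario concrete, here is a toy Python model of it. This is
only a sketch of the behaviour described in this thread, not Ceph code: the
function names are made up, replicas are plain byte strings with the primary
at index 0, and SHA-256 is an arbitrary stand-in for whatever checksum deep
scrub really uses.

```python
import hashlib


def checksum(data: bytes) -> str:
    # Arbitrary choice of hash for this sketch.
    return hashlib.sha256(data).hexdigest()


def read(replicas):
    # A read is served from the primary (index 0); nothing is compared.
    return replicas[0]


def deep_scrub(replicas):
    # Checksum every copy and report whether they all agree.
    return len({checksum(r) for r in replicas}) == 1


def repair_primary_wins(replicas):
    # "Whatever the primary has wins": every copy becomes the primary's copy.
    return [replicas[0]] * len(replicas)


good = b"hello world"
bad = b"hellO world"      # simulated silent corruption on the primary
replicas = [bad, good]    # primary first, then the replica

print(read(replicas) == good)   # False: corrupted data returned, no error
print(deep_scrub(replicas))     # False: the mismatch only shows up here
replicas = repair_primary_wins(replicas)
print(deep_scrub(replicas), replicas[1] == good)   # True False
```

The last line is the part that worries me: after a "primary wins" repair the
copies agree again, but in this model the good copy has been overwritten.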

So, in such a case, higher-level application logic (or the file system
sitting on top of RBD) should take responsibility for data consistency.
Am I worrying too much?
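
By "application logic" I mean something like an end-to-end checksum kept by
the client. A minimal sketch in Python, under stated assumptions: the class
name ChecksummedStore is hypothetical, a dict stands in for the object store
(no RADOS calls are made), and SHA-256 is just one possible digest.

```python
import hashlib


class ChecksummedStore:
    """Hypothetical wrapper that keeps a SHA-256 digest next to each payload."""

    def __init__(self):
        # Stand-in for the object store: name -> (digest, data).
        self._objects = {}

    def put(self, name: str, data: bytes) -> None:
        self._objects[name] = (hashlib.sha256(data).digest(), data)

    def get(self, name: str) -> bytes:
        digest, data = self._objects[name]
        # Verify on every read instead of waiting for a scrub.
        if hashlib.sha256(data).digest() != digest:
            raise IOError(f"checksum mismatch for object {name!r}")
        return data


store = ChecksummedStore()
store.put("A", b"hello world")
print(store.get("A"))   # b'hello world'

# Simulate silent corruption of the stored payload.
digest, _ = store._objects["A"]
store._objects["A"] = (digest, b"hellO world")
try:
    store.get("A")
except IOError as err:
    print("read failed:", err)
```

With this, a corrupted read turns into an error the application can act on
(for example by retrying against another copy), at the cost of hashing the
data on every read and write.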

>>> RADOS does not maintain checksums alongside the objects in replicated pools.
>>>
>>>> And what would happen if a corrupted object is read (like a
>>>> corrupted block in a traditional file system)?
>>>
>>> If the local filesystem doesn't return an error, it will return the
>>> data it was given to the end user. (btrfs maintains its own checksums
>>> and will return errors, but unfortunately xfs will not.)
>> This sounds kind of dangerous. I think corrupted objects will be the norm
>> rather than the exception, because we usually build Ceph clusters from
>> commodity hardware.
>> And it seems there are lots of people still using XFS...
>> By the way, is this the main reason that Ceph officially suggests btrfs?
>
> Well, we officially suggest XFS for other reasons, but it is why our
> long-term vision is to run on btrfs.
> -Greg
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
