2014-06-18 22:44 GMT+08:00 Gregory Farnum <[email protected]>:
> On Tue, Jun 17, 2014 at 9:46 PM, Ke-fei Lin <[email protected]> wrote:
>> 2014-06-18 1:28 GMT+08:00 Gregory Farnum <[email protected]>:
>>> On Tue, Jun 17, 2014 at 3:22 AM, Ke-fei Lin <[email protected]> wrote:
>>>> Hi list,
>>>>
>>>> How does RADOS check that an object and its replica are consistent? Is there
>>>> a checksum in the object's metadata or some other mechanism? Does the
>>>> mechanism depend on the OSD's underlying file system?
>>>
>>> It does not check consistency on read. On scrub it compares the local
>>> FS metadata (size et al) and RADOS metadata (object versions and
>>> things); on deep scrub it computes a checksum of each replica and
>>> compares them.
>> Thank you Greg.
>> Let's say there is an object A and its replica B. On deep scrubbing, RADOS
>> finds that the two objects have different checksums. How does RADOS determine
>> and repair the corrupted object?
>
> You have to explicitly trigger a scrub "repair". Right now, whatever
> the primary has wins; that's obviously suboptimal. (So generally you
> should try and get manually involved with repairs.)
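
As an illustration of the deep scrub and "primary wins" repair behaviour described in the quoted replies, here is a minimal Python sketch. It is a toy model, not Ceph code: the dict of replicas, the function names, and the use of SHA-256 as the checksum are all assumptions made for the example. In a real cluster the repair is triggered explicitly, for example with `ceph pg repair <pgid>`.

```python
import hashlib
from typing import Dict


def checksum(data: bytes) -> str:
    # SHA-256 is a stand-in here; it is NOT claimed to be what Ceph uses.
    return hashlib.sha256(data).hexdigest()


def deep_scrub(replicas: Dict[str, bytes]) -> bool:
    """Return True if every replica has the same checksum."""
    digests = {checksum(data) for data in replicas.values()}
    return len(digests) <= 1


def repair_primary_wins(replicas: Dict[str, bytes], primary: str) -> Dict[str, bytes]:
    """Overwrite every replica with the primary's copy.

    Hypothetical helper modelling "whatever the primary has wins".
    If the primary holds the corrupted copy, the corruption is propagated.
    """
    good = replicas[primary]
    return {osd: good for osd in replicas}


# Two replicas of one object; the copy on osd.1 has a flipped byte.
replicas = {"osd.0": b"hello", "osd.1": b"hellp"}
consistent_before = deep_scrub(replicas)

repaired = repair_primary_wins(replicas, primary="osd.0")
consistent_after = deep_scrub(repaired)
```

Note that the scrub only detects that the copies differ. It has no way of telling which one is correct, which is why manual involvement in repairs is recommended above.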

If I choose XFS as the underlying file system, then as I understand it, a corrupted object will be detected only when a deep scrub runs. So it's possible for an inconsistent object (on the primary) to be read accidentally without any error, right?

So, in such a case, higher-level application logic (or the file system sitting on RBD) should take responsibility for data consistency.

Am I worrying too much?

>>> RADOS does not maintain checksums alongside the objects in replicated pools.
>>>
>>>> And what would happen if a corrupted object is read (like a
>>>> corrupted block in a traditional file system)?
>>>
>>> If the local filesystem doesn't return an error, it will return the
>>> data it was given to the end user. (btrfs maintains its own checksums
>> This sounds kind of dangerous. I think corrupted objects will be the norm
>> rather than the exception, because we usually build Ceph clusters from
>> commodity hardware.
>>> and will return errors, but unfortunately xfs will not.)
>> And it seems there are lots of people still using XFS...
>> By the way, is this the main reason that Ceph officially suggests btrfs?
>
> Well, we officially suggest XFS for other reasons, but it is why our
> long-term vision is to run on btrfs.
> -Greg
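
For the idea raised above, that higher-level application logic takes responsibility for data consistency, one possible approach is to store a digest with each payload and verify it on every read. The following Python sketch only illustrates that idea: `wrap` and `unwrap` are hypothetical helper names, SHA-256 is an arbitrary choice, and nothing here is part of RADOS or librados.

```python
import hashlib

# Length in bytes of a SHA-256 digest (32).
DIGEST_LEN = hashlib.sha256().digest_size


def wrap(payload: bytes) -> bytes:
    """Prefix the payload with its SHA-256 digest before writing it out."""
    return hashlib.sha256(payload).digest() + payload


def unwrap(blob: bytes) -> bytes:
    """Verify the stored digest on read and return the payload.

    Raises ValueError if the payload no longer matches its digest.
    """
    digest, payload = blob[:DIGEST_LEN], blob[DIGEST_LEN:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("checksum mismatch: object may be corrupted")
    return payload


# Round trip: what the application would write, then read back.
stored = wrap(b"some object data")
recovered = unwrap(stored)
```

With this in place, a silently corrupted copy returned by the storage layer surfaces as an error in the application instead of being consumed as valid data. It does not say which replica is good, so it complements deep scrub rather than replacing it.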
