Logs have been attached to the issue: http://tracker.ceph.com/issues/15745

On Thu, May 5, 2016 at 11:23 AM, Samuel Just <[email protected]> wrote:

> Can you reproduce with
>
> debug ms = 1
> debug objecter = 20
>
> on the radosgw side?
> -Sam
>
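[Editor's note: Sam's suggested settings would typically be raised in ceph.conf on the radosgw host, under the gateway's client section. The section name below is an assumption; it depends on the rgw instance name configured for that gateway.]

```
[client.rgw.gateway1]
    debug ms = 1
    debug objecter = 20
```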
> On Thu, May 5, 2016 at 8:28 AM, Brian Felton <[email protected]> wrote:
> > Greetings,
> >
> > We are running a number of Ceph clusters in production to provide
> > object storage services.  We have stumbled upon an issue where objects
> > of certain sizes are irretrievable.  The symptoms are very similar to
> > those addressed by the fix referenced here:
> > https://www.redhat.com/archives/rhsa-announce/2015-November/msg00060.html.
> > We can put objects into the cluster via s3/radosgw, but we cannot
> > retrieve them (the cluster closes the connection without delivering
> > all bytes).  Unfortunately, that fix does not apply to us, as we are
> > and have always been running Hammer.  We've stumbled on a brand-new
> > edge case.
> >
> > We have reproduced this issue on the 0.94.3, 0.94.4, and 0.94.6
> > releases of Hammer.
> >
> > We have reproduced this issue using three different storage hardware
> > configurations -- 5 clusters each running 648 6TB OSDs across nine
> > physical nodes, 1 cluster running 30 10GB OSDs across ten VM nodes,
> > and 1 cluster running 288 6TB OSDs across four physical nodes.
> >
> > We have determined that this issue only occurs when using erasure
> > coding (we've only tested plugin=jerasure technique=reed_sol_van
> > ruleset-failure-domain=host).
> >
> > Objects of exactly 4.5 MiB (4718592 bytes) can be placed into the
> > cluster but not retrieved.  At every interval of `rgw object stripe
> > size` thereafter (in our case, 4 MiB), the objects are similarly
> > irretrievable.  We have tested this from 4.5 to 24.5 MiB, then have
> > spot-checked much larger values to prove the pattern holds.  There is
> > a small range of byte sizes just below each boundary that is
> > irretrievable.  After much testing, we have found this boundary to be
> > strongly correlated with the k value of our erasure coded pool.  We
> > have observed that the m value in the erasure coding has no effect on
> > the window size.  We have tested erasure coded values of k from 2 to
> > 9, and we've observed the following ranges:
> >
> > k = 2, m = 1 -> No error
> > k = 3, m = 1 -> 32 bytes (i.e. errors when objects are inclusively
> > between 4718561 and 4718592 bytes)
> > k = 3, m = 2 -> 32 bytes
> > k = 4, m = 2 -> No error
> > k = 4, m = 1 -> No error
> > k = 5, m = 4 -> 128 bytes
> > k = 6, m = 3 -> 512 bytes
> > k = 6, m = 2 -> 512 bytes
> > k = 7, m = 1 -> 800 bytes
> > k = 7, m = 2 -> 800 bytes
> > k = 8, m = 1 -> No error
> > k = 9, m = 1 -> 800 bytes
> >
> > The "bytes" represent a 'dead zone' object size range wherein objects
> can be
> > put into the cluster but not retrieved.  The range of bytes is 4.5MiB -
> > (4.5MiB - buffer - 1) bytes. Up until k = 9, the error occurs for values
> of
> > k that are not powers of two, at which point the "dead zone" window is
> > (k-2)^2 * 32 bytes.  My team has not been able to determine why we
> plateau
> > at 800 bytes (we expected a range of 1568 bytes here).
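[Editor's note: the arithmetic above can be checked with a short sketch. The window sizes below are the observed values from Brian's table, and the 4718592-byte boundary and 4 MiB stripe size are from the report; the table values are used directly because the (k-2)^2 * 32 heuristic matches k = 3, 6, and 7 but not the 128 bytes observed at k = 5.]

```python
# Dead-zone boundaries observed with radosgw on EC pools (from the report).
STRIPE = 4 * 1024 * 1024          # rgw object stripe size: 4 MiB
FIRST_BOUNDARY = 4718592          # 4.5 MiB, the first irretrievable size

# Observed dead-zone window (bytes) per EC profile k value.
OBSERVED_WINDOW = {2: 0, 3: 32, 4: 0, 5: 128, 6: 512, 7: 800, 8: 0, 9: 800}

def dead_zone(k, interval=0):
    """Return the inclusive (low, high) byte range of the dead zone at the
    given stripe interval, or None when the profile shows no error."""
    window = OBSERVED_WINDOW[k]
    if window == 0:
        return None
    high = FIRST_BOUNDARY + interval * STRIPE
    return (high - window + 1, high)

# k=3 reproduces the reported 4718561-4718592 range at the first boundary.
print(dead_zone(3))          # (4718561, 4718592)
# The next boundary sits one stripe (4 MiB) higher.
print(dead_zone(3, 1))       # (8912865, 8912896)
```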
> >
> > This issue cannot be reproduced using rados to place objects directly
> > into EC pools.  The issue has only been observed when using RadosGW's
> > S3 interface.
> >
> > The issue can be reproduced with any S3 client (s3cmd, s3curl,
> > CyberDuck, CloudBerry Backup, and many others have been tested).
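[Editor's note: one way to drive a reproduction from any S3 client is to generate payloads at the boundary sizes. The sketch below only builds the local test files; the boto3 upload step, endpoint, and bucket name shown in comments are illustrative assumptions, not details from the thread.]

```python
import os

BOUNDARY = 4718592                 # 4.5 MiB, first irretrievable size
WINDOW = 32                        # observed dead-zone width for k=3, m=1

def write_payload(path, size):
    """Write `size` bytes of random data for an upload/retrieve test."""
    with open(path, "wb") as f:
        f.write(os.urandom(size))
    return os.path.getsize(path)

# One file inside the k=3 dead zone and one just below it as a control.
assert write_payload("dead_zone.bin", BOUNDARY) == BOUNDARY
assert write_payload("control.bin", BOUNDARY - WINDOW) == BOUNDARY - WINDOW

# Hypothetical upload/fetch step (endpoint and bucket are placeholders):
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="http://rgw.example.com")
#   s3.upload_file("dead_zone.bin", "test-bucket", "dead_zone.bin")
#   s3.download_file("test-bucket", "dead_zone.bin", "out.bin")
#   # the control object downloads fine; the dead-zone object does not
```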
> >
> > At this point, we are evaluating the Ceph codebase in an attempt to
> > patch the issue.  As this is an issue affecting data retrievability
> > (and possibly integrity), we wanted to bring it to the attention of
> > the community as soon as we could reproduce it.  We are hoping that
> > others can independently verify this, and that someone with a more
> > intimate understanding of the codebase can investigate and propose a
> > fix.  We have observed this issue in our production clusters, so it
> > is a very high priority for my team.
> >
> > Furthermore, we believe the objects are corrupted at the point they
> > are placed into the cluster.  We have tested copying the .rgw.buckets
> > pool to a non-erasure coded pool and then swapping names, and we have
> > found that objects copied from the EC pool to the non-EC pool remain
> > irretrievable once RGW is pointed at the non-EC pool.  If we
> > overwrite the object in the non-EC pool with the original, it becomes
> > retrievable again.  This has not been tested as exhaustively, but we
> > felt it important enough to mention.
> >
> > I'm sure I've omitted some details here that would aid in an
> > investigation, so please let me know what other information I can
> > provide.  My team will be filing an issue shortly.
> >
> > Many thanks,
> >
> > Brian Felton
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
