Logs have been attached to the issue: http://tracker.ceph.com/issues/15745
On Thu, May 5, 2016 at 11:23 AM, Samuel Just <[email protected]> wrote:
> Can you reproduce with
>
> debug ms = 1
> debug objecter = 20
>
> on the radosgw side?
> -Sam
>
> On Thu, May 5, 2016 at 8:28 AM, Brian Felton <[email protected]> wrote:
> > Greetings,
> >
> > We are running a number of Ceph clusters in production to provide
> > object storage services. We have stumbled upon an issue where
> > objects of certain sizes are irretrievable. The symptoms are very
> > similar to the fix referenced here:
> > https://www.redhat.com/archives/rhsa-announce/2015-November/msg00060.html.
> > We can put objects into the cluster via s3/radosgw, but we cannot
> > retrieve them (the cluster closes the connection without delivering
> > all bytes). Unfortunately, that fix does not apply to us, as we are
> > and have always been running Hammer. We've stumbled on a brand-new
> > edge case.
> >
> > We have reproduced this issue on the 0.94.3, 0.94.4, and 0.94.6
> > releases of Hammer.
> >
> > We have reproduced this issue using three different storage
> > hardware configurations: 5 clusters each running 648 6TB OSDs
> > across nine physical nodes, 1 cluster running 30 10GB OSDs across
> > ten VM nodes, and 1 cluster running 288 6TB OSDs across four
> > physical nodes.
> >
> > We have determined that this issue only occurs when using erasure
> > coding (we've only tested plugin=jerasure technique=reed_sol_van
> > ruleset-failure-domain=host).
> >
> > Objects of exactly 4.5MiB (4718592 bytes) can be placed into the
> > cluster but not retrieved. At every interval of `rgw object stripe
> > size` thereafter (in our case, 4 MiB), objects are similarly
> > irretrievable. We have tested this from 4.5 to 24.5 MiB, then
> > spot-checked much larger values to confirm the pattern holds.
> > There is also a small range of sizes just below each boundary that
> > is irretrievable. After much testing, we have found the width of
> > this range to be strongly correlated with the k value of our
> > erasure-coded pool; the m value has no effect on it. We have
> > tested erasure-coded values of k from 2 to 9, and we've observed
> > the following ranges:
> >
> > k = 2, m = 1 -> no error
> > k = 3, m = 1 -> 32 bytes (i.e. errors when objects are between
> >                 4718561 and 4718592 bytes, inclusive)
> > k = 3, m = 2 -> 32 bytes
> > k = 4, m = 2 -> no error
> > k = 4, m = 1 -> no error
> > k = 5, m = 4 -> 128 bytes
> > k = 6, m = 3 -> 512 bytes
> > k = 6, m = 2 -> 512 bytes
> > k = 7, m = 1 -> 800 bytes
> > k = 7, m = 2 -> 800 bytes
> > k = 8, m = 1 -> no error
> > k = 9, m = 1 -> 800 bytes
> >
> > The "bytes" value is the width of a 'dead zone': a range of object
> > sizes that can be put into the cluster but not retrieved. Each
> > dead zone runs from (boundary - width + 1) bytes through the
> > boundary itself, inclusive. The error occurs for values of k that
> > are not powers of two, and up until k = 9 the dead zone width is
> > (k-2)^2 * 32 bytes. My team has not been able to determine why the
> > width plateaus at 800 bytes for k = 9 (the formula predicts 1568
> > bytes there).
> >
> > This issue cannot be reproduced by using rados to place objects
> > directly into EC pools; it has only been observed through RadosGW's
> > S3 interface.
> >
> > The issue can be reproduced with any S3 client (s3cmd, s3curl,
> > CyberDuck, CloudBerry Backup, and many others have been tested).
> >
> > At this point, we are evaluating the Ceph codebase in an attempt to
> > patch the issue.
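A quick sketch of the dead-zone arithmetic above, in Python. The
split of 4718592 bytes into a 512 KiB head object plus one 4 MiB
stripe is our reading of the RGW defaults (rgw max chunk size and rgw
obj stripe size), and the (k-2)^2 * 32 window is an empirical fit to
the table rather than a derivation from the code:

# Predicted "dead zone" per the observations above. Treat as
# approximate: the table shows 128 bytes for k = 5 and 800 bytes for
# k = 9, where this fit would predict 288 and 1568.

HEAD = 512 * 1024                # rgw max chunk size default (head object)
STRIPE = 4 * 1024 * 1024         # rgw obj stripe size default
FIRST_BOUNDARY = HEAD + STRIPE   # 4718592 bytes = 4.5 MiB

def window(k):
    """Empirical dead-zone width in bytes for an EC profile with k data chunks."""
    if k & (k - 1) == 0:         # powers of two showed no error
        return 0
    return (k - 2) ** 2 * 32

def dead_zone(k, n=0):
    """Inclusive (lo, hi) of the n-th irretrievable size range, or None."""
    hi = FIRST_BOUNDARY + n * STRIPE
    w = window(k)
    return (hi - w + 1, hi) if w else None

for k in range(2, 10):
    print("k=%d width=%-4d range=%s" % (k, window(k), dead_zone(k)))

For k = 3 this prints the 4718561..4718592 range we observed.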
> > As this is an issue affecting data retrievability (and possibly
> > integrity), we wanted to bring it to the attention of the community
> > as soon as we could reproduce it. We are hoping that others can
> > independently verify it, and that someone with a more intimate
> > understanding of the codebase can investigate and propose a fix.
> > We have observed this issue in our production clusters, so it is a
> > very high priority for my team.
> >
> > Furthermore, we believe the objects are corrupted at the point they
> > are placed into the cluster. We have tested copying the
> > .rgw.buckets pool to a non-erasure-coded pool and then swapping
> > names, and we have found that objects copied from the EC pool to
> > the non-EC pool remain irretrievable once RGW is pointed at the
> > non-EC pool. If we overwrite an object in the non-EC pool with the
> > original, it becomes retrievable again. This has not been tested as
> > exhaustively, but we felt it important enough to mention.
> >
> > I'm sure I've omitted details that would aid an investigation, so
> > please let me know what other information I can provide. My team
> > will be filing an issue shortly.
> >
> > Many thanks,
> >
> > Brian Felton
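For anyone who wants to reproduce this independently, here is a
minimal boto 2.x sketch; the endpoint, credentials, and bucket name
below are placeholders, and the bucket must land in an erasure-coded
.rgw.buckets pool:

import boto
import boto.s3.connection

BOUNDARY = 4718592   # 4.5 MiB, the first irretrievable size

conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='rgw.example.com',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
bucket = conn.create_bucket('deadzone-test')

def roundtrip(size):
    """PUT an object of `size` bytes, then try to GET it back."""
    key = bucket.new_key('obj-%d' % size)
    key.set_contents_from_string('x' * size)
    try:
        return len(key.get_contents_as_string()) == size
    except Exception as e:   # RGW closes the connection mid-transfer
        print('GET of %d bytes failed: %s' % (size, e))
        return False

# With k = 3 we see failures for 4718561..4718592 bytes, inclusive.
for size in range(BOUNDARY - 40, BOUNDARY + 2):
    ok = roundtrip(size)
    print('%d bytes: %s' % (size, 'ok' if ok else 'IRRETRIEVABLE'))

On our clusters, everything below the dead zone comes back fine; the
failures recur at each 4 MiB stripe interval above it.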
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
