Re: [ceph-users] Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation

Yehuda Sadeh-Weinraub Sat, 02 May 2015 09:37:22 -0700


----- Original Message -----
> From: "Sean" <seapasu...@uchicago.edu>
> To: "Yehuda Sadeh-Weinraub" <yeh...@redhat.com>
> Cc: ceph-users@lists.ceph.com
> Sent: Friday, May 1, 2015 6:47:09 PM
> Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete objects;
> civetweb logs stop after rotation
> 
> Hey there,
> 
> Sorry for the delay; I have been moving apartments, ugh. Our dev team
> found a quick way to identify the files that download at a
> smaller size::
> 
> Iterate through all of the objects in a bucket, read key.size for each
> listed item, and compare it to conn.get_bucket().get_key().size for the
> same key. Whenever the two sizes differ, the key corresponds exactly to
> an object that seems to have missing pieces in Ceph.
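A minimal sketch of that check, assuming the boto 2 API the message refers to: `bucket.list()` yields keys whose `.size` comes from the bucket listing, while `bucket.get_key(name)` issues a HEAD request for the object itself. The name `find_size_mismatches` and the two protocol classes are hypothetical; the function accepts any bucket-like object so it can be exercised without a live gateway.

```python
from typing import Iterable, List, Protocol, Tuple


class KeyLike(Protocol):
    # Hypothetical stand-in for the fields of a boto 2 Key used here.
    name: str
    size: int


class BucketLike(Protocol):
    # Hypothetical stand-in for the two boto 2 Bucket methods used here.
    def list(self) -> Iterable[KeyLike]: ...
    def get_key(self, key_name: str) -> KeyLike: ...


def find_size_mismatches(bucket: BucketLike) -> List[Tuple[str, int, int]]:
    """Return (name, listed_size, head_size) for every key whose sizes differ.

    listed_size comes from the bucket listing; head_size comes from
    get_key(), i.e. a HEAD request against the object itself.
    """
    mismatches = []
    for listed in bucket.list():
        head = bucket.get_key(listed.name)
        if int(listed.size) != int(head.size):
            mismatches.append((listed.name, int(listed.size), int(head.size)))
    return mismatches
```

With boto 2 this would be called as `find_size_mismatches(conn.get_bucket("<bucket>"))`; each key costs one extra HEAD request, so a large bucket takes a while.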
> 
> The size differences also always seem to come in multiples of 512K, which
> is really odd.
> 
> ==================
> http://pastebin.com/R34wF7PB
> ==================
> 
> My main question is: why are these sizes different at all? Shouldn't they
> be exactly the same? And why are they off by multiples of 512K?
> Finally, I need a way to rule out that this is a Ceph issue, and the only
> way I can think of is to grab a list of all of the data files and
> concatenate them in order, in the hope that the manifest is wrong
> and I get the whole file.
> 
> For example::
> 
> implicit size 7745820218, explicit size 7744771642, absolute difference
> 1048576; name =
> 86b6fad8-3c53-465f-8758-2009d6df01e9/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam
> 
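The arithmetic behind the 512K observation can be checked directly. A small sketch, assuming "512k" means 512 KiB (524288 bytes); the helper name `delta_in_512k_units` is hypothetical. It computes the size difference and expresses it in whole 512 KiB units plus any leftover bytes:

```python
from typing import Tuple

CHUNK = 512 * 1024  # assumes "512k" means 512 KiB = 524288 bytes


def delta_in_512k_units(implicit_size: int, explicit_size: int) -> Tuple[int, int, int]:
    """Return (absolute difference, whole 512 KiB units, leftover bytes)."""
    delta = abs(implicit_size - explicit_size)
    units, leftover = divmod(delta, CHUNK)
    return delta, units, leftover


# The example object from this message.
print(delta_in_512k_units(7745820218, 7744771642))  # (1048576, 2, 0)
```

For this object, the download is short by exactly two 512 KiB units (1 MiB), with no partial unit left over.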
> I called one of the gateways directly and piped the output to a
> text file while downloading this BAM:
> 
> https://drive.google.com/file/d/0B16pfLB7yY6GcTZXalBQM3RHT0U/view?usp=sharing
> (25 MB of text)
> 
> As we can see above, Ceph says the size is 7745820218 bytes
> somewhere, but when we download it we get 7744771642 bytes. If I download


There are two different things here: the bucket index and the object manifest. The
bucket index holds the former size (7745820218 bytes), and the object manifest
specifies the latter (7744771642 bytes).

> the object, I get a 7744771642-byte file. Finally, if I do a range request
> for all of the bytes from 7744771642 to the end, I get a "cannot complete
> request" error::
> 
> 
> http://pastebin.com/CVvmex4m -- traceback of the python range request.
> http://pastebin.com/4sd1Jc0G -- the radosgw log of the range request
> 
> If I request the file starting 2 bytes earlier (7744771642 - 2 =
> 7744771640) through the end, I am left with just a 2-byte file::
> 
> http://pastebin.com/Sn7Y0t9G -- range request of file - 2 bytes to end
> of file.
> lacadmin@kh10-9:~$ ls -lhab 7gtest-range.bam
> -rw-r--r-- 1 lacadmin lacadmin 2 Feb 24 01:00 7gtest-range.bam
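Both range results are consistent with the gateway serving a 7744771642-byte object. Under the HTTP/1.1 byte-range rules (RFC 7233), an open-ended range `bytes=N-` returns byte N through the last byte, and it is unsatisfiable when N is at or past the current length. A minimal sketch of that calculation follows; the helper names are hypothetical, and it models the HTTP rules, not RGW's own code:

```python
from typing import Optional


def open_range_header(start: int) -> str:
    """Build an open-ended Range header value: from `start` to the last byte."""
    return f"bytes={start}-"


def open_range_length(start: int, object_size: int) -> Optional[int]:
    """Bytes returned for 'bytes=start-' against an object of object_size bytes.

    Returns None when the range is unsatisfiable (start >= object_size),
    which an HTTP/1.1 server answers with 416 Range Not Satisfiable.
    """
    if start >= object_size:
        return None
    return object_size - start


served = 7744771642  # size actually returned by the gateway
print(open_range_header(7744771640), open_range_length(7744771640, served))  # bytes=7744771640- 2
print(open_range_header(7744771642), open_range_length(7744771642, served))  # bytes=7744771642- None
```

In other words, the 2-byte file is the expected response if the stored object really ends at byte 7744771641.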
> 
> 
> I think radosgw may not be keeping track of errors in the multipart
> chunks, possibly? How did RADOS get the original, correct file size,
> and why is the file short when it returns the actual chunks? Finally, why
> are the corrupt/missing chunks always a multiple of 512K? I do not see
> anything obvious that is set to 512K on the configuration/user side.
> 
> 
> Sorry for the questions and babbling, but I am at a loss as to how to
> address this.

So, the question is which one is correct: the index or the object itself. Do you
have any way to know which one is correct? Also, does it only happen to
you with very large objects? Does it happen with every such object (e.g., objects
larger than 4 GB)?

Here's some extra information you could gather:

 - Get the object manifest:

$ radosgw-admin object stat --bucket=<bucket> --object=<object>

 - Get the status of each rados object that makes up the logical rgw object:

First, identify the object names that correspond to this specific rgw object. 
From the manifest you'd get a 'prefix', which is a random hash that all tail 
objects should contain. Then you should do something like:

$ rados -p <data pool, e.g., .rgw.buckets> ls | grep $prefix

And then, for each object:

$ rados -p <data pool, e.g., .rgw.buckets> stat $object

There's also the head object that you'd want to inspect (named after the actual 
rgw object name, grep it too).
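The steps above can be scripted. A minimal sketch, assuming the `rados` CLI is on PATH and that the prefix has already been copied by hand from the `radosgw-admin object stat` manifest; the function names are hypothetical. It runs the same `rados ls` and `rados stat` commands shown above and keeps every object name that contains the prefix, like the `grep`:

```python
import subprocess
from typing import List


def filter_tail_objects(ls_output: str, prefix: str) -> List[str]:
    """Equivalent of `rados ls | grep $prefix`: keep names containing prefix."""
    return [line.strip() for line in ls_output.splitlines()
            if line.strip() and prefix in line]


def stat_tail_objects(pool: str, prefix: str) -> List[str]:
    """Run `rados -p POOL ls`, filter by prefix, then `rados -p POOL stat` each."""
    ls = subprocess.run(["rados", "-p", pool, "ls"],
                        capture_output=True, text=True, check=True)
    results = []
    for name in filter_tail_objects(ls.stdout, prefix):
        st = subprocess.run(["rados", "-p", pool, "stat", name],
                            capture_output=True, text=True, check=True)
        results.append(st.stdout.strip())
    return results
```

The head object can be found the same way by passing the rgw object name instead of the prefix, e.g. `stat_tail_objects(".rgw.buckets", "<object name>")`.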

HTH,
Yehuda

> 
> 
> 
> 
> 
> On 04/28/2015 05:03 PM, Yehuda Sadeh-Weinraub wrote:
> >
> > ----- Original Message -----
> >> From: "Sean" <seapasu...@uchicago.edu>
> >> To: ceph-users@lists.ceph.com
> >> Sent: Tuesday, April 28, 2015 2:52:35 PM
> >> Subject: [ceph-users] Civet RadosGW S3 not storing complete objects;
> >> civetweb logs stop after rotation
> >>
> >> Hey y'all!
> >>
> >> I have a weird issue and I am not sure where to look, so any help would
> >> be appreciated. I have a large Ceph Giant cluster that has been stable
> >> and healthy almost entirely since its inception. We have currently stored
> >> over 1.5PB in the cluster through RGW, and everything seems to be
> >> functioning great. We have downloaded smaller objects without issue, but
> >> last night we did a test on our largest file (almost 1 terabyte) and it
> >> repeatedly times out at almost exactly the same place. Investigating
> >> further, it looks like Civetweb/RGW reports that the uploads
> >> completed even though the objects are truncated. At least, when we
> >> download the objects they seem to be truncated.
> >>
> >> I have tried searching through the mailing list archives to see what may
> >> be going on, but it looks like the mailing list DB may be going through
> >> some maintenance:
> >>
> >> ----
> >> Unable to read word database file
> >> '/dh/mailman/dap/archives/private/ceph-users-ceph.com/htdig/db.words.db'
> >> ----
> >>
> >> After checking through the gzipped logs, I see that civetweb also just
> >> stops logging after a rotation for some reason, and my last log is from
> >> the 28th of March. I tried manually running /etc/init.d/radosgw reload,
> >> but this didn't seem to work. As running the download again could take
> >> all day to error out, we instead use a range request to try to pull
> >> the missing bytes.
> >>
> >> https://gist.github.com/MurphyMarkW/8e356823cfe00de86a48 -- here is the
> >> code we are using to download via S3/boto, as well as the returned size
> >> report and an overview of our issue.
> >> http://pastebin.com/cVLdQBMF -- here is some of the log from the civetweb
> >> server they are hitting.
> >>
> >> Here is our current config ::
> >> http://pastebin.com/2SGfSDYG
> >>
> >> Current output of ceph health::
> >> http://pastebin.com/3f6iJEbu
> >>
> >> I am thinking that this must be a civetweb/radosgw bug of some kind. My
> >> questions are: 1.) Is there a way to download the object via rados
> >> directly? I am guessing I will need to find the prefix, then just cat
> >> all of the pieces together and hope I get it right. 2.) Why would Ceph
> >> say the upload went fine but then return a smaller object?
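For question 1, a rough sketch of the "cat them together" approach. It assumes each rados object has already been fetched to a local file (for example with `rados -p <pool> get <object> <file>`) and that the correct order of the pieces follows the numbers embedded in their names. That ordering is an assumption: the object manifest, not the names, defines the layout. The helper names are hypothetical. Plain lexical sorting would put piece 10 before piece 2, so the sketch sorts with a natural-sort key:

```python
import re
import shutil
from typing import Iterable, List, Union

_NUM = re.compile(r"(\d+)")


def natural_key(name: str) -> List[Union[int, str]]:
    """Split a name into text and integer runs so 'p.10' sorts after 'p.2'.

    re.split with a capturing group alternates text, digits, text, ...,
    so every odd position holds a digit run and is compared as an int.
    """
    return [int(tok) if i % 2 else tok for i, tok in enumerate(_NUM.split(name))]


def concat_in_order(piece_paths: Iterable[str], out_path: str) -> List[str]:
    """Concatenate local piece files in natural-sort order; return that order."""
    ordered = sorted(piece_paths, key=natural_key)
    with open(out_path, "wb") as out:
        for path in ordered:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # stream each piece; no full read into memory
    return ordered
```

Returning the order used makes it easy to compare against the manifest before trusting the reassembled file.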
> >>
> >>
> >
> > Note that the returned HTTP response has status 206 (partial content):
> > /var/log/radosgw/client.radosgw.log:2015-04-28 16:08:26.525268 7f6e93fff700
> > 2 req 0:1.067030:s3:GET
> > /tcga_cghub_protected/ff9b730c-d303-4d49-b28f-e0bf9d8f1c84/759366461d2bf8bb0583d5b9566ce947.bam:get_obj:http
> > status=206
> >
> > It'll only return that if partial content is requested (through the HTTP
> > Range header). It's really hard to tell from these logs whether there's
> > any actual problem. I suggest bumping up the log level (debug ms = 1,
> > debug rgw = 20) and taking a look at an entire request (one that includes
> > all of the request HTTP headers).
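To spot partial-content responses like the one above across a whole log, a small sketch can pull the method, path, operation, and HTTP status out of each request-summary line. The regular expression assumes the line shape quoted above (`...:s3:GET <path>:get_obj:http status=206`, unwrapped onto one line); that shape is inferred from this single sample, not from a documented log format, and the function names are hypothetical:

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumed shape, inferred from the one radosgw log line quoted above.
_STATUS_RE = re.compile(
    r":s3:(?P<method>[A-Z]+) (?P<path>\S+?):(?P<op>\w+):http status=(?P<status>\d{3})"
)

Record = Tuple[str, str, str, int]


def parse_status_lines(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (method, path, op, status) for every line that matches."""
    for line in lines:
        m = _STATUS_RE.search(line)
        if m:
            yield m["method"], m["path"], m["op"], int(m["status"])


def partial_content(lines: Iterable[str]) -> List[Record]:
    """Return only the 206 (Partial Content) responses."""
    return [rec for rec in parse_status_lines(lines) if rec[3] == 206]
```

Running `partial_content(open("/var/log/radosgw/client.radosgw.log"))` would list every ranged GET the gateway served, which helps separate deliberate range requests from real truncation.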
> >
> > Yehuda
> >
> >
> >
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> 
