Hi,

We have two CEPH clusters running on Dumpling 0.67.11 and some of our 
"multipart objects" are incompleted. It seems that some slow requests could 
cause corruption of related S3 objects. Moveover GETs for that objects are 
working without any error messages. There are only HTTP 200 in logs as well as 
no information about problems from popular client tools/libs.

The situation looks very similiar to described in bug #8269, but we are using 
fixed 0.67.11 version:  http://tracker.ceph.com/issues/8269

Regards,

Sławomir Ritter



EXAMPLE#1

slow_request
============
2016-02-23 13:49:58.818640 osd.260 10.176.67.27:6800/688083 2119 : [WRN] 4 slow 
requests, 4 included below; oldest blocked for > 30.727096 secs
2016-02-23 13:49:58.818673 osd.260 10.176.67.27:6800/688083 2120 : [WRN] slow 
request 30.727096 seconds old, received at 2016-02-23 13:49:28.091460: osd_op(c
lient.47792965.0:185007087 
default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_2
 [writef
ull 0~524288] 10.ce729ebe e107594) v4 currently waiting for subops from [469,9]


HTTP_500 in apache.log
======================
127.0.0.1 - - [23/Feb/2016:13:49:27 +0100] "PUT 
/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=56
 HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 
Linux/3.13.0-39-generic(syncworker)"
127.0.0.1 - - [23/Feb/2016:13:49:28 +0100] "PUT 
/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=57
 HTTP/1.0" 500 751 "-" "Boto/2.31.1 Python/2.7.3 
Linux/3.13.0-39-generic(syncworker)"
127.0.0.1 - - [23/Feb/2016:13:49:58 +0100] "PUT 
/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=57
 HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 
Linux/3.13.0-39-generic(syncworker)"
127.0.0.1 - - [23/Feb/2016:13:49:59 +0100] "PUT 
/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=58
 HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 
Linux/3.13.0-39-generic(syncworker)"


Empty RADOS object (real size = 0 bytes), list generated basis on MANIFEST
==========================================================================
found  
default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.56_2
  2097152   ok          2097152       10.7acc9476 (10.1476) [278,142,436] 
[278,142,436]
found  
default.14654.445__multipart_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57
 0         diff        4194304       10.4f5be025 (10.25)   [57,310,428]  
[57,310,428]
found  
default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_1
  4194304   ok          4194304       10.81191602 (10.1602) [441,109,420] 
[441,109,420]
found  
default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_2
  2097152   ok          2097152       10.ce729ebe (10.1ebe) [260,469,9]   
[260,469,9]


"Silent" GETs
=============
# object size from headers
$ s3 -u head 
video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv          
                                                     Content-Type: 
binary/octet-stream
Content-Length: 641775701
Server: nginx

# but GETs only 637581397 (641775701 - missing 4194304 = 637581397)
$ s3 -u get 
video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv > 
/tmp/test
$  ls -al /tmp/test
-rw-r--r-- 1 root root 637581397 Feb 23 17:05 /tmp/test

# no error in logs
127.0.0.1 - - [23/Feb/2016:17:05:00 +0100] "GET 
/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv 
HTTP/1.0" 200 637581711 "-" "Mozilla/4.0 (Compatible; s3; libs3 2.0; Linux 
x86_64)"

# wget - retry for missing part, but there is no missing part, so it GETs 
head/tail of the file again....
$ wget 
http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv
--2016-02-23 17:10:11--  
http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv
Connecting to 127.0.0.1:88... connected.
HTTP request sent, awaiting response... 200 OK
Length: 641775701 (612M) [binary/octet-stream]
Saving to: `c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv'

99% 
[==================================================================================================================>
 ] 637,581,397 63.9M/s   in 9.5s

2016-02-23 17:10:20 (64.1 MB/s) - Connection closed at byte 637581397. Retrying.

--2016-02-23 17:10:21--  (try: 2)  
http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv
Connecting to 127.0.0.1:88... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 641775701 (612M), 4194304 (4.0M) remaining [binary/octet-stream]
Saving to: `c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv'

100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>]
 641,775,701 --.-K/s   in 0.007s

2016-02-23 17:10:21 (601 MB/s) - 
`c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv' saved 
[641775701/641775701]



_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Reply via email to