Hi, We have two CEPH clusters running on Dumpling 0.67.11 and some of our "multipart objects" are incompleted. It seems that some slow requests could cause corruption of related S3 objects. Moveover GETs for that objects are working without any error messages. There are only HTTP 200 in logs as well as no information about problems from popular client tools/libs.
The situation looks very similiar to described in bug #8269, but we are using fixed 0.67.11 version: http://tracker.ceph.com/issues/8269 Regards, Sławomir Ritter EXAMPLE#1 slow_request ============ 2016-02-23 13:49:58.818640 osd.260 10.176.67.27:6800/688083 2119 : [WRN] 4 slow requests, 4 included below; oldest blocked for > 30.727096 secs 2016-02-23 13:49:58.818673 osd.260 10.176.67.27:6800/688083 2120 : [WRN] slow request 30.727096 seconds old, received at 2016-02-23 13:49:28.091460: osd_op(c lient.47792965.0:185007087 default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_2 [writef ull 0~524288] 10.ce729ebe e107594) v4 currently waiting for subops from [469,9] HTTP_500 in apache.log ====================== 127.0.0.1 - - [23/Feb/2016:13:49:27 +0100] "PUT /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=56 HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 Linux/3.13.0-39-generic(syncworker)" 127.0.0.1 - - [23/Feb/2016:13:49:28 +0100] "PUT /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=57 HTTP/1.0" 500 751 "-" "Boto/2.31.1 Python/2.7.3 Linux/3.13.0-39-generic(syncworker)" 127.0.0.1 - - [23/Feb/2016:13:49:58 +0100] "PUT /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=57 HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 Linux/3.13.0-39-generic(syncworker)" 127.0.0.1 - - [23/Feb/2016:13:49:59 +0100] "PUT /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=58 HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 Linux/3.13.0-39-generic(syncworker)" Empty RADOS object (real size = 0 bytes), list generated basis on MANIFEST ========================================================================== found default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.56_2 2097152 ok 2097152 10.7acc9476 (10.1476) [278,142,436] [278,142,436] found default.14654.445__multipart_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57 0 diff 4194304 10.4f5be025 (10.25) [57,310,428] [57,310,428] found default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_1 4194304 ok 4194304 10.81191602 (10.1602) [441,109,420] [441,109,420] found default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_2 2097152 ok 2097152 10.ce729ebe (10.1ebe) [260,469,9] [260,469,9] "Silent" GETs ============= # object size from headers $ s3 -u head video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv Content-Type: binary/octet-stream Content-Length: 641775701 Server: nginx # but GETs only 637581397 (641775701 - missing 4194304 = 637581397) $ s3 -u get video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv > /tmp/test $ ls -al /tmp/test -rw-r--r-- 1 root root 637581397 Feb 23 17:05 /tmp/test # no error in logs 127.0.0.1 - - [23/Feb/2016:17:05:00 +0100] "GET /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv HTTP/1.0" 200 637581711 "-" "Mozilla/4.0 (Compatible; s3; libs3 2.0; Linux x86_64)" # wget - retry for missing part, but there is no missing part, so it GETs head/tail of the file again.... $ wget http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv --2016-02-23 17:10:11-- http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv Connecting to 127.0.0.1:88... connected. HTTP request sent, awaiting response... 200 OK Length: 641775701 (612M) [binary/octet-stream] Saving to: `c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv' 99% [==================================================================================================================> ] 637,581,397 63.9M/s in 9.5s 2016-02-23 17:10:20 (64.1 MB/s) - Connection closed at byte 637581397. Retrying. --2016-02-23 17:10:21-- (try: 2) http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv Connecting to 127.0.0.1:88... connected. HTTP request sent, awaiting response... 206 Partial Content Length: 641775701 (612M), 4194304 (4.0M) remaining [binary/octet-stream] Saving to: `c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv' 100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>] 641,775,701 --.-K/s in 0.007s 2016-02-23 17:10:21 (601 MB/s) - `c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv' saved [641775701/641775701]
_______________________________________________ ceph-users mailing list [email protected] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
