Hello,
Recently we've observed on one of our Ceph clusters that uploading a large 
number of small files (~2000 x 2k) fails. The HTTP return code shows 200, but 
the file upload still fails. Here is an example from the log:

2018-06-27 07:34:40.624103 7f0dc67cc700  1 ====== starting new request 
req=0x7f0dc67c68a0 =====
2018-06-27 07:34:40.645039 7f0dc3fc7700  1 ====== starting new request 
req=0x7f0dc3fc18a0 =====
2018-06-27 07:34:40.682108 7f0dc3fc7700  0 WARNING: couldn't find acl header 
for object, generating default
2018-06-27 07:34:40.962674 7f0dcbfd7700  0 ERROR: client_io->complete_request() 
returned -5
2018-06-27 07:34:40.962689 7f0dcbfd7700  1 ====== req done req=0x7f0dcbfd18a0 
op status=0 http_status=200 ======
2018-06-27 07:34:40.962738 7f0dcbfd7700  1 civetweb: 0x7f0df4004160: 10.x.x.x. 
- - [27/Jun/2018:07:34:34 +0000] "POST xxxx-xxxx HTTP/1.1" 200 0 - 
aws-sdk-java/1.6.4 Linux/3.17.6-200.fc20.x86_64 
Java_HotSpot(TM)_64-Bit_Server_VM/25.73-b02
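
For anyone scanning these lines: I believe the -5 from client_io->complete_request() is a negated Linux errno, i.e. EIO. A quick way to confirm the mapping with plain Python (nothing Ceph-specific):

```python
import errno
import os

# -5 in the RGW log is a negated Linux errno value
code = -5
name = errno.errorcode[-code]  # symbolic name for errno 5
desc = os.strerror(-code)      # human-readable description
print(name, "-", desc)         # → EIO - Input/output error
```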

I tried tuning performance with the parameters below, but file uploads still 
fail, so I suspect this is not a concurrency issue:
rgw num rados handles = 8
rgw thread pool size = 512
rgw frontends = civetweb port=7480 num_threads=512

I also tried increasing the logging level for rgw and civetweb to 20/5, but I 
don't see anything that points to the issue:

2018-06-28 18:00:24.575460 7f3d7dfc3700 20 get_obj_state: s->obj_tag was set 
empty
2018-06-28 18:00:24.575491 7f3d7dfc3700 20 get_obj_state: rctx=0x7f3d7dfbcff0 
obj=files:_multipart_xxxx-xxxx.error.2~Rh1AqHvzgCPc0NGWMl-FHE0Y-HvCcmk.1 
state=0x7f3e04024d88 s->prefetch_data=0
2018-06-28 18:00:24.575496 7f3d7dfc3700 20 prepare_atomic_modification: state 
is not atomic. state=0x7f3e04024d88
2018-06-28 18:00:24.575555 7f3d7dfc3700 20 reading from 
default.rgw.data.root:.bucket.meta.files:xxxx-xxxx.6432.11
2018-06-28 18:00:24.575567 7f3d7dfc3700 20 get_system_obj_state: 
rctx=0x7f3d7dfbb5d0 
obj=default.rgw.data.root:.bucket.meta.files:xxxx-xxxx.6432.11 
state=0x7f3e04001228 s->prefetch_data
2018-06-28 18:00:24.575577 7f3d7dfc3700 10 cache get: 
name=default.rgw.data.root+.bucket.meta.files:xxxx-xxxx.6432.11 : hit 
(requested=22, cached=23)
2018-06-28 18:00:24.575586 7f3d7dfc3700 20 get_system_obj_state: s->obj_tag was 
set empty
2018-06-28 18:00:24.575592 7f3d7dfc3700 10 cache get: 
name=default.rgw.data.root+.bucket.meta.files:xxxx-xxxx.6432.11 : hit 
(requested=17, cached=23)
2018-06-28 18:00:24.575614 7f3d7dfc3700 20  bucket index object: 
.dir.xxxx-xxxx.6432.11
2018-06-28 18:00:24.606933 7f3d67796700  2 req 9567:5.505460:s3:POST 
xxxx-xxxx.error:init_multipart:completing
2018-06-28 18:00:24.607025 7f3d67796700  0 ERROR: client_io->complete_request() 
returned -5
2018-06-28 18:00:24.607036 7f3d67796700  2 req 9567:5.505572:s3:POST 
xxxx-xxxx.error:init_multipart:op status=0
2018-06-28 18:00:24.607040 7f3d67796700  2 req 9567:5.505578:s3:POST 
xxxx-xxxx.error:init_multipart:http status=200
2018-06-28 18:00:24.607046 7f3d67796700  1 ====== req done req=0x7f3d677908a0 
op status=0 http_status=200 ======

The cluster is a 12-node Ceph Jewel (10.2.10-1~bpo80+1) cluster running Debian 
8.9. Here is our ceph.conf:
[global]
fsid = 314d4121-46b1-4433-9bae-fdd2803fc24b
mon_initial_members = ceph-1,ceph-2,ceph-3
mon_host = 10.x.x.x, 10.x.x.x, 10.x.x.x
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 10.x.x.x
osd_journal_size = 10240
osd_mount_options_xfs = rw,noexec,noatime,nodiratime,inode64
osd_pool_default_size = 3
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 900
osd_pool_default_pgp_num = 900
log to syslog = true
err to syslog = true
clog to syslog = true
rgw dns name = xxx.com
rgw num rados handles = 8
rgw thread pool size = 512
rgw frontends = civetweb port=7480 num_threads=512
debug rgw = 20/5
debug civetweb = 20/5

[mon]
mon cluster log to syslog = true


Any idea what the issue could be here?

Thanks
Mel

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com