Hi,
We have a Ceph 13.2.5 deployment with several buckets that each hold
millions of files.
services:
mon: 3 daemons, quorum CEPH001,CEPH002,CEPH003
mgr: CEPH001(active)
osd: 106 osds: 106 up, 106 in
rgw: 2 daemons active
data:
pools: 17 pools, 7120 pgs
objects: 106.8 M objects, 271 TiB
usage: 516 TiB used, 102 TiB / 619 TiB avail
pgs: 7120 active+clean
We ran a test on a spare RGW server for this case.
A customer reported that they are unable to list their buckets, so we tested
from a monitor node with the command:
s3cmd ls s3://[bucket] --no-ssl --limit 20
It takes 1 minute and 2 seconds.
RGW log at debug level 2:
2019-05-03 10:40:25.449 7f65f63e1700 1 ====== starting new request
req=0x55eba26e8970 =====
2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s::GET
/[bucketname]/::initializing for trans_id =
tx000000000000000000071-005ccbfe79-e6283e-default
2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET
/[bucketname]/::getting op 0
2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:verifying requester
2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:normalizing buckets and tenants
2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:init permissions
2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:recalculating target
2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:reading permissions
2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:init op
2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:verifying op mask
2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:verifying op permissions
2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:verifying op params
2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:pre-executing
2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:executing
2019-05-03 10:40:41.026 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:41:03.026 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:41:25.026 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:41:47.026 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:41:49.395 7f65f63e1700 2 req 113:83.9461s:s3:GET
/[bucketname]/:list_bucket:completing
2019-05-03 10:41:49.395 7f65f63e1700 2 req 113:83.9461s:s3:GET
/[bucketname]/:list_bucket:op status=0
2019-05-03 10:41:49.395 7f65f63e1700 2 req 113:83.9461s:s3:GET
/[bucketname]/:list_bucket:http status=200
2019-05-03 10:41:49.395 7f65f63e1700 1 ====== req done req=0x55eba26e8970
op status=0 http_status=200 ======
time s3cmd ls s3://[bucket] --no-ssl --limit 100
real 4m26.318s
2019-05-03 10:42:36.439 7f65f33db700 1 ====== starting new request
req=0x55eba26e8970 =====
2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s::GET
/[bucketname]/::initializing for trans_id =
tx000000000000000000073-005ccbfefc-e6283e-default
2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET
/[bucketname]/::getting op 0
2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:verifying requester
2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:normalizing buckets and tenants
2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:init permissions
2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:recalculating target
2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:reading permissions
2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:init op
2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:verifying op mask
2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:verifying op permissions
2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:verifying op params
2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:pre-executing
2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:executing
2019-05-03 10:42:53.026 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:43:15.027 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:43:37.028 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:43:59.027 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:44:21.028 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:44:43.027 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:45:05.027 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:45:18.260 7f660cc0e700 2 object expiration: start
2019-05-03 10:45:18.779 7f660cc0e700 2 object expiration: stop
2019-05-03 10:45:27.027 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:45:49.027 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:46:11.027 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:46:33.027 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:46:55.028 7f660e411700 2
RGWDataChangesLog::ChangesRenewThread: start
2019-05-03 10:47:02.092 7f65f33db700 2 req 115:265.652s:s3:GET
/[bucketname]/:list_bucket:completing
2019-05-03 10:47:02.092 7f65f33db700 2 req 115:265.652s:s3:GET
/[bucketname]/:list_bucket:op status=0
2019-05-03 10:47:02.092 7f65f33db700 2 req 115:265.652s:s3:GET
/[bucketname]/:list_bucket:http status=200
2019-05-03 10:47:02.092 7f65f33db700 1 ====== req done req=0x55eba26e8970
op status=0 http_status=200 ======
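As a rough cross-check of the two log excerpts above, the following sketch pulls the elapsed time out of the "req <id>:<elapsed>s:" field that RGW prints on each line. The function name request_durations and the shortened sample lines are only illustrative; they are not part of any Ceph tool.

```python
import re

# Matches the "req <id>:<elapsed>s:" field seen in the RGW log lines above.
REQ_FIELD = re.compile(r"req (\d+):([0-9.]+)s:")


def request_durations(log_text):
    """Return {request id: largest elapsed seconds seen} for a log excerpt."""
    durations = {}
    for req_id, elapsed in REQ_FIELD.findall(log_text):
        key = int(req_id)
        durations[key] = max(durations.get(key, 0.0), float(elapsed))
    return durations


# Shortened copies of lines from the two requests logged above.
sample = """\
2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET
2019-05-03 10:41:49.395 7f65f63e1700 2 req 113:83.9461s:s3:GET
2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET
2019-05-03 10:47:02.092 7f65f33db700 2 req 115:265.652s:s3:GET
"""

for req_id, seconds in sorted(request_durations(sample).items()):
    print(f"req {req_id}: {seconds:.1f} s")
```

In both requests the whole elapsed time falls between the "executing" and "completing" lines, so the delay is inside the list_bucket operation itself.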
radosgw-admin bucket limit check
{
"bucket": "[BUCKETNAME]",
"tenant": "",
"num_objects": 7126133,
"num_shards": 128,
"objects_per_shard": 55672,
"fill_status": "OK"
},
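The objects_per_shard figure above can be reproduced with simple arithmetic, sketched below. The limit of 100,000 objects per shard is an assumption used only for illustration; the threshold that radosgw-admin actually applies depends on the cluster configuration. The helper name shard_fill is hypothetical.

```python
def shard_fill(num_objects, num_shards, max_objs_per_shard=100_000):
    """Return (objects per shard, percent of the assumed per-shard limit)."""
    # Integer division reproduces the reported value of 55672.
    objects_per_shard = num_objects // num_shards
    # max_objs_per_shard is an assumed limit, not read from the cluster.
    fill_pct = 100.0 * objects_per_shard / max_objs_per_shard
    return objects_per_shard, fill_pct


per_shard, pct = shard_fill(7_126_133, 128)
print(f"{per_shard} objects per shard, {pct:.1f}% of the assumed limit")
```

Under that assumption the index shards are well below the limit, which is consistent with the "OK" fill_status reported above.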
We really don't know how to solve this; it looks like a timeout or slow
performance for that bucket.
Our RGW sections in ceph.conf:
[client.rgw.ceph-rgw01]
host = ceph-rgw01
rgw enable usage log = true
rgw dns name = XXXXXX
rgw frontends = "beast port=7480"
rgw resolve cname = false
rgw thread pool size = 128
rgw num rados handles = 1
rgw op thread timeout = 120
[client.rgw.ceph-rgw03]
host = ceph-rgw03
rgw enable usage log = true
rgw dns name = XXXXXXXX
rgw frontends = "beast port=7480"
rgw resolve cname = false
rgw thread pool size = 640
rgw num rados handles = 16
rgw op thread timeout = 120
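To make the differences between the two RGW sections above easier to see, here is a minimal sketch that diffs two ceph.conf sections with Python's standard configparser. The embedded text is a shortened copy of the settings above, and section_diff is a hypothetical helper, not a Ceph utility.

```python
import configparser

# Shortened copy of the two sections above (host and dns name omitted).
CONF = """\
[client.rgw.ceph-rgw01]
rgw thread pool size = 128
rgw num rados handles = 1
rgw op thread timeout = 120

[client.rgw.ceph-rgw03]
rgw thread pool size = 640
rgw num rados handles = 16
rgw op thread timeout = 120
"""


def section_diff(text, first, second):
    """Return {option: (value in first, value in second)} where they differ."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)
    a, b = dict(parser[first]), dict(parser[second])
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


diff = section_diff(CONF, "client.rgw.ceph-rgw01", "client.rgw.ceph-rgw03")
for key, (left, right) in diff.items():
    print(f"{key}: {left} vs {right}")
```

Apart from host and dns name, only the thread pool size and the number of rados handles differ between the two daemons.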
Best Regards,
Manuel