Hi!
> By looking at these logs it seems that there are only 8 pgs on the
> .rgw pool, if this is correct then you may want to change that
> considering your workload.
Thanks. See out pg_num configuration below. We had already suspected
that the 1600 that we had previously (48 OSDs * 100 / triple redundancy)
were not ideal, so we increased the .rgw.buckets pool to 2048.
The number of objects and their size was in an earlier email, but for
completeness I will put them up once again.
Any other ideas where to look?
==============
for i in $(rados df | awk '{ print $1 }' | grep '^\.'); do
echo $i; echo -n " - “;
ceph osd pool get $i pg_num;
echo -n " - “;
ceph osd pool get $i pgp_num;
done
.intent-log
- pg_num: 1600
- pgp_num: 1600
.log
- pg_num: 1600
- pgp_num: 1600
.rgw
- pg_num: 1600
- pgp_num: 1600
.rgw.buckets
- pg_num: 2048
- pgp_num: 2048
.rgw.buckets.index
- pg_num: 1600
- pgp_num: 1600
.rgw.control
- pg_num: 1600
- pgp_num: 1600
.rgw.gc
- pg_num: 1600
- pgp_num: 1600
.rgw.root
- pg_num: 100
- pgp_num: 100
.usage
- pg_num: 1600
- pgp_num: 1600
.users
- pg_num: 1600
- pgp_num: 1600
.users.email
- pg_num: 1600
- pgp_num: 1600
.users.swift
- pg_num: 1600
- pgp_num: 1600
.users.uid
- pg_num: 1600
- pgp_num: 1600
===============
> .rgw
> =========================
> KB: 1,966,932
> objects: 9,094,552
> rd: 195,747,645
> rd KB: 153,585,472
> wr: 30,191,844
> wr KB: 10,751,065
>
> .rgw.buckets
> =========================
> KB: 2,038,313,855
> objects: 22,088,103
> rd: 5,455,123
> rd KB: 408,416,317
> wr: 149,377,728
> wr KB: 1,882,517,472
>
> .rgw.buckets.index
> =========================
> KB: 0
> objects: 5,374,376
> rd: 267,996,778
> rd KB: 262,626,106
> wr: 107,142,891
> wr KB: 0
>
> .rgw.control
> =========================
> KB: 0
> objects: 8
> rd: 0
> rd KB: 0
> wr: 0
> wr KB: 0
>
> .rgw.gc
> =========================
> KB: 0
> objects: 32
> rd: 5,554,407
> rd KB: 5,713,942
> wr: 8,355,934
> wr KB: 0
>
> .rgw.root
> =========================
> KB: 1
> objects: 3
> rd: 524
> rd KB: 346
> wr: 3
> wr KB: 3
Daniel
> On 08 Oct 2014, at 01:03, Yehuda Sadeh <[email protected]> wrote:
>
> This operation stalled quite a bit, seems that it was waiting for the osd:
>
> 2.547155 7f036ffc7700 1 -- 10.102.4.11:0/1009401 -->
> 10.102.4.14:6809/7428 -- osd_op(client.78418684.0:27514711
> .bucket.meta.<CNT-UUID-FINDME>:default.78418684.122043 [call
> version.read,getxattrs,stat] 5.3b7d1197 ack+read e16034) v4 -- ?+0
> 0x7f026802e2c0 con 0x7f040c055ca0
> ...
> 7.619750 7f041ddf4700 1 -- 10.102.4.11:0/1009401 <== osd.32
> 10.102.4.14:6809/7428 208252 ==== osd_op_reply(27514711
> .bucket.meta.<CNT-UUID-FINDME>:default.78418684.122043
> [call,getxattrs,stat] v0'0 uv6371 ondisk = 0) v6 ==== 338+0+336
> (3685145659 0 4232894755) 0x7f00e430f540 con 0x7f040c055ca0
>
> By looking at these logs it seems that there are only 8 pgs on the
> .rgw pool, if this is correct then you may want to change that
> considering your workload.
>
> Yehuda
>
>
> On Tue, Oct 7, 2014 at 3:46 PM, Daniel Schneller
> <[email protected]> wrote:
>> Hi!
>>
>> Sorry, I must have missed the enabling of that debug module.
>> However, the test setup has been the same all the time -
>> I only have the one test-application :)
>>
>> But maybe I phrased it a bit ambiguously when I wrote
>>
>>> It then continuously created containers - both empty
>>> and such with 10 objects of around 100k random data in them.
>>
>> 100 kilobytes is the size of a single object, of which we create 10
>> per container. The container gets created first, without any
>> objects, naturally, then 10 objects are added. One of these objects
>> is called “version”, the rest have generated names with a fixed
>> prefix and appended 1-9. The version object is the one I picked
>> for the example logs I sent earlier.
>>
>> I hope this makes the setup clearer.
>>
>> Attached you will find the (now more extensive) logs for the outliers
>> again. As you did not say that I garbled the logs, I assume the
>> pre-processing was OK, so I have prepared the new data in a similar
>> fashion, marking the relevant request with <CNT-UUID-FINDME>.
>>
>> I have not removed any lines in between the beginning of the
>> “interesting” request and its completion to keep all the network
>> traffic log intact. Due to the increased verbosity, I will not post
>> the logs inline, but only attach them gzipped.
>>
>> As before, should the full data set be needed, I can provide
>> an archived version.
>>
>>
>>
>>
>> Thanks for your support!
>> Daniel
>>
>>
>>
>>
>>> On 07 Oct 2014, at 22:45, Yehuda Sadeh <[email protected]> wrote:
>>>
>>> The logs here don't include the messenger (debug ms = 1). It's hard to
>>> tell what going on from looking at the outliers. Also, in your
>>> previous mail you described a different benchmark, you tested writing
>>> large number of objects into a single bucket, whereas in this test
>>> you're testing multiple bucket creations, which have a completely
>>> different characteristics.
>>>
>>>
>>> On Tue, Oct 7, 2014 at 1:26 PM, Daniel Schneller
>>> <[email protected]> wrote:
>>>> Hi!
>>>>
>>>> I have re-run our test as follows:
>>>>
>>>> * 4 Rados Gateways, on 4 baremetal machines which have
>>>> a total of 48 spinning rust OSDs.
>>>>
>>>> * Benchmark run on a virtual machine talking to HAProxy
>>>> which balances the requests across the 4 Rados GWs.
>>>>
>>>> * Three instances of the benchmark run in parallel. Each
>>>> instance creates 1000 containers, puts 11 objects into
>>>> each container. Once they have all been created, each
>>>> instances deletes its own containers again.
>>>>
>>>> I configured one of the radosgws with the debug levels
>>>> you requested. The tests produced quite an amount of data
>>>> (approx. 1GB of text), so I took the liberty to
>>>> pre-process that a bit.
>>>>
>>>> In this run we landed at around 1.2s per container
>>>> created (including the objects in them) on average.
>>>> This was slightly better than the 1.6s we witnessed
>>>> before, but that test ran for much longer, so there might
>>>> have been some queue-up effect.
>>>>
>>>> Interestingly enough the average is actually somewhat
>>>> misleading. The logs below show the creation of one
>>>> object in a container each, one being the fastest of this
>>>> benchmark (at least on the debug-enabled radosgw), one
>>>> being the slowest.
>>>>
>>>> The fastest one was completed in 0.013s, the longest one
>>>> took 4.93s(!).
>>>>
>>>> The attached logs are cleaned up so that they each show
>>>> just a single request and replaced longish, but constant
>>>> information with placeholders. Our container names are
>>>> of the form “stresstest-xxxxxxxxxxx” which I shortened
>>>> to “<CNT-UUID>” for brevity. Also, I removed the redundant
>>>> prefix (date, hour, minute of day).
>>>>
>>>> The column before the log level looked like a thread-id.
>>>> As I focused on a single request, I removed all the lines
>>>> that did not match the same id, replacing the actual value
>>>> with “<ID>”. That makes the logs much easier to read and
>>>> understand.
>>>>
>>>> Just in case I might have removed too much information
>>>> for the logs to be useful, the complete log is available
>>>> in BZIP2 compressed form for download. Just let me know
>>>> if you need it, then I will provide a link via direct email.
>>>>
>>>> To me it seems like there might indeed be a contention
>>>> issue. It would be interesting to know, if this is correct
>>>> and if there are any settings that we could adjust to
>>>> alleviate the issue.
>>>>
>>>> Daniel
>>>>
>>>> ==============================
>>>>
>>>> ➜ ~ cat rados_shortest.txt
>>>> 21.431185 <ID> 20 QUERY_STRING=page=swift¶ms=/v1/<CNT-UUID>/version
>>>> 21.431187 <ID> 20 REMOTE_ADDR=10.102.8.140
>>>> 21.431188 <ID> 20 REMOTE_PORT=44007
>>>> 21.431189 <ID> 20 REQUEST_METHOD=PUT
>>>> 21.431190 <ID> 20 REQUEST_SCHEME=https
>>>> 21.431191 <ID> 20 REQUEST_URI=/swift/v1/<CNT-UUID>/version
>>>> 21.431192 <ID> 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
>>>> 21.431193 <ID> 20 SCRIPT_NAME=/swift/v1/<CNT-UUID>/version
>>>> 21.431194 <ID> 20
>>>> SCRIPT_URI=https://localhost:8405/swift/v1/<CNT-UUID>/version
>>>> 21.431195 <ID> 20 SCRIPT_URL=/swift/v1/<CNT-UUID>/version
>>>> 21.431196 <ID> 20 SERVER_ADDR=10.102.9.11
>>>> 21.431197 <ID> 20 SERVER_ADMIN=[no address given]
>>>> 21.431198 <ID> 20 SERVER_NAME=localhost
>>>> 21.431199 <ID> 20 SERVER_PORT=8405
>>>> 21.431200 <ID> 20 SERVER_PORT_SECURE=443
>>>> 21.431201 <ID> 20 SERVER_PROTOCOL=HTTP/1.1
>>>> 21.431202 <ID> 20 SERVER_SIGNATURE=
>>>> 21.431203 <ID> 20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu)
>>>> 21.431205 <ID> 1 ====== starting new request req=0x7f038c019e90 =====
>>>> 21.431219 <ID> 2 req 980641:0.000015::PUT
>>>> /swift/v1/<CNT-UUID>/version::initializing
>>>> 21.431259 <ID> 10 ver=v1 first=<CNT-UUID> req=version
>>>> 21.431265 <ID> 10 s->object=version s->bucket=<CNT-UUID>
>>>> 21.431269 <ID> 2 req 980641:0.000065:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version::getting op
>>>> 21.431274 <ID> 2 req 980641:0.000070:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:authorizing
>>>> 21.431321 <ID> 20 get_obj_state: rctx=0x7f030800b720
>>>> obj=.users.swift:documentstore:swift state=0x7f03080f31e8
>>>> s->prefetch_data=0
>>>> 21.431332 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
>>>> 21.431338 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 21.431344 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
>>>> 21.431369 <ID> 20 get_obj_state: rctx=0x7f030800b720
>>>> obj=.users.uid:documentstore state=0x7f03080f31e8 s->prefetch_data=0
>>>> 21.431374 <ID> 10 cache get: name=.users.uid+documentstore : hit
>>>> 21.431378 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 21.431382 <ID> 10 cache get: name=.users.uid+documentstore : hit
>>>> 21.431401 <ID> 10 swift_user=documentstore:swift
>>>> 21.431416 <ID> 20 build_token
>>>> token=13000000646f63756d656e7473746f72653a737769667406a4b2ba3999f8a84f45355438d8ff17
>>>> 21.431467 <ID> 2 req 980641:0.000262:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:reading permissions
>>>> 21.431493 <ID> 20 get_obj_state: rctx=0x7f03837ed250 obj=.rgw:<CNT-UUID>
>>>> state=0x7f03080f31e8 s->prefetch_data=0
>>>> 21.431508 <ID> 10 cache get: name=.rgw+<CNT-UUID> : type miss
>>>> (requested=22,
>>>> cached=19)
>>>> 21.433081 <ID> 10 cache put: name=.rgw+<CNT-UUID>
>>>> 21.433106 <ID> 10 removing entry:
>>>> name=.rgw+stresstest-ab9ee3e2-dcf5-4a5b-ab40-931d94c7784038242 from cache
>>>> LRU
>>>> 21.433114 <ID> 10 moving .rgw+<CNT-UUID> to cache LRU end
>>>> 21.433120 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 21.433122 <ID> 20 Read xattr: user.rgw.idtag
>>>> 21.433124 <ID> 20 Read xattr: user.rgw.manifest
>>>> 21.433129 <ID> 10 cache get: name=.rgw+<CNT-UUID> : hit
>>>> 21.433141 <ID> 20 rgw_get_bucket_info: bucket instance:
>>>> <CNT-UUID>(@{i=.rgw.buckets.index}.rgw.buckets[default.78418684.118911])
>>>> 21.433148 <ID> 20 reading from
>>>> .rgw:.bucket.meta.<CNT-UUID>:default.78418684.118911
>>>> 21.433169 <ID> 20 get_obj_state: rctx=0x7f03837ed250
>>>> obj=.rgw:.bucket.meta.<CNT-UUID>:default.78418684.118911
>>>> state=0x7f0308005778 s->prefetch_data=0
>>>> 21.433185 <ID> 10 cache get:
>>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 : type miss
>>>> (requested=22, cached=19)
>>>> 21.434632 <ID> 10 cache put:
>>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911
>>>> 21.434650 <ID> 10 moving
>>>> .rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 to cache LRU end
>>>> 21.434657 <ID> 10 updating xattr: name=user.rgw.acl bl.length()=177
>>>> 21.434664 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 21.434667 <ID> 20 Read xattr: user.rgw.acl
>>>> 21.434668 <ID> 20 Read xattr: user.rgw.idtag
>>>> 21.434669 <ID> 20 Read xattr: user.rgw.manifest
>>>> 21.434675 <ID> 10 cache get:
>>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 : hit
>>>> 21.434701 <ID> 15 Read AccessControlPolicy<AccessControlPolicy
>>>> xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>documentstore</ID><DisplayName>Document
>>>> Store</DisplayName></Owner><AccessControlList><Grant><Grantee
>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> xsi:type="CanonicalUser"><ID>documentstore</ID><DisplayName>Document
>>>> Store</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
>>>> 21.434710 <ID> 2 req 980641:0.003506:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:init op
>>>> 21.434716 <ID> 2 req 980641:0.003512:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op mask
>>>> 21.434719 <ID> 20 required_mask= 2 user.op_mask=7
>>>> 21.434720 <ID> 2 req 980641:0.003516:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op permissions
>>>> 21.434723 <ID> 5 Searching permissions for uid=documentstore mask=50
>>>> 21.434725 <ID> 5 Found permission: 15
>>>> 21.434726 <ID> 5 Searching permissions for group=1 mask=50
>>>> 21.434727 <ID> 5 Permissions for group not found
>>>> 21.434729 <ID> 5 Searching permissions for group=2 mask=50
>>>> 21.434730 <ID> 5 Permissions for group not found
>>>> 21.434731 <ID> 5 Getting permissions id=documentstore owner=documentstore
>>>> perm=2
>>>> 21.434732 <ID> 10 uid=documentstore requested perm (type)=2, policy
>>>> perm=2,
>>>> user_perm_mask=2, acl perm=2
>>>> 21.434734 <ID> 2 req 980641:0.003530:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op params
>>>> 21.434737 <ID> 2 req 980641:0.003533:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:executing
>>>> 21.434850 <ID> 20 get_obj_state: rctx=0x7f03837ed250 obj=<CNT-UUID>:version
>>>> state=0x7f03080c5378 s->prefetch_data=0
>>>> 21.436144 <ID> 10 setting object write_tag=default.78418684.980641
>>>> 21.444613 <ID> 2 req 980641:0.013408:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:http status=201
>>>>
>>>> ==============================
>>>>
>>>> ➜ ~ cat rados_longest.txt
>>>> 31.886128 <ID> 20 QUERY_STRING=page=swift¶ms=/v1/<CNT-UUID>/version
>>>> 31.886129 <ID> 20 REMOTE_ADDR=10.102.8.140
>>>> 31.886130 <ID> 20 REMOTE_PORT=46714
>>>> 31.886131 <ID> 20 REQUEST_METHOD=PUT
>>>> 31.886132 <ID> 20 REQUEST_SCHEME=https
>>>> 31.886134 <ID> 20 REQUEST_URI=/swift/v1/<CNT-UUID>/version
>>>> 31.886135 <ID> 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
>>>> 31.886136 <ID> 20 SCRIPT_NAME=/swift/v1/<CNT-UUID>/version
>>>> 31.886137 <ID> 20
>>>> SCRIPT_URI=https://localhost:8405/swift/v1/<CNT-UUID>/version
>>>> 31.886138 <ID> 20 SCRIPT_URL=/swift/v1/<CNT-UUID>/version
>>>> 31.886139 <ID> 20 SERVER_ADDR=10.102.9.11
>>>> 31.886140 <ID> 20 SERVER_ADMIN=[no address given]
>>>> 31.886141 <ID> 20 SERVER_NAME=localhost
>>>> 31.886143 <ID> 20 SERVER_PORT=8405
>>>> 31.886144 <ID> 20 SERVER_PORT_SECURE=443
>>>> 31.886145 <ID> 20 SERVER_PROTOCOL=HTTP/1.1
>>>> 31.886146 <ID> 20 SERVER_SIGNATURE=
>>>> 31.886147 <ID> 20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu)
>>>> 31.886148 <ID> 1 ====== starting new request req=0x7f038c024c50 =====
>>>> 31.886162 <ID> 2 req 983095:0.000013::PUT
>>>> /swift/v1/<CNT-UUID>/version::initializing
>>>> 31.886195 <ID> 10 ver=v1 first=<CNT-UUID> req=version
>>>> 31.886200 <ID> 10 s->object=version s->bucket=<CNT-UUID>
>>>> 31.886203 <ID> 2 req 983095:0.000055:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version::getting op
>>>> 31.886208 <ID> 2 req 983095:0.000060:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:authorizing
>>>> 31.886242 <ID> 20 get_obj_state: rctx=0x7f02b0062770
>>>> obj=.users.swift:documentstore:swift state=0x7f02b007ac18
>>>> s->prefetch_data=0
>>>> 31.886250 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
>>>> 31.886255 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 31.886260 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
>>>> 31.886297 <ID> 20 get_obj_state: rctx=0x7f02b0062770
>>>> obj=.users.uid:documentstore state=0x7f02b007ac18 s->prefetch_data=0
>>>> 31.886303 <ID> 10 cache get: name=.users.uid+documentstore : hit
>>>> 31.886308 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 31.886312 <ID> 10 cache get: name=.users.uid+documentstore : hit
>>>> 31.886329 <ID> 10 swift_user=documentstore:swift
>>>> 31.886343 <ID> 20 build_token
>>>> token=13000000646f63756d656e7473746f72653a737769667410f2006ed65dbbaa4f453554207a0f1f
>>>> 31.886393 <ID> 2 req 983095:0.000245:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:reading permissions
>>>> 31.886419 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 obj=.rgw:<CNT-UUID>
>>>> state=0x7f02b007ac18 s->prefetch_data=0
>>>> 31.886430 <ID> 10 cache get: name=.rgw+<CNT-UUID> : type miss
>>>> (requested=22,
>>>> cached=19)
>>>> 36.746327 <ID> 10 cache put: name=.rgw+<CNT-UUID>
>>>> 36.746404 <ID> 10 moving .rgw+<CNT-UUID> to cache LRU end
>>>> 36.746426 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 36.746431 <ID> 20 Read xattr: user.rgw.idtag
>>>> 36.746433 <ID> 20 Read xattr: user.rgw.manifest
>>>> 36.746452 <ID> 10 cache get: name=.rgw+<CNT-UUID> : hit
>>>> 36.746481 <ID> 20 rgw_get_bucket_info: bucket instance:
>>>> <CNT-UUID>(@{i=.rgw.buckets.index}.rgw.buckets[default.78418684.119116])
>>>> 36.746491 <ID> 20 reading from
>>>> .rgw:.bucket.meta.<CNT-UUID>:default.78418684.119116
>>>> 36.746549 <ID> 20 get_obj_state: rctx=0x7f0378fd8250
>>>> obj=.rgw:.bucket.meta.<CNT-UUID>:default.78418684.119116
>>>> state=0x7f02b00ce638 s->prefetch_data=0
>>>> 36.746585 <ID> 10 cache get:
>>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 : type miss
>>>> (requested=22, cached=19)
>>>> 36.747938 <ID> 10 cache put:
>>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116
>>>> 36.747955 <ID> 10 moving
>>>> .rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 to cache LRU end
>>>> 36.747963 <ID> 10 updating xattr: name=user.rgw.acl bl.length()=177
>>>> 36.747972 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 36.747975 <ID> 20 Read xattr: user.rgw.acl
>>>> 36.747977 <ID> 20 Read xattr: user.rgw.idtag
>>>> 36.747978 <ID> 20 Read xattr: user.rgw.manifest
>>>> 36.747985 <ID> 10 cache get:
>>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 : hit
>>>> 36.748025 <ID> 15 Read AccessControlPolicy<AccessControlPolicy
>>>> xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>documentstore</ID><DisplayName>Document
>>>> Store</DisplayName></Owner><AccessControlList><Grant><Grantee
>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> xsi:type="CanonicalUser"><ID>documentstore</ID><DisplayName>Document
>>>> Store</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
>>>> 36.748037 <ID> 2 req 983095:4.861888:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:init op
>>>> 36.748043 <ID> 2 req 983095:4.861895:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op mask
>>>> 36.748046 <ID> 20 required_mask= 2 user.op_mask=7
>>>> 36.748050 <ID> 2 req 983095:4.861902:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op permissions
>>>> 36.748054 <ID> 5 Searching permissions for uid=documentstore mask=50
>>>> 36.748056 <ID> 5 Found permission: 15
>>>> 36.748058 <ID> 5 Searching permissions for group=1 mask=50
>>>> 36.748060 <ID> 5 Permissions for group not found
>>>> 36.748061 <ID> 5 Searching permissions for group=2 mask=50
>>>> 36.748063 <ID> 5 Permissions for group not found
>>>> 36.748064 <ID> 5 Getting permissions id=documentstore owner=documentstore
>>>> perm=2
>>>> 36.748066 <ID> 10 uid=documentstore requested perm (type)=2, policy
>>>> perm=2,
>>>> user_perm_mask=2, acl perm=2
>>>> 36.748069 <ID> 2 req 983095:4.861921:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op params
>>>> 36.748072 <ID> 2 req 983095:4.861924:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:executing
>>>> 36.748200 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 obj=<CNT-UUID>:version
>>>> state=0x7f02b0042618 s->prefetch_data=0
>>>> 36.802077 <ID> 10 setting object write_tag=default.78418684.983095
>>>> 36.818727 <ID> 2 req 983095:4.932579:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:http status=201
>>>>
>>>> ==============================
>>>>
>>>>
>>>> --
>>>> Daniel Schneller
>>>> Mobile Development Lead
>>>>
>>>> CenterDevice GmbH | Merscheider Straße 1
>>>> | 42699 Solingen
>>>> tel: +49 1754155711 | Deutschland
>>>> [email protected] | www.centerdevice.com
>>>>
>>>>
>>>>
>>>>
>>>> On 06 Oct 2014, at 19:26, Yehuda Sadeh <[email protected]> wrote:
>>>>
>>>> It'd be interesting to see which rados operation is slowing down the
>>>> requests. Can you provide a log dump of a request (with 'debug rgw =
>>>> 20', and 'debug ms = 1'). This might give us a better idea as to
>>>> what's going on.
>>>>
>>>> Thanks,
>>>> Yehuda
>>>>
>>>> On Mon, Oct 6, 2014 at 10:05 AM, Daniel Schneller
>>>> <[email protected]> wrote:
>>>>
>>>> Hi again!
>>>>
>>>> We have done some tests regarding the limits of storing lots and
>>>> lots of buckets through Rados Gateway into Ceph.
>>>>
>>>> Our test used a single user for which we removed the default max
>>>> buckets limit. It then continuously created containers - both empty
>>>> and such with 10 objects of around 100k random data in them.
>>>>
>>>> With 3 parallel processes we saw relatively consistent time of
>>>> about 500-700ms per such container.
>>>>
>>>> This kept steady until we reached approx. 3 million containers
>>>> after which the time per insert sharply went up to currently
>>>> around 1600ms and rising. Due to some hiccups with network
>>>> equipment the tests were aborted a few times, but then resumed without
>>>> deleting any of the previous runs created containers, so the actual
>>>> number might be 2.8 or 3.2 million, but still in that ballpark.
>>>> We aborted the test here.
>>>>
>>>> Judging by the advice given earlier (see quoted mail below) that
>>>> we might hit a limit on some per-user data structures, we created
>>>> another user account, removed its max-bucket limit as well and
>>>> restarted the benchmark with that one, _expecting_ the times to be
>>>> down to the original range of 500-700ms.
>>>>
>>>> However, what we are seeing is that the times stay at the 1600ms
>>>> and higher levels even for that fresh account.
>>>>
>>>> Here is the output of `rados df`, reformatted to fit the email.
>>>> clones, degraded and unfound were 0 in all cases and have been
>>>> left out for clarity:
>>>>
>>>> .rgw
>>>> =========================
>>>> KB: 1,966,932
>>>> objects: 9,094,552
>>>> rd: 195,747,645
>>>> rd KB: 153,585,472
>>>> wr: 30,191,844
>>>> wr KB: 10,751,065
>>>>
>>>> .rgw.buckets
>>>> =========================
>>>> KB: 2,038,313,855
>>>> objects: 22,088,103
>>>> rd: 5,455,123
>>>> rd KB: 408,416,317
>>>> wr: 149,377,728
>>>> wr KB: 1,882,517,472
>>>>
>>>> .rgw.buckets.index
>>>> =========================
>>>> KB: 0
>>>> objects: 5,374,376
>>>> rd: 267,996,778
>>>> rd KB: 262,626,106
>>>> wr: 107,142,891
>>>> wr KB: 0
>>>>
>>>> .rgw.control
>>>> =========================
>>>> KB: 0
>>>> objects: 8
>>>> rd: 0
>>>> rd KB: 0
>>>> wr: 0
>>>> wr KB: 0
>>>>
>>>> .rgw.gc
>>>> =========================
>>>> KB: 0
>>>> objects: 32
>>>> rd: 5,554,407
>>>> rd KB: 5,713,942
>>>> wr: 8,355,934
>>>> wr KB: 0
>>>>
>>>> .rgw.root
>>>> =========================
>>>> KB: 1
>>>> objects: 3
>>>> rd: 524
>>>> rd KB: 346
>>>> wr: 3
>>>> wr KB: 3
>>>>
>>>>
>>>> We would very much like to understand what is going on here
>>>> in order to decide if Rados Gateway is a viable option to base
>>>> our production system on (where we expect similar counts
>>>> as in the benchmark), or if we need to investigate using librados
>>>> directly which we would like to avoid if possible.
>>>>
>>>> Any advice on what configuration parameters to check or
>>>> which additional information to provide to analyze this would be
>>>> very much welcome.
>>>>
>>>> Cheers,
>>>> Daniel
>>>>
>>>>
>>>> --
>>>> Daniel Schneller
>>>> Mobile Development Lead
>>>>
>>>> CenterDevice GmbH | Merscheider Straße 1
>>>> | 42699 Solingen
>>>> tel: +49 1754155711 | Deutschland
>>>> [email protected] | www.centerdevice.com
>>>>
>>>>
>>>>
>>>>
>>>> On 10 Sep 2014, at 19:42, Gregory Farnum <[email protected]> wrote:
>>>>
>>>> On Wednesday, September 10, 2014, Daniel Schneller
>>>> <[email protected]> wrote:
>>>>
>>>>
>>>> On 09 Sep 2014, at 21:43, Gregory Farnum <[email protected]> wrote:
>>>>
>>>>
>>>> Yehuda can talk about this with more expertise than I can, but I think
>>>> it should be basically fine. By creating so many buckets you're
>>>> decreasing the effectiveness of RGW's metadata caching, which means
>>>>
>>>> the initial lookup in a particular bucket might take longer.
>>>>
>>>>
>>>> Thanks for your thoughts. With “initial lookup in a particular bucket”
>>>> do you mean accessing any of the objects in a bucket? If we directly
>>>> access the object (not enumerating the buckets content), would that
>>>> still be an issue?
>>>> Just trying to understand the inner workings a bit better to make
>>>> more educated guesses :)
>>>>
>>>>
>>>>
>>>> When doing an object lookup, the gateway combines the "bucket ID" with a
>>>> mangled version of the object name to try and do a read out of RADOS. It
>>>> first needs to get that bucket ID though -- it will cache an the bucket
>>>> name->ID mapping, but if you have a ton of buckets there could be enough
>>>> entries to degrade the cache's effectiveness. (So, you're more likely to
>>>> pay
>>>> that extra disk access lookup.)
>>>>
>>>>
>>>>
>>>>
>>>> The big concern is that we do maintain a per-user list of all their
>>>> buckets — which is stored in a single RADOS object — so if you have an
>>>> extreme number of buckets that RADOS object could get pretty big and
>>>> become a bottleneck when creating/removing/listing the buckets. You
>>>>
>>>>
>>>> Alright. Listing buckets is no problem, that we don’t do. Can you
>>>> say what “pretty big” would be in terms of MB? How much space does a
>>>> bucket record consume in there? Based on that I could run a few numbers.
>>>>
>>>>
>>>>
>>>> Uh, a kilobyte per bucket? You could look it up in the source (I'm on my
>>>> phone) but I *believe* the bucket name is allowed to be larger than the
>>>> rest
>>>> combined...
>>>> More particularly, though, if you've got a single user uploading documents,
>>>> each creating a new bucket, then those bucket creates are going to
>>>> serialize
>>>> on this one object.
>>>> -Greg
>>>>
>>>>
>>>>
>>>>
>>>> should run your own experiments to figure out what the limits are
>>>> there; perhaps you have an easy way of sharding up documents into
>>>> different users.
>>>>
>>>>
>>>> Good advice. We can do that per distributor (an org unit in our
>>>> software) to at least compartmentalize any potential locking issues
>>>> in this area to that single entity. Still, there would be quite
>>>> a lot of buckets/objects per distributor, so some more detail on
>>>> the above items would be great.
>>>>
>>>> Thanks a lot!
>>>>
>>>>
>>>> Daniel
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> [email protected]
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> [email protected]
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>
>>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com