On Tue, Jul 14, 2015 at 10:53 AM, Jan Schermer <[email protected]> wrote:
> Thank you for your reply.
> Comments inline.
>
> I’m still hoping to get some more input, but there are many people running 
> ceph on ext4, and it sounds like it works pretty well out of the box. Maybe 
> I’m overthinking this, then?

I think so — somebody did a lot of work making sure we were well-tuned
on the standard filesystems; I believe it was David.
-Greg

>
> Jan
>
>> On 13 Jul 2015, at 21:04, Somnath Roy <[email protected]> wrote:
>>
>> <<inline
>>
>> -----Original Message-----
>> From: ceph-users [mailto:[email protected]] On Behalf Of Jan 
>> Schermer
>> Sent: Monday, July 13, 2015 2:32 AM
>> To: [email protected]
>> Subject: Re: [ceph-users] xattrs vs omap
>>
>> Sorry for reviving an old thread, but could I get some input on this, pretty 
>> please?
>>
>> ext4 has 256-byte inodes by default (at least according to docs) but the 
>> fragment below says:
>> OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)
>>
>> The default of 512b is too much if the inode is just 256b, so shouldn’t that 
>> be 256b for people who use the default ext4 inode size?
>>
>> Anyway, is it better to format ext4 with larger inodes (say 2048b) and set 
>> filestore_max_inline_xattr_size_other=1536, or leave it at defaults?
>> [Somnath] Why 1536? Why not 1024 or some other power of 2? I am not seeing 
>> any harm though, just curious.
>
> AFAIK the inode holds other information besides the xattrs, and you also need 
> to count the xattr names (labels) towards this limit - so storing 1536B of 
> “values” costs more than 1536B, and some space still needs to be left over.
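>
> For reference, a rough way to look at what a filestore OSD actually writes is 
> to dump an object's xattrs directly (the path below is just an illustration, 
> not a real object name):
>
> getfattr -d -m '.*' -e hex /var/lib/ceph/osd/ceph-0/current/<pg>/<object>
>
> That prints all of the file's xattrs in hex, so the names and value sizes can 
> be added up by hand.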
>
>> (As I understand it, on ext4 xattrs are limited to the free space in the 
>> inode plus at most one extra block they can spill into - maybe someone knows 
>> better.)
>>
>>
>> [Somnath] The xattr ("_") is now more than 256 bytes and it will spill 
>> over, so a bigger inode size will be good. But I would suggest running your 
>> own benchmark before putting it into production.
>>
>
> Good point, and I am going to do that, but I’d like to avoid the guesswork. 
> Also, not every workload pattern is easy to replicate in a benchmark….
>
>> Is filestore_max_inline_xattr_size an absolute limit, or is the real limit 
>> filestore_max_inline_xattr_size*filestore_max_inline_xattrs?
>>
>> [Somnath] The *_size option limits the size of each individual xattr, and 
>> *inline_xattrs limits the maximum number of inline attributes allowed. So, if 
>> an xattr's size is > *_size, it will go to omap, and likewise if the total 
>> number of xattrs is > *inline_xattrs, they will go to omap.
>> If you are only using rbd, the number of inline xattrs will always be 2, so 
>> it will not cross that default max limit.
>
> If I’m reading this correctly, then with my setting of 
> filestore_max_inline_xattr_size_other=1536 the xattrs could actually consume 
> 3072B (2 x 1536B), so in reality I should use 4K inodes…?
>
>
>>
>> Does OSD do the sane thing if for some reason the xattrs do not fit? What 
>> are the performance implications of storing the xattrs in leveldb?
>>
>> [Somnath] I don't have the exact numbers, but there is a significant 
>> overhead if the xattrs go to leveldb.
>>
>> And lastly - what size of xattrs should I really expect if all I use is RBD 
>> for OpenStack instances? (No radosgw, no cephfs, but heavy use of rbd image 
>> and pool snapshots.) This overhead is quite large.
>>
>> [Somnath] It will be 2 xattrs: the default "_" will be a little bigger than 
>> 256 bytes, and "_snapset" is small - its size depends on the number of 
>> snaps/clones, but it is unlikely to cross the 256-byte range.
>
> I have a few pool snapshots and lots (hundreds) of (nested) snapshots of rbd 
> volumes. Does this come into play somehow?
>
>>
>> My plan so far is to format the drives like this:
>> mkfs.ext4 -I 2048 -b 4096 -i 524288 -E stride=32,stripe-width=256
>> (2048b inodes, 4096b block size, one inode per 512KB of space) and set 
>> filestore_max_inline_xattr_size_other=1536.
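>>
>> A minimal ceph.conf sketch of what I have in mind to go with that (untested, 
>> just the values from the discussion above; the _other variants apply because 
>> ext4 is neither xfs nor btrfs):
>>
>> [osd]
>>     filestore max inline xattr size other = 1536
>>     filestore max inline xattrs other = 2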
>> [Somnath] Not much idea on ext4, sorry..
>>
>> Does that make sense?
>>
>> Thanks!
>>
>> Jan
>>
>>
>>
>>> On 02 Jul 2015, at 12:18, Jan Schermer <[email protected]> wrote:
>>>
>>> Does anyone have a known-good set of parameters for ext4? I want to try it 
>>> as well but I’m a bit worried about what happens if I get it wrong.
>>>
>>> Thanks
>>>
>>> Jan
>>>
>>>
>>>
>>>> On 02 Jul 2015, at 09:40, Nick Fisk <[email protected]> wrote:
>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-users [mailto:[email protected]] On
>>>>> Behalf Of Christian Balzer
>>>>> Sent: 02 July 2015 02:23
>>>>> To: Ceph Users
>>>>> Subject: Re: [ceph-users] xattrs vs omap
>>>>>
>>>>> On Thu, 2 Jul 2015 00:36:18 +0000 Somnath Roy wrote:
>>>>>
>>>>>> It is replaced with the following config option..
>>>>>>
>>>>>> // Use omap for xattrs for attrs over
>>>>>> // filestore_max_inline_xattr_size or
>>>>>> OPTION(filestore_max_inline_xattr_size, OPT_U32, 0)     //Override
>>>>>> OPTION(filestore_max_inline_xattr_size_xfs, OPT_U32, 65536)
>>>>>> OPTION(filestore_max_inline_xattr_size_btrfs, OPT_U32, 2048)
>>>>>> OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)
>>>>>>
>>>>>> // for more than filestore_max_inline_xattrs attrs
>>>>>> OPTION(filestore_max_inline_xattrs, OPT_U32, 0) //Override
>>>>>> OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10)
>>>>>> OPTION(filestore_max_inline_xattrs_btrfs, OPT_U32, 10)
>>>>>> OPTION(filestore_max_inline_xattrs_other, OPT_U32, 2)
>>>>>>
>>>>>>
>>>>>> If these limits are crossed, xattrs will be stored in omap.
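>>>>>>
>>>>>> If you want to double-check what a running OSD actually ended up with, 
>>>>>> something along these lines should work (osd.0 is just an example):
>>>>>>
>>>>>> ceph daemon osd.0 config show | grep filestore_max_inline_xattr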
>>>>>>
>>>>> Sounds fair.
>>>>>
>>>>> Since I only use RBD I don't think it will ever exceed this.
>>>>
>>>> Possibly - see my thread about the performance difference between new and
>>>> old pools. Still not quite sure what's going on, but for some reason
>>>> some of the objects behind RBDs have larger xattrs, which is causing
>>>> really poor performance.
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Chibi
>>>>>> For ext4, you can use either the filestore_max*_other options or
>>>>>> filestore_max_inline_xattrs / filestore_max_inline_xattr_size. In any
>>>>>> case, the latter two will override everything.
>>>>>>
>>>>>> Thanks & Regards
>>>>>> Somnath
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Christian Balzer [mailto:[email protected]]
>>>>>> Sent: Wednesday, July 01, 2015 5:26 PM
>>>>>> To: Ceph Users
>>>>>> Cc: Somnath Roy
>>>>>> Subject: Re: [ceph-users] xattrs vs omap
>>>>>>
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> On Wed, 1 Jul 2015 15:24:13 +0000 Somnath Roy wrote:
>>>>>>
>>>>>>> It doesn't matter; I think filestore_xattr_use_omap is a 'noop'
>>>>>>> and is not used in Hammer.
>>>>>>>
>>>>>> Then what was this functionality replaced with, esp. considering
>>>>>> EXT4 based OSDs?
>>>>>>
>>>>>> Chibi
>>>>>>> Thanks & Regards
>>>>>>> Somnath
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: ceph-users [mailto:[email protected]] On
>>>>>>> Behalf Of Adam Tygart Sent: Wednesday, July 01, 2015 8:20 AM
>>>>>>> To: Ceph Users
>>>>>>> Subject: [ceph-users] xattrs vs omap
>>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I've got a coworker who put "filestore_xattr_use_omap = true" in
>>>>>>> the ceph.conf when we first started building the cluster. Now he
>>>>>>> can't remember why. He thinks it may be a holdover from our first
>>>>>>> Ceph cluster (running dumpling on ext4, iirc).
>>>>>>>
>>>>>>> In the newly built cluster, we are using XFS with 2048 byte
>>>>>>> inodes, running Ceph 0.94.2. It currently has production data in it.
>>>>>>>
>>>>>>> From my reading of other threads, it looks like this is probably
>>>>>>> not something you want set to true (at least on XFS), due to
>>>>>>> performance implications. Is this something you can change on a running 
>>>>>>> cluster?
>>>>>>> Is it worth the hassle?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Adam
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Christian Balzer        Network/Systems Engineer
>>>>>> [email protected]           Global OnLine Japan/Fusion Communications
>>>>>> http://www.gol.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Christian Balzer        Network/Systems Engineer
>>>>> [email protected]       Global OnLine Japan/Fusion Communications
>>>>> http://www.gol.com/
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
