After back-porting Sage's patch to Giant, the xattrs written by radosgw can 
be inlined again. I haven't run extensive testing yet; I will follow up once 
I have some performance data to share.
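
In case it is useful, here is a rough Python 3 sketch for spot-checking the 
xattr sizes on a filestore object file. The object path is made up, and the 
on-disk attribute names may carry a prefix (e.g. user.ceph.) depending on the 
filestore version, so treat this as a sketch rather than a tool:

import os

# Hypothetical path to an object file under the OSD's filestore data dir;
# substitute a real file from current/<pg>_head/.
obj_path = "/var/lib/ceph/osd/ceph-0/current/11.7_head/some_object_file"

total = 0
for name in os.listxattr(obj_path):
    size = len(os.getxattr(obj_path, name))
    total += size
    print("%-40s %5d bytes" % (name, size))

# The rgw metadata quoted below sums to 15 + 381 + 121 + 33 = 550 bytes,
# before object_info_t and the other internal xattrs are counted, so the
# whole set should still fit in a 2k inode once each raw xattr value is
# small enough for XFS to inline.
print("total xattr payload: %d bytes" % total)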

Thanks,
Guang

> Date: Tue, 16 Jun 2015 15:51:44 -0500
> From: mnel...@redhat.com
> To: yguan...@outlook.com; s...@newdream.net
> CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
> Subject: Re: xattrs vs. omap with radosgw
> 
> 
> 
> On 06/16/2015 03:48 PM, GuangYang wrote:
> > Thanks Sage for the quick response.
> >
> > It is on Firefly v0.80.4.
> >
> > When putting objects with *rados* directly, the xattrs can be inlined. The 
> > problem shows up with radosgw, since we keep a bunch of metadata in 
> > xattrs, including:
> >     rgw.idtag    : 15 bytes
> >     rgw.manifest : 381 bytes
> 
> Ah, that manifest will push us over the limit afaik, resulting in every 
> inode getting a new extent.
> 
> >     rgw.acl      : 121 bytes
> >     rgw.etag     : 33 bytes
> >
> > Given the background, it looks like the problem is that rgw.manifest is 
> > too large, so XFS pushes the xattrs out to extents. If I understand 
> > correctly, once we port the change to Firefly we should be able to keep 
> > the xattrs inline in the inode, since the accumulated size is still less 
> > than 2K (please correct me if I am wrong here).
> 
> I think you are correct, so long as the patch breaks that manifest down 
> into 254-byte or smaller chunks.
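
For anyone reading along: my understanding of the striping change Sage 
mentions below is simply that one logical xattr is split across several raw 
xattrs so that each raw value stays at or under the inline limit. A minimal 
Python sketch of the idea follows; the "@N" naming and the helper functions 
are illustrative, not the actual filestore code:

CHUNK = 254  # largest raw xattr value XFS can keep inline, per this thread

def stripe_xattr(name, value):
    """Split one logical xattr value into raw xattrs of <= CHUNK bytes."""
    pieces = [value[i:i + CHUNK] for i in range(0, len(value), CHUNK)] or [b""]
    raw = {name: pieces[0]}
    for i, piece in enumerate(pieces[1:], start=1):
        raw["%s@%d" % (name, i)] = piece
    return raw

def unstripe_xattr(name, raw):
    """Reassemble the logical value from its raw chunks."""
    value, i = raw[name], 1
    while "%s@%d" % (name, i) in raw:
        value += raw["%s@%d" % (name, i)]
        i += 1
    return value

# A 381-byte rgw.manifest becomes a 254-byte piece plus a 127-byte piece,
# both small enough for XFS to keep inline.
manifest = b"m" * 381
raw = stripe_xattr("user.rgw.manifest", manifest)
assert sorted(raw) == ["user.rgw.manifest", "user.rgw.manifest@1"]
assert unstripe_xattr("user.rgw.manifest", raw) == manifest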
> 
> >
> > Thanks,
> > Guang
> >
> >
> > ----------------------------------------
> >> Date: Tue, 16 Jun 2015 12:43:08 -0700
> >> From: s...@newdream.net
> >> To: yguan...@outlook.com
> >> CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
> >> Subject: Re: xattrs vs. omap with radosgw
> >>
> >> On Tue, 16 Jun 2015, GuangYang wrote:
> >>> Hi Cephers,
> >>> While looking at disk utilization on the OSDs, I noticed the disks were 
> >>> constantly busy with a large number of small writes. Further 
> >>> investigation showed that, because radosgw uses xattrs to store metadata 
> >>> (e.g. etag, content-type, etc.), the xattrs spill from local (inline) 
> >>> storage to extents, which incurs extra I/O.
> >>>
> >>> I would like to check whether anybody has experience with offloading the 
> >>> metadata to omap:
> >>> 1> Offload everything to omap? If so, should we make the inode size 512 
> >>> (instead of 2k)?
> >>> 2> Partially offload the metadata to omap, e.g. offload only the 
> >>> rgw-specific metadata.
> >>>
> >>> Any sharing is deeply appreciated. Thanks!
> >>
> >> Hi Guang,
> >>
> >> Is this hammer or firefly?
> >>
> >> With hammer the size of object_info_t crossed the 255 byte boundary, which
> >> is the max xattr value that XFS can inline. We've since merged something
> >> that stripes over several small xattrs so that we can keep things inline,
> >> but it hasn't been backported to hammer yet. See
> >> c6cdb4081e366f471b372102905a1192910ab2da. Perhaps this is what you're
> >> seeing?
> >>
> >> I think we're still better off with larger XFS inodes and inline xattrs if
> >> it means we avoid leveldb at all for most objects.
> >>
> >> sage
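
To make the xattr-vs-omap trade-off above easier to poke at, here is a rough 
python-rados sketch contrasting where the two options put the metadata. The 
pool and object names are made up, and this is not the radosgw code path, 
just the two librados calls involved:

import rados

# Connect using the default ceph.conf; pool and object names are hypothetical.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("test-pool")
try:
    ioctx.write_full("test-object", b"payload")

    # Option 1: keep metadata in xattrs (inlined in the XFS inode when small
    # enough, otherwise spilled to extents).
    ioctx.set_xattr("test-object", "user.rgw.etag", b"0" * 33)

    # Option 2: offload metadata to omap (stored in leveldb, off the inode).
    with rados.WriteOpCtx() as write_op:
        ioctx.set_omap(write_op, ("user.rgw.manifest",), (b"m" * 381,))
        ioctx.operate_write_op(write_op, "test-object")
finally:
    ioctx.close()
    cluster.shutdown()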