On 21 September 2016 at 02:57, Haomai Wang <hao...@xsky.com> wrote:
> On Wed, Sep 21, 2016 at 2:41 AM, Wido den Hollander <w...@42on.com> wrote:
>>> On 20 September 2016 at 20:30, Haomai Wang <hao...@xsky.com> wrote:
>>> On Wed, Sep 21, 2016 at 2:26 AM, Wido den Hollander <w...@42on.com> wrote:
>>> >
>>> >> On 20 September 2016 at 19:27, Gregory Farnum <gfar...@redhat.com> wrote:
>>> >>
>>> >>
>>> >> In librados getting a stat is basically equivalent to reading a small
>>> >> object; there's not an index or anything so FileStore needs to descend
>>> >> its folder hierarchy. If looking at metadata for all the objects in the
>>> >> system efficiently is important you'll want to layer an index in
>>> >> somewhere.
>>> >> -Greg
>>> >>
>>> >
>>> > Should we expect an improvement here with BlueStore vs FileStore? That
>>> > would basically be a RocksDB lookup on the OSD, right?
>>> Yes, BlueStore will be much better since it indexes onodes (like
>>> inodes) in RocksDB. Although that is fast enough, constructing the
>>> object still has some cost; if you only want to check object
>>> existence, we may need a more lightweight interface.
>> It's rados_stat() which would be called; that is the way to check if an
>> object exists. If I remember the BlueStore architecture correctly, it would
>> be a lookup in RocksDB with all the information in there.
> Exactly, but compared to a database query, this lookup is still heavy.
> Each onode construction needs to fetch lots of keys and build the onode
> inline. Of course, it's still one of the cheaper calls among the rados
> interfaces.

From some preliminary tests, I've noted that BlueStore is considerably
quicker than FileStore at millions of random small-file I/Os.
But this is only with around 1/25th of the data we are holding.

So is having an index pool the only way to get faster lookup speeds?
I don't think having one is really suitable for my use case: with
billions of objects being held, I don't think maintaining such an index
would be any quicker than what rados_stat() is capable of achieving
already.

In any case, these clients maintain and validate the data that's
stored; they would inherently assume that any index is wrong.
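
For reference, the rados_stat() existence check discussed above looks
roughly like this through the Python rados bindings. This is only a
minimal sketch: the helper names object_exists() and check_on_cluster(),
the default config path, and the pool and object names passed in are my
own placeholders, not anything from our actual clients.

```python
try:
    import rados  # Python bindings shipped with Ceph (python-rados)
    ObjectNotFound = rados.ObjectNotFound
except ImportError:
    # Stand-in so the sketch can be read and exercised without Ceph installed.
    rados = None

    class ObjectNotFound(Exception):
        """Placeholder for rados.ObjectNotFound when the bindings are absent."""


def object_exists(ioctx, name):
    """Return True if object `name` exists in the pool behind `ioctx`.

    Ioctx.stat() wraps rados_stat(): it returns (size, mtime) on success
    and raises ObjectNotFound when the object is missing.
    """
    try:
        ioctx.stat(name)
        return True
    except ObjectNotFound:
        return False


def check_on_cluster(pool, name, conffile="/etc/ceph/ceph.conf"):
    """Hypothetical helper: connect, check one object, then clean up."""
    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            return object_exists(ioctx, name)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

Each call is still one round trip to the OSD holding the object, so this
only shows the interface, not a way to make the lookup itself cheaper.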

Iain Buclaw
ceph-users mailing list
