On Mon, Nov 21, 2011 at 10:55 AM, Gregory Farnum
<[email protected]> wrote:
> On Sun, Nov 20, 2011 at 3:03 PM, Yehuda Sadeh Weinraub
> <[email protected]> wrote:
>> On Sun, Nov 20, 2011 at 9:49 AM, Leander Yu <[email protected]> wrote:
>>> Hi all,
>>> I found that since 0.37 radosgw has changed fundamentally: all objects
>>> are now stored in the .rgw.buckets pool.
>>> The release notes say this is for better scalability, but I wonder
>>> how this change improves scalability. Based on our tests, a
>>> simple bucket listing with s3cmd takes more than 10 seconds when
>>> the bucket holds more than 10k files. Is this normal, or is it a
>>> potential bug?
>>
>> The scaling issue that was solved was the ability to increase the
>> number of buckets, whereas you're now hitting a different issue that
>> relates to the number of objects per bucket. The problem is the
>> inefficient implementation of the rados tmap (trivial map): every
>> read or write of the directory index requires reading the entire
>> object, which does not scale well. We are going to
>> replace tmap with a not-so-trivial map that will scale much better
>> (feature #1571 in the ceph tracker, currently planned for 0.39).
>>
>> I verified that this is in fact the issue. The problem with listing
>> objects using s3cmd is that it requests the data in chunks of 1000,
>> which means that going through 10k objects requires the entire
>> directory to be read off disk (on the osd side) 10 times.
>
> I wouldn't expect this to be so slow, though — presumably the
> directory object is in cache so all it's doing is some memory copies
> after the first read off disk?

Yeah, you're right. I reran the test, this time looking carefully at
who waits where, and it appears that most of the time is spent in
s3cmd itself while it digests the information.
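
For reference, the read amplification described in the quoted text can be
sketched in Python as below. This is only a back-of-the-envelope model, not
radosgw or rados code: the function names are made up for illustration, and
it assumes each listing request decodes the whole directory object, while a
hypothetical seekable index would touch only the entries it returns.

```python
def listing_pages(num_objects: int, page_size: int) -> int:
    """Number of list requests needed to walk a bucket page by page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return -(-num_objects // page_size)  # ceiling division


def tmap_entries_read(num_objects: int, page_size: int) -> int:
    """Entries decoded in total, ASSUMING every page request reads the
    entire directory object (the tmap behaviour described in this thread)."""
    return listing_pages(num_objects, page_size) * num_objects


def seekable_entries_read(num_objects: int, page_size: int) -> int:
    """Entries decoded by a HYPOTHETICAL index that can seek to the page
    marker, so each page reads only the entries it returns."""
    return num_objects


if __name__ == "__main__":
    n, page = 10_000, 1_000  # figures taken from the thread
    print(listing_pages(n, page))          # 10
    print(tmap_entries_read(n, page))      # 100000
    print(seekable_entries_read(n, page))  # 10000
```

With the numbers from this thread (10k objects, pages of 1000) the model
gives 10 full reads of the directory, i.e. 100,000 entries decoded to return
10,000, though as noted above that was not where the time went in this test.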

Yehuda