Wouldn't using only the first two characters in the file name result
in fewer than 65k buckets being used?

For example, if the file names contained only 0-9 and a-f, that would
be just 256 buckets (16 * 16).  Or if they contained 0-9, a-z, and
A-Z, that would still be only 3,844 buckets (62 * 62).
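
The arithmetic above can be sketched in Python (a hypothetical
illustration of the scheme being discussed; bucket_for is an invented
helper, not the actual code from the test setup):

```python
# Sketch: the bucket is chosen from the first two characters of the
# file name, so the character set of those names caps how many of the
# 65k pre-created buckets can ever be used.
import itertools

def bucket_for(filename):
    # Hypothetical naming convention: "bucket-" + first two characters.
    return "bucket-" + filename[:2]

# Hex-only file names (0-9, a-f) can reach at most 16 * 16 = 256 buckets.
hex_chars = "0123456789abcdef"
hex_buckets = {bucket_for(a + b) for a, b in itertools.product(hex_chars, repeat=2)}
print(len(hex_buckets))  # 256

# Case-sensitive alphanumerics (0-9, a-z, A-Z) reach 62 * 62 = 3,844
# distinct buckets -- still far short of 65k.
print(62 * 62)  # 3844
```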

Bryan


On Thu, Sep 5, 2013 at 8:19 AM, Bill Omer <[email protected]> wrote:
>
> That's correct.  We created 65k buckets, using two hex characters as the
> naming convention, then stored the files in each container based on the
> first two characters of the file name.  The end result was 20-50 files per
> bucket.  Once all of the buckets were created and files were being loaded, we
> still observed an increase in latency over time.
>
> Is there a way to disable indexing?  Or are there other settings you can 
> suggest to attempt to speed this process up?
>
>
> On Wed, Sep 4, 2013 at 5:21 PM, Mark Nelson <[email protected]> wrote:
>>
>> Just for clarification, distributing objects over lots of buckets isn't 
>> helping improve small object performance?
>>
>> The degradation over time is similar to something I've seen in the past,
>> with higher numbers of seeks on the underlying OSD device over time.  Is it
>> always (temporarily) resolved by writing to a new, empty bucket?
>>
>> Mark
>>
>>
>> On 09/04/2013 02:45 PM, Bill Omer wrote:
>>>
>>> We've actually done the same thing, creating 65k buckets and storing
>>> 20-50 objects in each.  No change really, not noticeable anyway
>>>
>>>
>>> On Wed, Sep 4, 2013 at 2:43 PM, Bryan Stillwell
>>> <[email protected]> wrote:
>>>
>>>     So far I haven't seen much of a change.  It's still working through
>>>     removing the bucket that reached 1.5 million objects though (my
>>>     guess is that'll take a few more days), so I believe that might have
>>>     something to do with it.
>>>
>>>     Bryan
>>>
>>>
>>>     On Wed, Sep 4, 2013 at 12:14 PM, Mark Nelson
>>>     <[email protected]> wrote:
>>>
>>>         Bryan,
>>>
>>>         Good explanation.  How's performance now that you've spread the
>>>         load over multiple buckets?
>>>
>>>         Mark
>>>
>>>         On 09/04/2013 12:39 PM, Bryan Stillwell wrote:
>>>
>>>             Bill,
>>>
>>>             I've run into a similar issue with objects averaging
>>>             ~100KiB.  The explanation I received on IRC is that there
>>>             are scaling issues if you're uploading them all to the
>>>             same bucket, because the index isn't sharded.  The
>>>             recommended solution is to spread the objects out over a
>>>             lot of buckets.  However, I then ran into another issue
>>>             once I hit 1,000 buckets, which is a per-user limit.  I
>>>             switched the limit to unlimited with this command:
>>>
>>>             radosgw-admin user modify --uid=your_username --max-buckets=0
>>>
>>>             Bryan
>>>
>>>
>>>             On Wed, Sep 4, 2013 at 11:27 AM, Bill Omer
>>>             <[email protected]> wrote:
>>>
>>>                  I'm testing ceph for storing a very large number of
>>>                  small files.  I'm seeing some performance issues and
>>>                  would like to see if anyone could offer any insight
>>>                  as to what I could do to correct this.
>>>
>>>                  Some numbers:
>>>
>>>                  Uploaded 184,111 files, with an average file size of
>>>                  5KB, using 10 separate servers to perform the uploads
>>>                  with Python and the cloudfiles module.  I stopped
>>>                  uploading after 53 minutes, which averages out to 5.7
>>>                  files per second per node.
>>>
>>>
>>>                  My storage cluster consists of 21 OSDs across 7
>>>                  servers, with their journals written to SSD drives.
>>>                  I've done a default installation, using ceph-deploy
>>>                  with the dumpling release.
>>>
>>>                  I'm using statsd to monitor performance, and what's
>>>                  interesting is that when I start with an empty
>>>                  bucket, performance is amazing, with average response
>>>                  times of 20-50ms.  However, as time goes on, the
>>>                  response times climb into the hundreds and the
>>>                  average number of uploads per second drops.
>>>
>>>                  I've installed radosgw on all 7 ceph servers.  I've
>>>             tested using a
>>>                  load balancer to distribute the api calls, as well as
>>>             pointing the
>>>                  10 worker servers to a single instance.  I've not seen
>>>             a real
>>>                  different in performance with this either.
>>>
>>>
>>>                  Each of the ceph servers is a 16-core Xeon 2.53GHz
>>>                  with 72GB of RAM, OCZ Vertex4 SSD drives for the
>>>                  journals, and Seagate Barracuda ES2 drives for
>>>                  storage.
>>>
>>>
>>>                  Any help would be greatly appreciated.
>>>
>>>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
