Sorry, I meant to say the first four characters, for a total of 65536 buckets
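The prefix-based bucket scheme discussed in this thread (picking a bucket from the leading hex characters of each file name) can be sketched as below. The `bucket_for` helper and the `obj-` bucket-name prefix are illustrative assumptions, not from the thread:

```python
def bucket_for(filename, prefix_len=4):
    """Map a file name to a bucket named after its leading hex characters.

    With prefix_len=4 over hex names there are 16**4 = 65536 possible
    buckets; with prefix_len=2 there are only 16**2 = 256.
    """
    prefix = filename[:prefix_len].lower()
    return "obj-" + prefix

# Example: a hex-named object lands in a bucket keyed on its first 4 chars.
print(bucket_for("a1b2c3d4e5.dat"))  # obj-a1b2
```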
On Thu, Sep 5, 2013 at 12:30 PM, Bryan Stillwell <[email protected]> wrote:
> Wouldn't using only the first two characters in the file name result
> in less than 65k buckets being used?
>
> For example, if the file names contained 0-9 and a-f, that would only
> be 256 buckets (16 * 16). Or if they contained 0-9, a-z, and A-Z, that
> would only be 3,844 buckets (62 * 62).
>
> Bryan
>
> On Thu, Sep 5, 2013 at 8:19 AM, Bill Omer <[email protected]> wrote:
>>
>> That's correct. We created 65k buckets, using two hex characters as the
>> naming convention, then stored the files in each container based on the
>> first two characters in the file name. The end result was 20-50 files
>> per bucket. Once all of the buckets were created and files were being
>> loaded, we still observed an increase in latency over time.
>>
>> Is there a way to disable indexing? Or are there other settings you can
>> suggest to attempt to speed this process up?
>>
>> On Wed, Sep 4, 2013 at 5:21 PM, Mark Nelson <[email protected]> wrote:
>>>
>>> Just for clarification, distributing objects over lots of buckets
>>> isn't helping improve small object performance?
>>>
>>> The degradation over time is similar to something I've seen in the
>>> past, with higher numbers of seeks on the underlying OSD device over
>>> time. Is it always (temporarily) resolved by writing to a new empty
>>> bucket?
>>>
>>> Mark
>>>
>>> On 09/04/2013 02:45 PM, Bill Omer wrote:
>>>>
>>>> We've actually done the same thing, creating 65k buckets and storing
>>>> 20-50 objects in each. No change really, not noticeable anyway.
>>>>
>>>> On Wed, Sep 4, 2013 at 2:43 PM, Bryan Stillwell
>>>> <[email protected]> wrote:
>>>>
>>>> So far I haven't seen much of a change. It's still working through
>>>> removing the bucket that reached 1.5 million objects though (my
>>>> guess is that'll take a few more days), so I believe that might
>>>> have something to do with it.
>>>>
>>>> Bryan
>>>>
>>>> On Wed, Sep 4, 2013 at 12:14 PM, Mark Nelson <[email protected]> wrote:
>>>>>
>>>>> Bryan,
>>>>>
>>>>> Good explanation. How's performance now that you've spread the
>>>>> load over multiple buckets?
>>>>>
>>>>> Mark
>>>>>
>>>>> On 09/04/2013 12:39 PM, Bryan Stillwell wrote:
>>>>>>
>>>>>> Bill,
>>>>>>
>>>>>> I've run into a similar issue with objects averaging ~100KiB. The
>>>>>> explanation I received on IRC is that there are scaling issues if
>>>>>> you're uploading them all to the same bucket, because the index
>>>>>> isn't sharded. The recommended solution is to spread the objects
>>>>>> out over a lot of buckets. However, that ran me into another
>>>>>> issue once I hit 1000 buckets, which is a per-user limit. I
>>>>>> switched the limit to be unlimited with this command:
>>>>>>
>>>>>> radosgw-admin user modify --uid=your_username --max-buckets=0
>>>>>>
>>>>>> Bryan
>>>>>>
>>>>>> On Wed, Sep 4, 2013 at 11:27 AM, Bill Omer <[email protected]> wrote:
>>>>>>>
>>>>>>> I'm testing ceph for storing a very large number of small files.
>>>>>>> I'm seeing some performance issues and would like to see if
>>>>>>> anyone could offer any insight as to what I could do to correct
>>>>>>> this.
>>>>>>>
>>>>>>> Some numbers:
>>>>>>>
>>>>>>> Uploaded 184111 files, with an average file size of 5KB, using
>>>>>>> 10 separate servers to upload the requests using Python and the
>>>>>>> cloudfiles module. I stopped uploading after 53 minutes, which
>>>>>>> works out to an average of 5.7 files per second per node.
>>>>>>>
>>>>>>> My storage cluster consists of 21 OSDs across 7 servers, with
>>>>>>> their journals written to SSD drives. I've done a default
>>>>>>> installation, using ceph-deploy with the dumpling release.
>>>>>>>
>>>>>>> I'm using statsd to monitor the performance, and what's
>>>>>>> interesting is that when I start with an empty bucket,
>>>>>>> performance is amazing, with average response times of 20-50ms.
>>>>>>> However, as time goes on, the response times climb into the
>>>>>>> hundreds and the average number of uploads per second drops.
>>>>>>>
>>>>>>> I've installed radosgw on all 7 ceph servers. I've tested using
>>>>>>> a load balancer to distribute the API calls, as well as pointing
>>>>>>> the 10 worker servers at a single instance. I've not seen a real
>>>>>>> difference in performance with either approach.
>>>>>>>
>>>>>>> Each of the ceph servers is a 16-core Xeon 2.53GHz with 72GB of
>>>>>>> RAM, with OCZ Vertex4 SSD drives for the journals and Seagate
>>>>>>> Barracuda ES2 drives for storage.
>>>>>>>
>>>>>>> Any help would be greatly appreciated.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
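The bucket counts and the per-node upload rate quoted in the thread can be checked with a quick calculation; this is a sketch of the arithmetic only, using the figures given above:

```python
# Bucket counts for a 2-character prefix over different alphabets.
hex_chars = 16    # file names using 0-9 and a-f
alnum_chars = 62  # file names using 0-9, a-z, and A-Z

print(hex_chars ** 2)    # 256 buckets
print(alnum_chars ** 2)  # 3844 buckets
print(hex_chars ** 4)    # 65536 buckets with a 4-character prefix

# Per-node upload rate from the reported test run.
files, minutes, nodes = 184111, 53, 10
rate_per_node = files / (minutes * 60) / nodes
print(round(rate_per_node, 1))  # ~5.8, quoted as 5.7 in the thread
```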
