Sorry, I meant to say the first four characters, for a total of 65536 buckets
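The prefix-based bucket scheme discussed in this thread (picking a bucket from the leading hex characters of each file name) can be sketched as below. The `bucket_for` helper and the `obj-` bucket-name prefix are illustrative assumptions, not from the thread:

```python
def bucket_for(filename, prefix_len=4):
    """Map a file name to a bucket named after its leading hex characters.

    With prefix_len=4 over hex names there are 16**4 = 65536 possible
    buckets; with prefix_len=2 there are only 16**2 = 256.
    """
    prefix = filename[:prefix_len].lower()
    return "obj-" + prefix

# Example: a hex-named object lands in a bucket keyed on its first 4 chars.
print(bucket_for("a1b2c3d4e5.dat"))  # obj-a1b2
```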
On Thu, Sep 5, 2013 at 12:30 PM, Bryan Stillwell <[email protected]> wrote:
> Wouldn't using only the first two characters in the file name result
> in less than 65k buckets being used?
>
> For example, if the file names contained 0-9 and a-f, that would only
> be 256 buckets (16 * 16). Or if they contained 0-9, a-z, and A-Z, that
> would only be 3,844 buckets (62 * 62).
>
> Bryan
>
> On Thu, Sep 5, 2013 at 8:19 AM, Bill Omer <[email protected]> wrote:
>>
>> That's correct. We created 65k buckets, using two hex characters as the
>> naming convention, then stored the files in each container based on the
>> first two characters in the file name. The end result was 20-50 files
>> per bucket. Once all of the buckets were created and files were being
>> loaded, we still observed an increase in latency over time.
>>
>> Is there a way to disable indexing? Or are there other settings you can
>> suggest to attempt to speed this process up?
>>
>> On Wed, Sep 4, 2013 at 5:21 PM, Mark Nelson <[email protected]> wrote:
>>>
>>> Just for clarification, distributing objects over lots of buckets
>>> isn't helping improve small object performance?
>>>
>>> The degradation over time is similar to something I've seen in the
>>> past, with higher numbers of seeks on the underlying OSD device over
>>> time. Is it always (temporarily) resolved by writing to a new empty
>>> bucket?
>>>
>>> Mark
>>>
>>> On 09/04/2013 02:45 PM, Bill Omer wrote:
>>>>
>>>> We've actually done the same thing, creating 65k buckets and storing
>>>> 20-50 objects in each. No change really, not noticeable anyway.
>>>>
>>>> On Wed, Sep 4, 2013 at 2:43 PM, Bryan Stillwell
>>>> <[email protected]> wrote:
>>>>
>>>> So far I haven't seen much of a change. It's still working through
>>>> removing the bucket that reached 1.5 million objects though (my
>>>> guess is that'll take a few more days), so I believe that might
>>>> have something to do with it.
>>>>
>>>> Bryan
>>>>
>>>> On Wed, Sep 4, 2013 at 12:14 PM, Mark Nelson <[email protected]> wrote:
>>>>>
>>>>> Bryan,
>>>>>
>>>>> Good explanation. How's performance now that you've spread the
>>>>> load over multiple buckets?
>>>>>
>>>>> Mark
>>>>>
>>>>> On 09/04/2013 12:39 PM, Bryan Stillwell wrote:
>>>>>>
>>>>>> Bill,
>>>>>>
>>>>>> I've run into a similar issue with objects averaging ~100KiB. The
>>>>>> explanation I received on IRC is that there are scaling issues if
>>>>>> you're uploading them all to the same bucket, because the index
>>>>>> isn't sharded. The recommended solution is to spread the objects
>>>>>> out over a lot of buckets. However, that ran me into another
>>>>>> issue once I hit 1000 buckets, which is a per-user limit. I
>>>>>> switched the limit to be unlimited with this command:
>>>>>>
>>>>>> radosgw-admin user modify --uid=your_username --max-buckets=0
>>>>>>
>>>>>> Bryan
>>>>>>
>>>>>> On Wed, Sep 4, 2013 at 11:27 AM, Bill Omer <[email protected]> wrote:
>>>>>>>
>>>>>>> I'm testing ceph for storing a very large number of small files.
>>>>>>> I'm seeing some performance issues and would like to see if
>>>>>>> anyone could offer any insight as to what I could do to correct
>>>>>>> this.
>>>>>>>
>>>>>>> Some numbers:
>>>>>>>
>>>>>>> Uploaded 184111 files, with an average file size of 5KB, using
>>>>>>> 10 separate servers to upload the requests using Python and the
>>>>>>> cloudfiles module. I stopped uploading after 53 minutes, which
>>>>>>> works out to an average of 5.7 files per second per node.
>>>>>>>
>>>>>>> My storage cluster consists of 21 OSDs across 7 servers, with
>>>>>>> their journals written to SSD drives. I've done a default
>>>>>>> installation, using ceph-deploy with the dumpling release.
>>>>>>>
>>>>>>> I'm using statsd to monitor the performance, and what's
>>>>>>> interesting is that when I start with an empty bucket,
>>>>>>> performance is amazing, with average response times of 20-50ms.
>>>>>>> However, as time goes on, the response times climb into the
>>>>>>> hundreds and the average number of uploads per second drops.
>>>>>>>
>>>>>>> I've installed radosgw on all 7 ceph servers. I've tested using
>>>>>>> a load balancer to distribute the API calls, as well as pointing
>>>>>>> the 10 worker servers at a single instance. I've not seen a real
>>>>>>> difference in performance with either approach.
>>>>>>>
>>>>>>> Each of the ceph servers is a 16-core Xeon 2.53GHz with 72GB of
>>>>>>> RAM, with OCZ Vertex4 SSD drives for the journals and Seagate
>>>>>>> Barracuda ES2 drives for storage.
>>>>>>>
>>>>>>> Any help would be greatly appreciated.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
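The bucket counts and the per-node upload rate quoted in the thread can be checked with a quick calculation; this is a sketch of the arithmetic only, using the figures given above:

```python
# Bucket counts for a 2-character prefix over different alphabets.
hex_chars = 16    # file names using 0-9 and a-f
alnum_chars = 62  # file names using 0-9, a-z, and A-Z

print(hex_chars ** 2)    # 256 buckets
print(alnum_chars ** 2)  # 3844 buckets
print(hex_chars ** 4)    # 65536 buckets with a 4-character prefix

# Per-node upload rate from the reported test run.
files, minutes, nodes = 184111, 53, 10
rate_per_node = files / (minutes * 60) / nodes
print(round(rate_per_node, 1))  # ~5.8, quoted as 5.7 in the thread
```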
