Wouldn't using only the first two characters of the file name result in fewer than 65k buckets being used?
For example, if the file names contained 0-9 and a-f, that would only be 256
buckets (16 * 16). Or if they contained 0-9, a-z, and A-Z, that would only be
3,844 buckets (62 * 62).

Bryan

On Thu, Sep 5, 2013 at 8:19 AM, Bill Omer <[email protected]> wrote:
>
> That's correct. We created 65k buckets, using two hex characters as the
> naming convention, then stored the files in each container based on the
> first two characters of the file name. The end result was 20-50 files per
> bucket. Once all of the buckets were created and files were being loaded,
> we still observed an increase in latency over time.
>
> Is there a way to disable indexing? Or are there other settings you can
> suggest to attempt to speed this process up?
>
>
> On Wed, Sep 4, 2013 at 5:21 PM, Mark Nelson <[email protected]> wrote:
>>
>> Just for clarification, distributing objects over lots of buckets isn't
>> helping improve small object performance?
>>
>> The degradation over time is similar to something I've seen in the past,
>> with higher numbers of seeks on the underlying OSD device over time. Is
>> it always (temporarily) resolved by writing to a new empty bucket?
>>
>> Mark
>>
>>
>> On 09/04/2013 02:45 PM, Bill Omer wrote:
>>>
>>> We've actually done the same thing, creating 65k buckets and storing
>>> 20-50 objects in each. No change really, not noticeable anyway.
>>>
>>>
>>> On Wed, Sep 4, 2013 at 2:43 PM, Bryan Stillwell
>>> <[email protected]> wrote:
>>>
>>>     So far I haven't seen much of a change. It's still working through
>>>     removing the bucket that reached 1.5 million objects though (my
>>>     guess is that'll take a few more days), so I believe that might
>>>     have something to do with it.
>>>
>>>     Bryan
>>>
>>>
>>>     On Wed, Sep 4, 2013 at 12:14 PM, Mark Nelson
>>>     <[email protected]> wrote:
>>>
>>>         Bryan,
>>>
>>>         Good explanation. How's performance now that you've spread the
>>>         load over multiple buckets?
>>> >>> Mark >>> >>> On 09/04/2013 12:39 PM, Bryan Stillwell wrote: >>> >>> Bill, >>> >>> I've run into a similar issue with objects averaging >>> ~100KiB. The >>> explanation I received on IRC is that there are scaling >>> issues if you're >>> uploading them all to the same bucket because the index >>> isn't sharded. >>> The recommended solution is to spread the objects out to >>> a lot of >>> buckets. However, that ran me into another issue once I hit >>> 1000 >>> buckets which is a per user limit. I switched the limit to >>> be unlimited >>> with this command: >>> >>> radosgw-admin user modify --uid=your_username --max-buckets=0 >>> >>> Bryan >>> >>> >>> On Wed, Sep 4, 2013 at 11:27 AM, Bill Omer >>> <[email protected] <mailto:[email protected]> >>> <mailto:[email protected] <mailto:[email protected]>>> >>> >>> wrote: >>> >>> I'm testing ceph for storing a very large number of >>> small files. >>> I'm seeing some performance issues and would like to >>> see if anyone >>> could offer any insight as to what I could do to >>> correct this. >>> >>> Some numbers: >>> >>> Uploaded 184111 files, with an average file size of >>> 5KB, using >>> 10 separate servers to upload the request using Python >>> and the >>> cloudfiles module. I stopped uploading after 53 >>> minutes, which >>> seems to average 5.7 files per second per node. >>> >>> >>> My storage cluster consists of 21 OSD's across 7 >>> servers, with their >>> journals written to SSD drives. I've done a default >>> installation, >>> using ceph-deploy with the dumpling release. >>> >>> I'm using statsd to monitor the performance, and what's >>> interesting >>> is when I start with an empty bucket, performance is >>> amazing, with >>> average response times of 20-50ms. However as time >>> goes on, the >>> response times go in to the hundreds, and the average >>> number of >>> uploads per second drops. >>> >>> I've installed radosgw on all 7 ceph servers. 
>>>                 I've tested using a load balancer to distribute the
>>>                 API calls, as well as pointing the 10 worker servers
>>>                 to a single instance. I've not seen a real difference
>>>                 in performance with this either.
>>>
>>>                 Each of the ceph servers is a 16-core Xeon 2.53GHz
>>>                 with 72GB of RAM, OCZ Vertex4 SSD drives for the
>>>                 journals, and Seagate Barracuda ES2 drives for
>>>                 storage.
>>>
>>>                 Any help would be greatly appreciated.

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
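[Editor's note: the bucket arithmetic Bryan questions above can be sketched as follows. This is a minimal illustration, not code from the thread; `bucket_for_prefix` and the hash-based `bucket_for_hash` variant are hypothetical helpers, and the hash-based spreading is a common workaround assumed here, not one the posters describe.]

```python
import hashlib

# Naming buckets after the first two characters of the file name bounds
# the number of buckets actually used by the alphabet size squared --
# regardless of how many buckets were pre-created.
print(16 ** 2)   # hex names (0-9, a-f): at most 256 buckets used
print(62 ** 2)   # 0-9, a-z, A-Z: at most 3844 buckets used

def bucket_for_prefix(name: str) -> str:
    """Bucket chosen from the file name's first two characters."""
    return name[:2]

def bucket_for_hash(name: str, n_buckets: int = 65536) -> str:
    """Hash-based spreading: distributes objects over all n_buckets
    evenly, independent of which characters appear in the names."""
    h = int(hashlib.md5(name.encode()).hexdigest(), 16)
    return f"{h % n_buckets:04x}"

print(bucket_for_prefix("a3f9c0.dat"))  # "a3"
```

With the prefix scheme, hex-style names touch only 256 of the 65k pre-created buckets; hashing the full name instead spreads the same objects across all of them.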
