On 5/23/14 03:47, Georg Höllrigl wrote:
> On 22.05.2014 17:30, Craig Lewis wrote:
>> On 5/22/14 06:16, Georg Höllrigl wrote:
>>> I have created one bucket that holds many small files, separated into
>>> different "directories". But whenever I try to access the bucket, I
>>> only run into some timeout. The timeout is at around 30 - 100 seconds.
>>> This is smaller than the Apache timeout of 300 seconds.
>> Just so we're all talking about the same things, what does "many small
>> files" mean to you? Also, how are you separating them into
>> "directories"? Are you just giving files in the same "directory" the
>> same leading string, like "dir1_subdir1_filename"?
> I can only estimate how many files. ATM I have 25M files on the origin,
> but only 1/10th has been synced to radosgw. These are distributed
> through 20 folders, each containing about 2k directories with ~100 -
> 500 files each.
> Do you think that's too much in that use case?
The recommendations I've seen indicate that 25M objects per bucket is
doable, but painful. The bucket is itself an object stored in Ceph,
which stores the list of objects in that bucket. With a single bucket
containing 25M objects, you're going to hotspot on the bucket. Think of
a bucket like a directory on a filesystem. You wouldn't store 25M files
in a single directory.
Buckets are a bit simpler than directories. They don't have to track
permissions, per file ACLs, and all the other things that POSIX
filesystems do. You can push them harder than a normal directory, but
the same concepts still apply. The more files you put in a
bucket/directory, the slower it gets. Most filesystems impose a hard
limit on the number of files in a directory. RadosGW doesn't have a
limit, it just gets slower.
Even the list of buckets has this problem. You wouldn't want to create
25M buckets with one object each. By default, there is a 1000 bucket
limit per user, but you can increase that.
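If you do need more buckets per user, the limit is adjustable with radosgw-admin. A sketch, assuming a user with uid "johndoe" (a placeholder; substitute your own uid and limit):

```shell
# Raise the per-user bucket limit from the default of 1000.
# "johndoe" and 40000 are placeholder values.
radosgw-admin user modify --uid=johndoe --max-buckets=40000
```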
If you can handle using 20 buckets, it would be worthwhile to put each
of your top 20 folders into its own bucket. If you can break things
apart even more, that would be even better.
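That split doesn't need any changes on the origin side; the sync step just has to map each origin path to a bucket and key. A minimal sketch, assuming paths like "folder07/dir0042/file.dat" (the names are hypothetical):

```python
def split_key(origin_path):
    """Map an origin path such as 'folder07/dir0042/file.dat' to a
    (bucket, key) pair, using the top-level folder as the bucket.
    Hypothetical layout; adapt to however your sync tool names files."""
    top, _, rest = origin_path.partition("/")
    return top, rest

# Each of the 20 top folders becomes its own bucket:
bucket, key = split_key("folder07/dir0042/file.dat")
# bucket == "folder07", key == "dir0042/file.dat"
```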
I mentioned that I have a bunch of buckets with ~1M objects each. GET
and PUT of objects is still fast, but listing the contents of a bucket
takes a long time. Each bucket takes 20-30 minutes to get a full
listing. If you're going to be doing a lot of bucket listing, you might
want to keep each bucket below 1000 objects. Maybe each of your 2k
directories gets its own bucket.
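At that finer granularity you'd end up with roughly 20 x 2000 = 40,000 buckets of a few hundred objects each. One thing to watch is that bucket names are more restricted than object keys (lowercase letters, digits, hyphens are safe for S3-style access). A sketch of one possible naming scheme, again with hypothetical path names:

```python
import re

def dir_bucket(origin_path):
    """Give each 'folder/directory' pair its own bucket, so each bucket
    holds only that directory's ~100-500 objects. The bucket name is
    lowercased and restricted to [a-z0-9-] to stay S3-safe.
    Hypothetical scheme; adjust to your actual layout."""
    top, sub, filename = origin_path.split("/", 2)
    bucket = re.sub(r"[^a-z0-9-]", "-", f"{top}-{sub}".lower())
    return bucket, filename

bucket, key = dir_bucket("Folder07/Dir_0042/file.dat")
# bucket == "folder07-dir-0042", key == "file.dat"
```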
If using more than one bucket is difficult, then 25M objects in one
bucket will work.
--
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email [email protected] <mailto:[email protected]>
*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/> | Twitter
<http://www.twitter.com/centraldesktop> | Facebook
<http://www.facebook.com/CentralDesktop> | LinkedIn
<http://www.linkedin.com/groups?gid=147417> | Blog
<http://cdblog.centraldesktop.com/>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com