Hello,

While doing some research on how to build a system that lets us easily rsync backups from other systems onto it, I came across GlusterFS. After several hours of reading I am quite impressed and looking forward to implementing it ;)

Basically, I would like you to comment on the design I put together; there are lots of different ways to do things, and some might be preferred over others.

We need:
15 TB of storage, stored at least in a RAID1-like fashion; RAID5/RAID6 would be preferable, but I think that is not possible with GlusterFS?! While reading the docs I realised I could probably also use this system for hosting our images via HTTP, because of features like:
- easy to expand with new storage / servers
- io-cache
- Lighttpd Plugin for direct FS access

This way we would not just gain backup storage for our pictures, which are currently served by a mogilefs/varnish/lighttpd cluster, but also a backup cluster that could serve files directly to our users. (It is a community site with lots of pictures; file size varies, but most files are 50 to 300 kilobytes, and we plan to store files of ~10 MB too.)

Great :-)

We've planned to use the following hardware:

5 servers, each with:
 - quad-core CPU
 - 16 GB RAM
 - 4 x 1.5 TB HDD, no RAID
 - dedicated GBit Ethernet switched network

GlusterFS Setup:

Same config on all nodes, each with:

volume posix -> volume locks -> volume with io-threads -> volume with write-behind -> volume with io-cache (14 GB cache size, so 2 GB is left for the system),
 for each of the 4 drives / mountpoints (rough volfile sketch below).
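
In volfile terms I imagine the stack for one brick would look roughly like this. This is only a sketch put together from the example volfiles in the docs; the directory path, volume names and option values are placeholders I made up, and I am not sure whether the 14 GB belongs on a single io-cache or should rather be split across the 4 bricks of a node:

  # one such stack per drive, e.g. /data/export1 .. /data/export4
  volume posix1
    type storage/posix
    option directory /data/export1
  end-volume

  volume locks1
    type features/locks
    subvolumes posix1
  end-volume

  volume iothreads1
    type performance/io-threads
    # made-up thread count
    option thread-count 8
    subvolumes locks1
  end-volume

  volume writebehind1
    type performance/write-behind
    subvolumes iothreads1
  end-volume

  volume iocache1
    type performance/io-cache
    # or rather ~3.5GB per brick, if the cache is meant per drive?
    option cache-size 14GB
    subvolumes writebehind1
  end-volume

  # plus a protocol/server volume exporting iocache1..iocache4 over tcp
  volume server
    type protocol/server
    option transport-type tcp
    option auth.addr.iocache1.allow *
    subvolumes iocache1
  end-volume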

Then there are config entries for all 20 bricks, using tcp as the transport type,

and then cluster/replicate volumes are created, always with 2 disks on different servers, plus a cluster/nufa volume with the 10 replicate volumes as subvolumes (see the client-side sketch below).
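
On the client side (which would be the same on every node, since each server also mounts the volume) I picture something like the following. Host names are placeholders, only the first replicate pair is written out, and I am unsure about the exact nufa option for preferring the local subvolume:

  volume server1-disk1
    type protocol/client
    option transport-type tcp
    option remote-host server1
    option remote-subvolume iocache1
  end-volume

  volume server2-disk1
    type protocol/client
    option transport-type tcp
    option remote-host server2
    option remote-subvolume iocache1
  end-volume

  # ... 18 more protocol/client volumes for the remaining bricks ...

  volume replicate1
    type cluster/replicate
    subvolumes server1-disk1 server2-disk1
  end-volume

  # ... replicate2 .. replicate10, always pairing disks on two different servers ...

  volume nufa
    type cluster/nufa
    # guessing at the option name here; meant to point at whichever pair is local to this node
    option local-volume-name replicate1
    subvolumes replicate1 replicate2 replicate3 replicate4 replicate5 replicate6 replicate7 replicate8 replicate9 replicate10
  end-volume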


As I understand it, this should provide me with the following:

- Data redundancy: if one disk fails, I can replace the disk and GlusterFS automatically replicates all the lost data back to the new disk; the same applies if a whole server is lost/broken.
- Distributed access: a read access to a specific file will always go to the same server/drive, regardless of which server it is requested from, and will therefore be cached by the io-cache layer on the node that has the file on disk. OK, a little network overhead, but that's better than putting the cache on top of the distribute volume, which would result in having the "same" cached content on all servers.
- Global cache with no duplicates of 70 GB (5 servers times 14 GB of io-cache RAM per server). -> How exactly does the io-cache work? Can I specify a TTL for a file pattern, or specify which files should not be cached at all... or... or? Can't find any specific info on this.
- I can put apache/lighttpd on all the servers, which then have direct access to the storage; no need for extra webservers for serving static & cacheable content.
- Remote access: I can mount the FS from another location (another DC), securely through some kind of VPN if I wish, and use it there for backup purposes (rough mount sketch after this list).
- Expandable: I can just put 2 new servers online, each with 2/4/6/8 drives.
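
For the remote backup part, I assume that on the backup host it boils down to something like this (volfile path and directories are made up, and I am not sure whether -f is still the right flag in the current release):

  # mount the client volfile on the remote host, through the VPN
  glusterfs -f /etc/glusterfs/client.vol /mnt/glusterfs

  # then simply rsync the backups onto it
  rsync -a /srv/backups/ /mnt/glusterfs/backups/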

If you have read and understood ( :-) ) all of this, I would highly appreciate it if you could answer my questions and/or share any input you might have.

Thanks a lot,
Moritz


