Hello,

While doing some research on how to build a system that lets us easily rsync backups from other systems onto it, I came across GlusterFS. After several hours of reading I am quite impressed and looking forward to implementing it ;)

Basically, I would like you to comment on the design I put together; there are lots of different ways to do things, and some might be preferred over others.

We need:
15 TB of storage, stored at least in a RAID1-like fashion; RAID5/RAID6 would be preferable, but I think that is not possible with GlusterFS?! While reading the docs I realised I could probably also use this system for hosting our images via HTTP, because of features like:
- easy to expand with new storage / servers
- io-cache
- Lighttpd Plugin for direct FS access

This way we would not just gain backup storage for our pictures, which are currently served by a mogilefs/varnish/lighttpd cluster, but also a backup cluster that could serve files directly to our users. (It is a community site with lots of pictures; file size varies, but most files are 50 to 300 kilobytes, and we plan to store files of ~10 MB too.)

Great :-)

We've planned to use the following hardware:

5 servers, each with:
 - quad-core CPU
 - 16 GB RAM
 - 4 x 1.5 TB HDD, no RAID
 - dedicated GBit Ethernet switched network

GlusterFS Setup:

Same config on all nodes, each with:

volume posix -> volume locks -> volume with io-threads -> volume with write-behind -> volume with io-cache (14 GB cache size, so 2 GB is left for the system),
 for each of the 4 drives / mountpoints (rough volfile sketch below).
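
In volfile terms I imagine the stack for one brick would look roughly like this. This is only a sketch put together from the example volfiles in the docs; the directory path, volume names and option values are placeholders I made up, and I am not sure whether the 14 GB belongs on a single io-cache or should rather be split across the 4 bricks of a node:

  # one such stack per drive, e.g. /data/export1 .. /data/export4
  volume posix1
    type storage/posix
    option directory /data/export1
  end-volume

  volume locks1
    type features/locks
    subvolumes posix1
  end-volume

  volume iothreads1
    type performance/io-threads
    # made-up thread count
    option thread-count 8
    subvolumes locks1
  end-volume

  volume writebehind1
    type performance/write-behind
    subvolumes iothreads1
  end-volume

  volume iocache1
    type performance/io-cache
    # or rather ~3.5GB per brick, if the cache is meant per drive?
    option cache-size 14GB
    subvolumes writebehind1
  end-volume

  # plus a protocol/server volume exporting iocache1..iocache4 over tcp
  volume server
    type protocol/server
    option transport-type tcp
    option auth.addr.iocache1.allow *
    subvolumes iocache1
  end-volume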

Then there are config entries for all 20 bricks, using tcp as the transport type,

and then cluster/replicate volumes are created, always with 2 disks on different servers, plus a cluster/nufa volume with the 10 replicate volumes as subvolumes (see the client-side sketch below).
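
On the client side (which would be the same on every node, since each server also mounts the volume) I picture something like the following. Host names are placeholders, only the first replicate pair is written out, and I am unsure about the exact nufa option for preferring the local subvolume:

  volume server1-disk1
    type protocol/client
    option transport-type tcp
    option remote-host server1
    option remote-subvolume iocache1
  end-volume

  volume server2-disk1
    type protocol/client
    option transport-type tcp
    option remote-host server2
    option remote-subvolume iocache1
  end-volume

  # ... 18 more protocol/client volumes for the remaining bricks ...

  volume replicate1
    type cluster/replicate
    subvolumes server1-disk1 server2-disk1
  end-volume

  # ... replicate2 .. replicate10, always pairing disks on two different servers ...

  volume nufa
    type cluster/nufa
    # guessing at the option name here; meant to point at whichever pair is local to this node
    option local-volume-name replicate1
    subvolumes replicate1 replicate2 replicate3 replicate4 replicate5 replicate6 replicate7 replicate8 replicate9 replicate10
  end-volume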


As I understand it, this should provide me with the following:

- Data redundancy: if one disk fails, I can replace the disk and GlusterFS automatically replicates all the lost data back to the new disk; the same applies if a whole server is lost/broken.
- Distributed access: a read access to a specific file will always go to the same server/drive, regardless of which server it is requested from, and will therefore be cached by the io-cache layer on the node that has the file on disk. OK, a little network overhead, but that's better than putting the cache on top of the distribute volume, which would result in having the "same" cached content on all servers.
- Global cache with no duplicates of 70 GB (5 servers times 14 GB of io-cache RAM per server). -> How exactly does the io-cache work? Can I specify a TTL for a file pattern, or specify which files should not be cached at all... or... or? Can't find any specific info on this.
- I can put apache/lighttpd on all the servers, which then have direct access to the storage; no need for extra webservers for serving static & cacheable content.
- Remote access: I can mount the FS from another location (another DC), securely through some kind of VPN if I wish, and use it there for backup purposes (rough mount sketch after this list).
- Expandable: I can just put 2 new servers online, each with 2/4/6/8 drives.
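
For the remote backup part, I assume that on the backup host it boils down to something like this (volfile path and directories are made up, and I am not sure whether -f is still the right flag in the current release):

  # mount the client volfile on the remote host, through the VPN
  glusterfs -f /etc/glusterfs/client.vol /mnt/glusterfs

  # then simply rsync the backups onto it
  rsync -a /srv/backups/ /mnt/glusterfs/backups/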

If you have read and understood ( :-) ) all of this, I would highly appreciate it if you could answer my questions and/or share any input you might have.

Thanks a lot,
Moritz


