On 23 October 2011 18:27, Yedidyah Bar-David <[email protected]> wrote:
> I used drbd on a LAN, and know that it can theoretically work rather well
> on larger distances when used as read-write on one side only. They also
> have a pay-for tool to do this asynchronously, called DRBD Proxy. This
> implies using a local copy and having drbd sync it. You can choose between
> three of what they call "Protocols" to affect the perceived local latency.
DRBD is indeed a very good tool for block-device replication once you learn your way around it. I'm saying this from extensive personal experience (we used to use it heavily before we moved to "real" hardware SAN servers, first EMC and now HDS).

DRBD does not care about the file system. It will support multi-node read/write if you tell it to, or enforce a strict "only one node can write" policy if you tell it to. If you want to use it with read/write on more than one side, then you MUST use a cluster-aware filesystem (e.g. Red Hat GFS or Oracle OCFS2). Note that these are different from the "distributed filesystems" below in that they do not replicate data; the data could live on a shared disk such as a SAN, which is what DRBD can be viewed as. We got GFS up for a test, but performance on top of DRBD sucked. I've heard that people do use them, so maybe they are usable in a different configuration. These are filesystems that read/write to a regular device for all they care, but they are AWARE that someone else might be manipulating the same disk blocks at the same time, and they need extra mechanisms to coordinate the changes.

About Didi's "protocol" remark: these are not really different protocols (even though they call them "protocol A/B/C") but actually a way for you to decide when a writer should consider a block to be replicated: whether being ack'ed that it was received by the remote node is enough, or when the remote node has put it on its disk write queue, or when the remote's disk has ack'ed that it's physically written to the platter (you can skip that last one if the disk has a battery-backed (BBU) write cache).

The main limitation of DRBD is that it allows only two nodes to sync between themselves (each side of the sync can be handled by a cluster of servers for HA, but there is still only one logical host at each end). The commercial add-on allows a third node to listen to the traffic, potentially over a WAN to a remote site, and to replace one of the sides if it goes down.
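To make the protocol remark concrete: the choice is a single line in a DRBD resource configuration. The sketch below follows the DRBD 8.x config style; the resource name, hostnames, devices, and addresses are invented for illustration, so don't treat it as a drop-in config:

```
# Hypothetical two-node DRBD resource; names/devices/addresses are made up.
resource r0 {
  # A = local write completes once it hits the local disk and the TCP send
  #     buffer (asynchronous, lowest latency, may lose in-flight writes)
  # B = local write completes once the peer has *received* the data
  # C = local write completes once the peer has *written* it to its disk
  #     (fully synchronous, the usual choice on a LAN)
  protocol C;

  on node-a {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7789;
    meta-disk internal;
  }
  on node-b {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}
```

With protocol C every local write pays a network round trip plus the remote disk write before it completes, which is exactly why the choice matters so much once the link is a WAN rather than a LAN.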
This is meant to be used for DR (Disaster Recovery) sites and off-site backup.

DRBD also slows down writes to the local disk. Our measurements put this cost at ~10%, so it's usually not an issue. This is another factor in deciding between "protocols" A/B/C.

There are a few "distributed filesystems" floating around (note that these are different from the cluster-aware filesystems above: they replicate the data at the FS level). Here is a list from Wikipedia: http://en.wikipedia.org/wiki/Distributed_file_system (BTW, as far as I know, HDFS shouldn't be considered a general-purpose file system, even though it would have been cool to use it for that :). Ceph support is part of the vanilla Linux kernel. AFS (http://en.wikipedia.org/wiki/Andrew_File_System) is the one I used to hear about a lot at the time (a long time ago); I think its successor is Coda, but I've never had a good enough excuse to try it.

It'll be interesting to hear what you ended up using and how.

Cheers,

--Amos
_______________________________________________
Linux-il mailing list
[email protected]
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
