[EMAIL PROTECTED] writes:
| [EMAIL PROTECTED] writes:
|
| > I don't think that quorum is the main issue about wide-area access, but
| > that is neither here nor there.
|
| Unfortunately, a significant number of your customers do. Since you
| participated in a lengthy discussion of this issue about a year ago,
| I'm surprised and alarmed that you don't seem to be aware of this.
To show that there are more significant issues than quorum, I'll examine
another filesystem, NFS, albeit very briefly. It has no quorum
problems. Well, why doesn't one use it over the WAN? A few major
reasons come to mind:
1. a poor client-server protocol, resulting in very chatty interactions
between the machines
2. a poor RPC layer, which cannot cope with the network congestion that
(1) creates, and which is hard to get through firewalls
3. no security
4. no way to replicate frequently requested data across different parts
of the network
These are issues that AFS has solved pretty well, which is why lots of
organizations do share data over the internet using AFS.
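To put rough numbers on point 1, here is a toy model. It is a sketch under stated assumptions, not a measurement. It assumes that every synchronous RPC costs one full WAN round trip with no pipelining. Real clients do read-ahead, so the model overstates the cost. It also assumes that an uncached NFS-style read needs one LOOKUP per path component, one GETATTR, and one READ per 8 KB block, which is the NFS version 2 transfer limit. Finally, it assumes that an NFS-style client sends a GETATTR on every reopen, while a client holding a server callback, as AFS clients do, reopens a cached file without contacting the server. The function names, the 100 ms round trip, and the 1 MB file are all hypothetical.

```python
import math

NFS_V2_MAX_READ = 8192  # NFS version 2 caps a single READ at 8 KB


def nfs_like_rpcs(path_components: int, file_size: int) -> int:
    """Hypothetical count of synchronous RPCs for one uncached read:
    one LOOKUP per path component, one GETATTR, one READ per 8 KB block."""
    reads = math.ceil(file_size / NFS_V2_MAX_READ)
    return path_components + 1 + reads


def wan_seconds(rpcs: int, rtt_ms: float) -> float:
    """Wall-clock time if each RPC waits a full round trip (assumed: no pipelining)."""
    return rpcs * rtt_ms / 1000.0


def repeat_open_rpcs(opens: int, callback_cache: bool) -> int:
    """RPCs for reopening an already cached file `opens` times.
    NFS-style (assumed): one GETATTR per open to revalidate the cache.
    Callback-style (AFS-like): no RPC while the server's callback promise holds."""
    return 0 if callback_cache else opens


# Hypothetical workload: a 1 MB file five directories deep, over a 100 ms link.
rpcs = nfs_like_rpcs(path_components=5, file_size=1_048_576)
print(rpcs, wan_seconds(rpcs, rtt_ms=100))  # prints: 134 13.4
```

The point of the model is only the shape of the cost. It is linear in round trips, so a protocol that needs many small synchronous exchanges degrades badly as latency grows, whatever its consistency scheme.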
What I guess you are really asking for is _one_ cell spread across a
WAN of low-speed, frequently unavailable network connections. That
*is* a hard problem.
Morgan Stanley has taken an interesting approach to working around this
limitation: the administrators know that it is not a single cell, but
users are given a view of the filesystem as a single login and a single
system. Their method is worth a look.
The quorum issue is merely an artifact of the approach Transarc/CMU
took to the metadata problem. NFS tries to solve the same problem with
NIS/NIS+ combined with lazy replication and automounting, and that has
its own set of bizarre side-effects. Which one is better? I say the AFS
approach is, since it usually has no side-effects in the presence of
server failures (and server failures are usually 95% of system failures).
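Here is a minimal sketch of the majority-quorum rule behind that approach. It is not the actual AFS database server code. The `Cell` class, the function name, and the server names are hypothetical. Treating reads as well as writes as quorum-only is an assumption of the sketch, in the spirit of "never giving out info outside of quorum".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical model of one cell's replicated metadata (database) servers."""
    db_servers: frozenset[str]

    def has_quorum(self, reachable: set[str]) -> bool:
        """True if the servers a site can reach (itself included) form a
        strict majority of the cell's database servers."""
        return 2 * len(reachable & self.db_servers) > len(self.db_servers)


def can_serve_metadata(cell: Cell, reachable: set[str]) -> bool:
    # Assumed transactional rule: metadata (e.g. where a volume lives) is
    # answered only inside a majority partition, so two partitions can never
    # hand out conflicting answers.
    return cell.has_quorum(reachable)


cell = Cell(frozenset({"nyc", "chi", "sfo"}))
print(can_serve_metadata(cell, {"nyc", "chi"}))  # True: 2 of 3 servers
print(can_serve_metadata(cell, {"sfo"}))         # False: cut off on its own
```

The strict majority is what rules out split-brain, because two disjoint partitions cannot both hold more than half the servers. The price is that a site cut off with only a minority of the database servers gets no metadata answers until it rejoins.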
|
| > The point is anything weaker than
| > transactional semantics when using replication for such metadata is
| > known to be a root cause of `bad' behaviour in almost all distributed
| > systems. If you weaken it (for example, by giving out info outside of
| > quorum), you will have much more hairy problems than just WAN access.
|
| This is all very interesting, but the question is whether Transarc
| has addressed the problem its customers wanted it to address. I would
| simply like to know one way or the other.
Well, if the problem is technically hard to solve, it doesn't become any
easier just because it is on everybody's wish list. Look at how much
effort database vendors like Sybase and Oracle have put into
replication, with little practical success. And yet it is the
number-one feature wanted of any RDBMS today.
We at Transarc try our best to solve some of the really hard
technical problems; sometimes we have limited success because the
problems are much tougher than we imagined. We too would like to solve
the single-cell-across-sixty-cities problem.
| For reference, here is a description of the problem as I see it:
|
| At AFS 3.3, when an AFS server (assume it covers all services) gets
| cut off from the rest of the AFS servers in its cell, users at
| workstations which have access to this cut-off server may be unable
| to access files in volumes which exist on it.
I don't think you are accurate here. As I remember it, all of those
workstations were rebooted after that fileserver was cut off.