We have been running a cell distributed over a WAN for years. The
main cell is in Garching near Munich, but parts are in Berlin (550 km)
and Greifswald (750 km). The network in between is reasonable these
days, but outages still happen once in a while.

We therefore found it necessary to have database servers locally at the
remote sites, so we have five database servers in total: three in
Garching, one in Berlin, and one in Greifswald. In order to always have
a sync site, even when the network is down and one database server in
Garching fails, we changed the ubik protocol to introduce a clone-type
database server. A clone can never become sync site and its vote is not
counted. This allows Garching to elect a sync site with only two
database servers running, and it breaks the lowest-IP-address rule
(otherwise, depending on how the IP addresses fall, you could end up
with your sync site at a remote location).
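The clone rule can be sketched as a toy election. This is my own
illustration, not AFS code; the addresses, the server layout, and the
simplified lowest-IP vote are invented assumptions:

```python
# Toy sketch of the modified ubik quorum described above -- NOT actual AFS
# code. Clones are excluded from the vote count, so the quorum is a majority
# of the non-clone servers only; among the reachable voters, the lowest IP
# becomes sync site.

def elect_sync_site(servers, up):
    """servers: list of (ip, is_clone); up: set of currently reachable IPs."""
    voters = [ip for ip, is_clone in servers if not is_clone]
    quorum = len(voters) // 2 + 1          # majority of the voting servers
    alive = [ip for ip in voters if ip in up]
    if len(alive) >= quorum:
        return min(alive)                  # lowest-IP rule among live voters
    return None

# Invented layout: three voters in Garching, clones in Berlin and Greifswald.
cell = [("10.1.0.1", False), ("10.1.0.2", False), ("10.1.0.3", False),
        ("10.2.0.1", True), ("10.3.0.1", True)]

# WAN down and one Garching server failed: two of the three voters still
# make a quorum, so Garching keeps a sync site.
print(elect_sync_site(cell, {"10.1.0.2", "10.1.0.3"}))  # 10.1.0.2
```

Without the clone change, all five servers would count and a majority of
three would be needed, so the same two-server situation in Garching
would leave the cell without a sync site.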

I submitted these changes to ubik to Transarc three years ago, but
obviously they did not consider them worth incorporating.

Concerning the volumes, you must check very carefully for all
dependencies in your software. We replicate all software and the base
structure of our AFS tree to the remote sites, and keep the home
directories at the site where the user lives.

Depending on the network class, AFS may be unable to recognize which
subnets are physically near and which are not. Therefore you sometimes
need to run "fs setserverprefs" on the remote clients. We also have
different CellServDB files at the different sites, hiding all remote
database servers which will never become sync site. This makes the
clients faster, because the ubik protocol chooses a database server at
random.
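For illustration, a Berlin client's CellServDB might then list only the
Garching voters and the local clone, hiding the other remote server
(the cell name and all addresses here are invented):

```
>example.cell            #Example cell (invented)
10.1.0.1                 #db1.garching.example
10.1.0.2                 #db2.garching.example
10.1.0.3                 #db3.garching.example
10.2.0.1                 #db.berlin.example (local clone)
```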

It also looks to me as if NT machines do not honour the network
topology at all, at least with older versions of the client.

Hartmut 

"Piscopo, Tony" wrote:
> 
> Hi,
> 
> Looking for some feedback on past/current experiences involving
> implementation and maintenance of an AFS Cell dispersed over
> a WAN environment.
> 
> Scope:
> 
> An existing AFS cell is located at our main site. To bring in remote
> locations over a WAN (T1 connection) each remote site will receive
> one/two DB servers and one/two File Servers, allowing for local
> site access while maintaining central management.
> 
> Concerns:
> 
> Performance. Performing a "vos release" over a WAN? The Sync Site,
> will be receiving all the DB updates and then pushing the new DB files,
> is there a performance impact? Can there be induced Cell delays or
> "freeze" due to slowed DB interaction?
> 
> RW/RO volumes. Our existing cell is size X connecting to a EMC frame.
> Mount points (volumes) intermix to create certain paths inside a directory
> tree. To ensure the remote FS system maintains consistency with the
> data paths, much of the RO data will be replicated to the remote sites,
> ie; /usr/eda, /usr/cad etc. Also, the remote site will most likely have
> their
> own RW data with or without replication. Implementing large data
> storage systems at the remote site is not an option, is there an alternate
> solution here?
> 
> DB servers. Looking for feedback on possible WAN interference
> which would obstruct the availability and performance of the AFS cell
> as a whole. One example; remote site X has a WAN issue and will be
> unavailable for X period of time and we need to perform volume replication,
> account changes, something which requires a DB update, at the central
> site. Are there admin concerns or will the cell's Ubik flow be able to
> handle
> these outages? Will the remote site continue to operate (locally) with an
> outdated database and a reduced Ubik flow? Other examples?
> 
> Lowest IP address wins the role as coordinator. To maintain the current
> configuration at the central site regarding sync site roles, is it absolute
> to worry about the IP address scope? Other topics of concern include time
> synchronization for the Ubik flow and general administrator tasks which
> fall "out of the norm" for a LAN AFS Cell.
> 
> Any feedback is welcome.
> 
> thank you in advance.
> 
> -Tony Piscopo
> NEEC - Intel, Hudson MA.
> 

-- 
-----------------------------------------------------------------
Hartmut Reuter                           e-mail [EMAIL PROTECTED]
                                           phone +49-89-3299-1328
RZG (Rechenzentrum Garching)               fax   +49-89-3299-1301 
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------
