Subject: openafs-dbserver: VLDB changes not being sync'ed to vldb.DB0
Package: openafs-dbserver
Version: 1.4.7~pre3.dfsg1-1
Severity: critical
Justification: breaks the whole system

*** Please type your report below this line ***

Recent vlservers fail to write VLDB changes to the
/var/lib/openafs/db/vldb.DB0 file on non-sync sites. The effect is that,
whilst the in-memory VLDB is correct, the version on disk is correct
only on the sync site. If all vlservers for a cell are restarted *at
the same time*, all recent changes to the VLDB are lost.

The problem is reproducible:
- Stop, with bos, all 3 vlservers (all three are running the version
  below).
- Remove /var/lib/openafs/db/vldb* on all db servers.
- Restart, with bos, all 3 vlservers. Empty vldb.DB0 files are
  created on all servers. The vlservers show no errors in logs.
- Wait for quorum to be established (check via udebug, recovery
  state 1f).
- Run 'vos listvldb' to check that no volumes are registered.
- Run 'vos syncvldb' for each fileserver in the cell.
- udebug on sync site shows DB version incrementing + recovery state 1f.
- 'vos listvldb' now shows all volumes in cell correctly and all
  clients can successfully access cell volumes.
- Wait 1 or more hours.
- The vldb.DB0 file has zero size on the non-sync sites and the
  timestamp of when the vlserver was started. On the sync site it has
  grown and has the timestamp of the last syncvldb operation.
- Restart all vlservers. The vlservers show no errors in logs.
- Wait for quorum to be established (check via udebug, recovery
  state 1f).
- 'vos listvldb' shows no volumes.
- Redoing the syncvldb allows the clients to again access volumes.
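
The size and timestamp check in the steps above can be scripted. What
follows is only a minimal sketch, to be run locally on each db server:
the function names, the started_at argument and the 60 second slack are
illustrative assumptions and are not part of OpenAFS.

```python
import os
import time


def describe_db_file(path):
    """Return (size in bytes, modification time as text) for a file."""
    st = os.stat(path)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
    return st.st_size, stamp


def looks_unsynced(path, started_at, slack=60):
    """True if the file is empty and was last modified no later than
    `slack` seconds after `started_at` (the vlserver start time, as a
    Unix timestamp supplied by the caller)."""
    st = os.stat(path)
    return st.st_size == 0 and st.st_mtime <= started_at + slack


if __name__ == "__main__":
    db_path = "/var/lib/openafs/db/vldb.DB0"  # path from this report
    if os.path.exists(db_path):
        size, stamp = describe_db_file(db_path)
        print(f"{db_path}: {size} bytes, last modified {stamp}")
```

On an affected non-sync site this would be expected to report 0 bytes
and a modification time close to the vlserver start.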

This problem was also seen with an i686 dbserver on testing (before the
upgrade to amd64 testing) and seems to have begun somewhere after
openafs 1.4.2. Initially the problem was seen with a VLDB that had
worked correctly for 2+ years. At some recent point (1.4.6?) changes
stopped being written to vldb.DB0 (but no errors were logged) and
the above procedure was attempted in order to begin with a clean slate.
The effect however remains and thus cannot be linked to a corrupt
vldb.DB0. Testing with a backup of the original VLDB also shows this
problem. vldb_check seems satisfied that the vldb.DB0 is not corrupted
in any of these cases.

From the above it appears that:
- the vldb.DB0 file is not being updated on non-sync sites
- when a restart occurs, only the sync site has a recent vldb.DB0
- the sync site is then outvoted by the previously non-sync sites and
- recent changes are discarded
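
To make the suspected sequence concrete, here is a toy model of it. It
is not Ubik's actual election or recovery code: the majority rule below
is an assumption standing in for "outvoted", and the site names and
version numbers are invented for illustration.

```python
from collections import Counter


def version_after_restart(on_disk_versions):
    """Return the on-disk version held by the most sites.

    Hypothetical rule: after a simultaneous restart the in-memory
    copies are gone, so only the on-disk versions can be compared.
    """
    version, _count = Counter(on_disk_versions.values()).most_common(1)[0]
    return version


# Only the old sync site wrote its changes to disk (version 5); the
# other two still hold the empty database created at startup (version 0).
sites = {"db1 (old sync site)": 5, "db2": 0, "db3": 0}
print(version_after_restart(sites))  # prints 0: the recent changes are lost
```

If the non-sync sites had written their copies to disk, all three would
hold version 5 and the same rule would keep the recent changes.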

-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (990, 'testing'), (300, 'unstable'), (80, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.25-1-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_ZA.UTF-8, LC_CTYPE=en_ZA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages openafs-dbserver depends on:
ii  libc6                 2.7-10             GNU C Library: Shared libraries
ii  openafs-client        1.4.7~pre3.dfsg1-1 AFS distributed filesystem client
ii  openafs-fileserver    1.4.7~pre3.dfsg1-1 AFS distributed filesystem file se
ii  perl                  5.8.8-12           Larry Wall's Practical Extraction

openafs-dbserver recommends no packages.

-- no debconf information




