On 12/20/2012 06:22 PM, Makarius wrote:
> I just had a long phone call with Franz Huber, the local system admin
> person. All the macbroy20..29 and lxlabbroyX machines involved here
> use the same OpenSuse 12.2 with that hg 2.4. So just empirically that
> looks like the problem -- breakdowns started approx. at the time of
> update of several of these machines.

I ran a few experiments to reduce the room for wild speculation and myths.

(Bottom line: if everybody uses lxbroy10 for pushes, we are safe.)

SETUP
=====

The attached crude shell script pushes single changesets to a repository in a loop. As the source, I used a clone of the Isabelle repository. As the destination, I used an empty repository in the same format as our central repository. I ran the script concurrently on various combinations of machines and with different versions of Mercurial.

This test turns out to be fairly selective: either the repository gets corrupted within the first 1,000 pushes, or it stays intact for 20,000 pushes, at which point I stopped the test.

My assumption is that the corruption seen here is the same one that we have had in production use. The error message is the same.

RESULTS
=======

| No | host A     | hg A   | host B     | hg B   | Conc. | NFS | ->  |
|----+------------+--------+------------+--------+-------+-----+-----|
|  1 | lxlabbroy5 | 2.4    | lxbroy7    | 2.4    | Yes   | Yes | BAD |
|  2 | lxlabbroy5 | 2.4    | -          | -      | No    | Yes | OK  |
|  3 | lxlabbroy6 | 2.2.2* | lxbroy7    | 2.1.1* | Yes   | Yes | BAD |
|  4 | lxlabbroy7 | 2.4    | lxlabbroy7 | 2.4    | Yes   | Yes | OK  |
|  5 | lxlabbroy6 | 2.2.2* | lxlabbroy6 | 2.2.2* | Yes   | No  | OK  |
|  6 | lxlabbroy8 | 2.2.2* | lxlabbroy9 | 2.2.2* | Yes   | Yes | BAD |
|  7 | lxlabbroy8 | 2.2.2* | lxlabbroy9 | 2.2.2* | No    | Yes | BAD |
|  8 | lxbroy7    | 2.1.1* | lxbroy8    | 2.1.1* | No    | Yes | OK  |
|  9 | lxbroy6    | 2.1.1* | lxbroy9    | 2.1.1* | Yes   | Yes | OK  |
| 10 | lxlabbroy7 | 2.1.1  | lxlabbroy8 | 2.1.1  | Yes   | Yes | BAD |
| 11 | macbroy20  | 2.2.2* | macbroy21  | 2.2.2* | Yes   | Yes | BAD |
| 12 | lxbroy6    | 2.4    | lxbroy9    | 2.4    | Yes   | Yes | OK  |
| 13 | macbroy20  | 2.1.1  | macbroy21  | 2.1.1  | Yes   | Yes | BAD |

*) Version from the system's installation. Otherwise, Mercurial was
compiled from source.

Conc.:
 Yes: Different processes (on the two hosts) push concurrently
 No: Only one process, pushing via ssh through two different hosts
 (for this, I used a slightly different script). Exception: run #2,
 which involves only a single host.

NFS:
 Does the destination repository live on NFS?

-> :
 OK: 20,000 pushes completed without any corruption
 BAD: Repository corrupted before 1,000 pushes
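
For illustration, a non-concurrent run of the kind described under "Conc.: No" might look roughly like the sketch below. This is not the script that was actually used for those runs, only a guess at its general shape: a single process alternates between two hosts and performs each push there via ssh. The host names, the paths, and the function names pick_host and run_alternating are hypothetical placeholders; SRC and DEST are assumed to be absolute paths visible from both hosts.

```shell
#!/bin/bash
# Sketch only: NOT the script used for the runs above.
# All names and defaults below are hypothetical placeholders.
HG="${HG:-hg}"
SRC="${SRC:-/path/to/src}"     # assumed to be visible on both hosts
DEST="${DEST:-/path/to/dest}"  # assumed to be visible on both hosts (e.g. via NFS)
HOSTS=("${HOST_A:-hostA}" "${HOST_B:-hostB}")

# Choose the host for iteration N: even -> first host, odd -> second host.
pick_host() {
  echo "${HOSTS[$(( $1 % ${#HOSTS[@]} ))]}"
}

# Push up to MAX single changesets, alternating hosts, never concurrently.
run_alternating() {
  local max="$1" i host tip
  for (( i = 0; i < max; i++ )); do
    host=$(pick_host "$i")
    # Ask the chosen host for the current tip revision of DEST
    tip=$(ssh "$host" "$HG -R $DEST tip --template '{rev}'") || {
      echo "hg tip failed via $host. Broken repository!?!" >&2
      return 1
    }
    # Push the next changeset from SRC through the same host
    ssh "$host" "$HG -R $SRC push -f -r $((tip + 1)) $DEST" > /dev/null
  done
}
```

Calling `run_alternating 20000` would then mirror the length of the "OK" runs; because the loop is strictly sequential, any corruption it triggers cannot come from overlapping pushes.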


INTERPRETATION
==============

- There is no correlation with the Mercurial version in use. Breakages occur with older and newer versions alike (2.1.1 in runs 10 and 13, 2.4 in run 1), and the same versions are OK in other circumstances (2.1.1 in runs 8 and 9, 2.4 in runs 4 and 12).

- The error only occurs when the repository is accessed from different hosts (runs 2, 4, and 5, which each use a single host, are OK). The access does not need to be concurrent (run 7 is BAD without concurrency), which rules out a problem with Mercurial's locking mechanisms. This also matches the situation we had in production use, where concurrent pushes are fairly unlikely.

- At least one of the hosts involved must be lxlabbroy* or macbroy*, i.e. one of the openSUSE machines. The Gentoo servers are not affected.

I would say that this points to the SUSE NFS client driver as the source of the problem. If we use lxbroy10 exclusively for pushes, we should be safe until the issue is fixed.
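
Until the issue is fixed, the "push only from lxbroy10" rule could be enforced mechanically rather than by convention. The following is a minimal sketch, assuming a small wrapper around `hg push` is acceptable; the function names is_safe_host and safe_push and the wrapper approach itself are hypothetical, and only the host name lxbroy10 comes from the text above.

```shell
#!/bin/bash
# Hypothetical guard: refuse to push from any host other than the safe one.
SAFE_HOST="lxbroy10"

# Succeeds only if the given host name (short or fully qualified) is the safe host.
is_safe_host() {
  local short="${1%%.*}"   # strip any domain suffix
  [ "$short" = "$SAFE_HOST" ]
}

# Wrapper around "hg push" that checks the current host first.
safe_push() {
  if ! is_safe_host "$(hostname)"; then
    echo "Refusing to push from $(hostname): please push from $SAFE_HOST" >&2
    return 1
  fi
  hg push "$@"
}
```

Sourced from a shell profile, `safe_push` would be used in place of `hg push`; it passes all arguments through unchanged when run on the safe host.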

Alex
#!/bin/bash

SRC="./src"
DEST="./dest"


# Print an error message to stderr and abort
function fail {
  echo "$1" >&2
  exit 1
}


[ -n "$1" ] || fail "Abort: path to mercurial must be given as first argument"
HG="$1"
echo "Using mercurial $HG"


while :
do
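  # Get tip revision number from DEST (-1 for an empty repository)
  DEST_TIP=$("$HG" -R "$DEST" tip --template '{rev}') || fail "hg tip failed while reading the tip of $DEST. Broken repository!?!"
  NEXT=$((DEST_TIP + 1))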

  # Push one changeset
  echo "Pushing revision $NEXT"
  "$HG" -R "$SRC" push -f -r "$NEXT" "$DEST" > /dev/null

  # Quick integrity check: reading the tip fails on a corrupted repository
  "$HG" -R "$DEST" tip > /dev/null || fail "hg tip failed. Broken repository!?!"

  sleep 0.2
done