Re: [fossil-users] can fossil try harder on sync failure?

Andy Bradford Sat, 19 Apr 2014 22:19:25 -0700

Thus said Matt Welland on Wed, 16 Apr 2014 09:01:28 -0700:

> fossil commit cfgdat tests -m "Added another drc test"
> Autosync:  ssh://host/path/project.fossil
> Round-trips: 1   Artifacts sent: 0  received: 0
> Error: Database error: database is locked: {UPDATE event SET mtime=(SELECT
> m1 FROM time_fudge WHERE mid=objid) WHERE objid IN (SELECT mid FROM
> time_fudge);}
> Round-trips: 1   Artifacts sent: 0  received: 0
> Pull finished with 360 bytes sent, 280 bytes received
> Autosync failed
> continue in spite of sync failure (y/N)? n


I've done a fair bit of profiling with this, and it seems to happen
primarily with the test-http command (the default sync method for SSH
clients). I don't know the history behind the test-http command, but my
guess is that it was not intended to be a heavily used sync method for
shared repositories. I'm not sure why this particular database locking
error happens so frequently with test-http but not at all with http. It
occurs in manifest_crosslink_end() when it tries to fudge check-in times
(the UPDATE against the time_fudge table shown in the error above).

If I force my SSH command to use http instead of test-http, this error
disappears entirely. I then only see an occasional locking error due to
multiple committers when I commit large change sets (for example, a
10,000-line, 840K change set). This is the same behavior I see with the
standard HTTP/HTTPS transports in my environment (slow disk, CPU, and
network).

Are all your users using SSH to access shared repositories, or do you
just have a few users using SSH?

Perhaps it would be better to switch to SSH keys with forced commands
that make fossil use http instead of test-http? This does require a bit
more setup. For example, each .fossil repository must have the
remote_user_ok configuration enabled so that Fossil accepts the
REMOTE_USER environment variable you set for each user. This is needed
because there is currently no mechanism for using Fossil authentication
with SSH as the transport, and fossil http requires an authenticated
user in order to commit.

I suppose an alternative configuration would be to give the
nobody/anonymous users write access, which may be acceptable if SSH
authentication is the only allowed sync method. The only drawback I see
is that the rcvfrom information would show the artifacts as having come
from nobody, e.g.,

User:   amb
Received From:  nobody @ 192.168.1.9 on 2014-04-20 04:33:35

I think one thing I've learned from all this is that forks and database
locking errors occur much more frequently with slow hardware and large
change sets. Also, I seem to be able to cause forks that go undetected
(no warning is printed). All of this probably explains why the problem
is difficult to reproduce except on older hardware.

As for making sync try harder, we could certainly just loop some number
of times if we think it is worth it (I'm not sure how feasible it will
be to make the retries silent, or whether there will be other side
effects). Here I have it loop up to 10 times before bailing. As you can
see, it failed once, then succeeded on the second attempt and received
updates showing that the local repository was out of sync, so the
commit would have forked:

$ fossil ci -m synctest2
Autosync:  ssh://fossil/tmp/test.fossil
Round-trips: 1   Artifacts sent: 0  received: 0
Error: Database error: database is locked: {UPDATE event SET mtime=(SELECT m1 
FROM time_fudge WHERE mid=objid) WHERE objid IN (SELECT mid FROM time_fudge);}
Round-trips: 1   Artifacts sent: 0  received: 0
Pull finished with 314 bytes sent, 280 bytes received
Autosync failed
Autosync:  ssh://fossil/tmp/test.fossil
Round-trips: 3   Artifacts sent: 0  received: 102
Pull finished with 3451 bytes sent, 170661 bytes received
would fork.  "update" first or use --allow-fork.

There was also a sync failure for the first committer after it had
successfully committed the artifacts:

$ fossil ci -m synctest1
Autosync:  ssh://fossil/tmp/test.fossil
Round-trips: 1   Artifacts sent: 0  received: 0
Pull finished with 316 bytes sent, 229 bytes received
New_Version: 04e7debfa4f29ee3c1635007e3f380f0a0630366
Autosync:  ssh://fossil/tmp/test.fossil
Round-trips: 3   Artifacts sent: 101  received: 0
Error: Database error: database is locked: {UPDATE event SET mtime=(SELECT m1 
FROM time_fudge WHERE mid=objid) WHERE objid IN (SELECT mid FROM time_fudge);}
Round-trips: 3   Artifacts sent: 101  received: 0
Sync finished with 179617 bytes sent, 3234 bytes received
Autosync failed
Autosync:  ssh://fossil/tmp/test.fossil
Round-trips: 1   Artifacts sent: 0  received: 1
Sync finished with 4916 bytes sent, 2724 bytes received
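For illustration, the retry loop described above might be sketched in C (the language Fossil is written in) roughly as follows. This is a minimal sketch under stated assumptions, not the patch that produced the transcripts above: sync_attempt_fn, sync_with_retry(), and simulated_attempt() are hypothetical names, and a real change would wrap Fossil's own autosync code rather than take a callback.

```c
#include <stdio.h>

/* Hypothetical: one sync attempt. Returns 0 on success, nonzero on
** failure (for example, when the server reports "database is locked"). */
typedef int (*sync_attempt_fn)(void *pCtx);

/* Call attempt() up to nTries times, stopping at the first success.
** Returns 0 if any attempt succeeded, otherwise the last failure code
** (or 1 if nTries is not positive and nothing was attempted). */
int sync_with_retry(sync_attempt_fn attempt, void *pCtx, int nTries){
  int rc = 1;
  int i;
  for(i = 1; i <= nTries; i++){
    rc = attempt(pCtx);
    if( rc==0 ) return 0;
    fprintf(stderr, "Autosync failed (attempt %d of %d)\n", i, nTries);
  }
  return rc;
}

/* Hypothetical stand-in for a real sync, for demonstration only: pCtx
** points at the number of failures still to simulate; each failing
** call consumes one. */
int simulated_attempt(void *pCtx){
  int *pRemaining = (int*)pCtx;
  if( *pRemaining>0 ){
    (*pRemaining)--;
    return 1;   /* pretend the server database was locked */
  }
  return 0;     /* pretend the sync succeeded */
}
```

With one simulated failure, sync_with_retry(simulated_attempt, &nFail, 10) prints a single "Autosync failed" line and then returns 0, similar to the first transcript above. The sketch retries immediately; since the locking errors come from contention on the server, pausing briefly between attempts might work better, but that is an untested assumption.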

Thoughts?

Andy
-- 
TAI64 timestamp: 40000000535358db


_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
