On Mon, Nov 20, 2017 at 15:09:25 -0500, Jean-Louis Martineau wrote:
> perl automatically close it when it come out of scope.
> The close should before the 'if (kill' line.
Okay, I moved the close() line up to that spot as I applied the patch.
I can confirm that with your patch applied and my followup patch
applied, if I kick off amflush while amdump is running in the
background, it (amflush) prints out
Could not find any Amanda directories to flush.
and exits without attempting to flush anything.... but if I try again
after amdump finished, I'm able to complete the flush successfully.
A few followup points that came up during my testing:
* when amflush completes now, the pid file is left out there in the
holding directory, thus preventing the directory from getting deleted:
=====
# ls -lR /amanda/TestBackup-holding/*
/amanda/TestBackup-holding/20171120172656:
total 4
-rw------- 1 backup backup 5 Nov 21 18:50 pid
/amanda/TestBackup-holding/20171121145009:
total 4
-rw------- 1 backup backup 5 Nov 21 18:50 pid
=====
Off hand I don't see code in amflush or Amflush.pm trying to clean up
the holding disk directories so I don't know if these "zombie" pid
files are related to the third issue (i.e. holding_cleanup_dir()),
but...
* ...as a more general question I was wondering if it made sense
for programs that take a lock on a holding directory to have a way to
explicitly release that lock when they are done? (Probably this
would be by deleting the pid file, thought it could instead be
truncating it to zero length, or something.)
The main reason I was wondering about this is that with the current
behavior of leaving a particular pid in the pid file forever, it
seems like there's a good chance of false positives when the lock is
checked by later jobs (i.e. some other, unrelated process using the
same pid when the lock-check happens). If there were some way for a
process to release the lock, false positives could only happen after a
process crashed.
(I don't know how often these false positives would happen in
practice, but I can imagine it would cause a headache if the pid
locking a directory containing dumps needing to be flushed to vault
storage ended up getting reused by some long-runing process...)
* I noticed that holding.c:holding_cleanup_dir() doesn't include parent
pid in the pid-file check. I haven't tracked back to figure out which
applications call the holding_cleanup* family of functions so I don't
know if that will cause actual problems, but I was wondering if that
function should instead just call call_take_holding() instead, so the
logic is all found in one place?
Nathan
----------------------------------------------------------------------------
Nathan Stratton Treadway - [email protected] - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239