On Mon, Nov 20, 2017 at 15:09:25 -0500, Jean-Louis Martineau wrote:
> perl automatically close it when it come out of scope.
> The close should before the 'if (kill' line.

Okay, I moved the close() line up to that spot as I applied the patch.

I can confirm that with your patch applied and my followup patch
applied, if I kick off amflush while amdump is running in the
background, it (amflush) prints out
  Could not find any Amanda directories to flush.
and exits without attempting to flush anything.... but if I try again
after amdump finished, I'm able to complete the flush successfully.


A few followup points that came up during my testing:

* when amflush completes now, the pid file is left out there in the
  holding directory, thus preventing the directory from getting deleted:

=====
# ls -lR /amanda/TestBackup-holding/*
/amanda/TestBackup-holding/20171120172656:
total 4
-rw------- 1 backup backup 5 Nov 21 18:50 pid

/amanda/TestBackup-holding/20171121145009:
total 4
-rw------- 1 backup backup 5 Nov 21 18:50 pid
=====

  Off hand I don't see code in amflush or Amflush.pm trying to clean up
  the holding disk directories so I don't know if these "zombie" pid
  files are related to the third issue (i.e. holding_cleanup_dir()),
  but...

* ...as a more general question I was wondering if it made sense
  for programs that take a lock on a holding directory to have a way to
  explicitly release that lock when they are done?  (Probably this
  would be by deleting the pid file, thought it could instead be
  truncating it to zero length, or something.)

  The main reason I was wondering about this is that with the current
  behavior of leaving a particular pid in the pid file forever, it
  seems like there's a good chance of false positives when the lock is
  checked by later jobs (i.e. some other, unrelated process using the
  same pid when the lock-check happens).  If there were some way for a
  process to release the lock, false positives could only happen after a
  process crashed.

  (I don't know how often these false positives would happen in
  practice, but I can imagine it would cause a headache if the pid
  locking a directory containing dumps needing to be flushed to vault
  storage ended up getting reused by some long-runing process...)

* I noticed that  holding.c:holding_cleanup_dir() doesn't include parent
  pid in the pid-file check.  I haven't tracked back to figure out which 
  applications call the holding_cleanup* family of functions so I don't
  know if that will cause actual problems, but I was wondering if that 
  function should instead just call call_take_holding() instead, so the
  logic is all found in one place?


                                                Nathan




----------------------------------------------------------------------------
Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Reply via email to