POSTSCHEDULECMD command failing after a long running backups with "File not found" message

Robert Talda Fri, 09 Dec 2016 08:40:49 -0800

Folks:
  Wondering if anyone has seen this one.  

Environment: 
TSM client for Linux x86_64 v7.1.6.0 running on a Red Hat Enterprise Linux 
Server release 5.11 (Tikanga) platform accessing a TSM server for Linux v 
7.1.5.200.


  We have a NetApp appliance which we back up from a dedicated proxy system.  
The NetApp appliance is offered to campus as a service, so shares are created 
and deleted randomly as new customers subscribe to the service and old 
customers drop off.  The resulting dynamic nature of the shares on the NetApp 
appliance requires a daily redefinition of the backup configuration.   So, 
every evening at 6 pm, a script fires that creates:
- the dsm.opt file defining the domains to back up; and
- a preschedule command script to mount the necessary shares on the proxy 
system before the backup; and
- a postschedule command script to unmount the shares
Then, around 9 pm, a scheduled backup is initiated.
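
As a rough illustration of what such a nightly generator might produce, here is a minimal sketch that renders the three artifacts from a list of shares. The filer host name, the mount root, the /vol/ export path, and the use of NFS mounts are all assumptions made for the example; the real script is not shown in this post and may look quite different.

```python
# Hypothetical sketch: render the three nightly artifacts from a share list.
# FILER, MOUNT_ROOT, the /vol/ export path and the NFS mount syntax are
# assumptions for illustration, not details taken from the real environment.

FILER = "filer.example.edu"   # hypothetical NetApp host name
MOUNT_ROOT = "/mnt/vfilers"   # hypothetical mount root on the proxy system


def mount_point(share):
    """Local directory where a share would be mounted on the proxy."""
    return f"{MOUNT_ROOT}/{share}"


def render_dsm_opt(shares):
    """One DOMAIN line per share, so each mount point is backed up."""
    return "".join(f"DOMAIN {mount_point(s)}\n" for s in shares)


def render_presched(shares):
    """Shell script text that mounts every share before the backup."""
    lines = ["#!/bin/sh"]
    for s in shares:
        lines.append(f"mkdir -p {mount_point(s)}")
        lines.append(f"mount -t nfs {FILER}:/vol/{s} {mount_point(s)}")
    return "\n".join(lines) + "\n"


def render_postsched(shares):
    """Shell script text that unmounts every share after the backup."""
    lines = ["#!/bin/sh"]
    lines += [f"umount {mount_point(s)}" for s in shares]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    current_shares = ["dept_a", "dept_b"]  # hypothetical share names
    print(render_dsm_opt(current_shares))
    print(render_presched(current_shares))
    print(render_postsched(current_shares))
```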

  Most days, this functions flawlessly.  However, on occasion, the post 
schedule command script fails, creating the following message in the 
dsmsched.log:
12/09/2016 07:19:26 ANS1821E Unable to start POSTSCHEDULECMD/ PRESCHEDULECMD 
'/opt/tivoli/tsm/client/ba/bin/vfilers_postsched'

 Now, the file /opt/tivoli/tsm/client/ba/bin/vfilers_postsched exists - and can 
be run manually without issue.  This has occurred at seemingly random intervals 
over the past year, but I’ve realized that this seems to happen when the backup 
runs long.   For example, the backup that generated the error message started 
on 12/07/16 at 21:11:44, so it ran for roughly 34 hours.  This happens from 
time to time, for various reasons.

  What I think is happening:
- 12/07 18:00, the daily backup configuration script ran, creating, among other 
files, the vfilers_postsched command script containing the unmount commands
- 12/07 21:11 the scheduled backup commenced
- 12/08 18:00 the daily backup configuration script ran, creating a new version 
of the vfilers_postsched command script
- 12/09 07:19 the scheduled backup finished - and the TSM client tried to run 
the vfilers_postsched command - but it failed, generating the message above 
instead
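
The timeline above can be checked with a few lines of date arithmetic. This is only a sketch: it takes the timestamp of the ANS1821E message as the approximate end of the backup, and it assumes the configuration script fires at exactly 18:00 each day.

```python
from datetime import datetime, timedelta

# Start time from the post; end time approximated by the ANS1821E timestamp.
backup_start = datetime(2016, 12, 7, 21, 11, 44)
backup_end = datetime(2016, 12, 9, 7, 19, 26)


def regen_runs_during(start, end, hour=18):
    """Daily regeneration times (hour:00) that fall inside (start, end)."""
    runs = []
    candidate = start.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= start:
        # Today's run happened before the backup started; try tomorrow's.
        candidate += timedelta(days=1)
    while candidate < end:
        runs.append(candidate)
        candidate += timedelta(days=1)
    return runs


duration = backup_end - backup_start
overlaps = regen_runs_during(backup_start, backup_end)

if __name__ == "__main__":
    print("backup duration:", duration)
    print("regeneration runs during the backup:", overlaps)
```

Any backup that is still running at 18:00 the next day would show a non-empty list here, which matches the observation that only the long-running backups hit the error.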

  I think the TSM client is looking for the vfilers_postsched file that was 
created on 12/07  but no longer exists, having been overwritten on 12/08.   
This seems a little far-fetched, but knowing that the TSM client reads the 
configuration file once on startup, and never again, leads me to suspect that 
the client opens the postschedulecmd file at that time as well - and not when 
the backup is over - and thus can’t read it once the backup finishes.

  We can work around this in a variety of ways, but it would be nice to have a 
root cause.
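
One possible workaround, sketched below, is to have the nightly script write the new postschedule script to a temporary file and rename it into place, so the path always names a complete, executable file. This only addresses the case where the file is briefly missing or half-written during regeneration; whether that is the actual root cause is unconfirmed, and the function name and temporary-file prefix are made up for the example.

```python
import os
import tempfile


def write_script_atomically(path, content, mode=0o755):
    """Write content to a temp file beside path, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp_postsched_")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.chmod(tmp_path, mode)
        # On POSIX, os.replace swaps the name in one step, so readers see
        # either the old file or the new one, never a missing path.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Another option would be to skip regeneration entirely while a backup is still in progress, but that needs a reliable way to detect a running schedule, which is not covered here.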

 Hypothesizing,
Bob T


Robert Talda
EZ-Backup Systems Engineer
Cornell University
+1 607-255-8280
r...@cornell.edu
