SFS - real weird problem

Colin Allinson Mon, 15 Feb 2010 08:54:34 -0800

This issue involves the same code as my previous question, but now I have 
run into something really weird:


Sorry for the long background explanation, but the process that I have 
inherited (and hacked around with) is as follows:

a)      Server 1 (5 similar servers looking in different places) wakes up 
every 2 minutes and does the following:

        - Looks for logs to upload to the SFS repository. Each log contains 
everything since midnight (not just new data).
        - Creates a user lock file for each log it finds
        - Uploads the log under a temporary name (in the SFS pool)
        - Erases the fn 1LOG file (if it exists)
        - Renames the fn LOG file to fn 1LOG (if it exists)
        - Renames the temporary file (just uploaded) to fn LOG
        - Removes the user lock file

        The procedure is this complex because Server 2 might be examining 
the file at the same time.
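For anyone less familiar with this setup, the Server 1 cycle can be modelled roughly as below. This is a minimal Python sketch, not the real server code, and none of the actual SFS commands are used: `Repository` is a hypothetical in-memory stand-in for the SFS directory, a set entry stands in for the user lock file, and `publish_log` and the file names are illustrative. `Repository.rename` refuses to overwrite an existing target, matching the "Object already exists" failure described further down. Removing the lock in a `finally` clause even when a rename fails is an assumption. The post does not say what happens on error.

```python
class Repository:
    """Hypothetical in-memory stand-in for one SFS directory."""

    def __init__(self):
        self.files = {}     # "fn ft" -> list of log lines
        self.locks = set()  # fns that currently have a user lock file

    def rename(self, old, new):
        # Like the RENAME described in the post, refuse to overwrite a target.
        if new in self.files:
            raise FileExistsError(f"{new} already exists")
        self.files[new] = self.files.pop(old)


def publish_log(repo, fn, lines, temp_name):
    """One Server 1 pass for one log, following the order of the steps above."""
    repo.locks.add(fn)                          # create the user lock file
    try:
        repo.files[temp_name] = list(lines)     # upload under a temporary name
        repo.files.pop(f"{fn} 1LOG", None)      # erase fn 1LOG (if it exists)
        if f"{fn} LOG" in repo.files:
            repo.rename(f"{fn} LOG", f"{fn} 1LOG")  # fn LOG -> fn 1LOG
        repo.rename(temp_name, f"{fn} LOG")     # temporary file -> fn LOG
    finally:
        repo.locks.discard(fn)                  # remove the user lock file (assumed even on error)
```

Calling `publish_log` twice for the same `fn` leaves the newer upload as `fn LOG` and the previous one as `fn 1LOG`, with no lock left behind.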

b)      Server 2 (10 servers, each dealing with different logs) wakes up 
every 50 seconds and does the following:

        - Looks in the SFS repository for logs that have more lines than it 
has processed so far
        - If there is a user lock file, it ignores the log until the next 
pass
        - Scans each line (from the last one processed to the end) and takes 
various actions (such as cataloguing the tapes used)
        - Records a line pointer for how far it got
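A single Server 2 pass over one log can be sketched the same way. Again, this is a hedged Python illustration with hypothetical names (`scan_new_lines`, `handle_line`), not the real Server 2 code. The lock check and line pointer follow the steps above. Passing the lock state and the log contents in as plain arguments is a simplification.

```python
from typing import Callable, List


def scan_new_lines(locked: bool, lines: List[str], pointer: int,
                   handle_line: Callable[[str], None]) -> int:
    """One Server 2 pass over a single log; returns the updated line pointer."""
    if locked:
        return pointer             # user lock file present: skip until next wake-up
    if len(lines) <= pointer:
        return pointer             # no more lines than already processed
    for line in lines[pointer:]:   # from the last line processed to the end
        handle_line(line)          # e.g. catalogue the tapes the line mentions
    return len(lines)              # record how far we got
```

Each upload contains the whole log since midnight, so the pointer from the previous pass should still index the right line in the newly renamed copy. The sketch does nothing when a log has fewer lines than the pointer (for example, after midnight). The post does not say how Server 2 resets its pointer.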

Originally, Server 1 did not bother with renames; it simply overwrote the 
log in the SFS repository. We were getting a number of lock-ups, so I 
changed it, and this has significantly improved the situation, but we still 
get the occasional lock-up.

The code in Server 2 is very messy, delicate and sensitive, so I would 
rather not touch it if I can avoid it.

What we are seeing now is that, occasionally, the RENAME of the temporary 
file to fn LOG fails with RC=28 and 'DMSRND1311E Object already exists'. In 
the error-handling routine I do a LISTFILE, and the object is not shown. I 
also do a QUERY LOCKS and get 'No Locks held'. In case it is a timing 
issue, I then retry the RENAME up to 5 times with a 5-second delay, with no 
improvement.
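The retry in the error handler amounts to the loop below. It is only a Python sketch. `rename` is a hypothetical callable standing in for the real RENAME and returning its return code. Retrying only while the code is 28 is an assumption based on the failure described above. The `sleep` parameter lets the delay be replaced when testing.

```python
import time
from typing import Callable


def rename_with_retry(rename: Callable[[], int], retries: int = 5,
                      delay: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Run rename(); while it returns RC=28, wait and retry up to `retries` times."""
    rc = rename()
    for _ in range(retries):
        if rc != 28:               # success or a different error: stop retrying
            break
        sleep(delay)               # wait in case it is a timing issue
        rc = rename()
    return rc                      # still 28 here if every retry failed
```

In the situation reported here, every retry still returns RC=28, so a loop like this cannot clear the condition by itself.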

The really weird thing is that, once this happens, it keeps happening for 
ever and a day until the server is recycled (IPL CMS), which clears the 
condition.


I can understand that there may be an initial timing issue, but as no SFS 
locks are shown, I cannot understand what it is that is not being cleared.

I would be very grateful if anyone could suggest what may be happening 
here.

Colin Allinson
VM Systems Support
Amadeus Data Processing GmbH
