Deployment problems caused by file deletion failures
----------------------------------------------------

                 Key: GERONIMO-3489
                 URL: https://issues.apache.org/jira/browse/GERONIMO-3489
             Project: Geronimo
          Issue Type: Bug
      Security Level: public (Regular issues)
          Components: deployment
    Affects Versions: 2.0.1
            Reporter: Ted Kirby
             Fix For: 2.0.2, 2.0.x, 2.1


File.delete() failures in IOUtil.recursiveDelete() are causing various 
deployment problems.  I open this JIRA to discuss them to see how the server 
might better handle them.  In all but one case, delete failures are not even 
noted with a log record!  Deletion problems are seen in many environments and 
platforms, but they are persistently fatal when using an NFS file system for the 
repository.

In investigating the problem, I added code to recursiveDelete to retry the 
delete a few times if it fails.  I also added code to list the directory 
contents if a directory delete failed, and saw a file named 
.nfs000000002bc43500000053e in the directory.  My first attempt at a bypass 
was to retry a failed delete 5 times, sleeping a second before each try.  This 
did not work.  I then added a call to System.gc() before each sleep, and this 
got me past the problem.  Interestingly, two retries were required to get this 
to work.  In another version, each retry slept a second longer than the one 
before, and I printed all file names in a directory before trying the delete.  
This worked in most cases, but required the full 5 retries, so I suspect 
System.gc() would have saved time.  System.runFinalization() would be 
something else to try.
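
The retry experiment described above can be sketched roughly as follows. This is a minimal illustration, not the actual IOUtil.recursiveDelete() code: the class name RetryingDelete and the parameters maxRetries and sleepMillis are hypothetical, and the System.gc() call is only the empirical workaround reported here, not a guaranteed fix.

```java
import java.io.File;

/** Hypothetical helper illustrating delete-with-retry; not Geronimo code. */
public final class RetryingDelete {
    private RetryingDelete() {}

    /**
     * Recursively deletes root, retrying each failed delete.
     * Returns true only if root no longer exists afterwards.
     */
    public static boolean recursiveDelete(File root, int maxRetries, long sleepMillis) {
        File[] children = root.listFiles(); // null when root is not a directory
        if (children != null) {
            for (File child : children) {
                recursiveDelete(child, maxRetries, sleepMillis);
            }
        }
        return deleteWithRetry(root, maxRetries, sleepMillis);
    }

    private static boolean deleteWithRetry(File file, int maxRetries, long sleepMillis) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (file.delete() || !file.exists()) {
                return true;
            }
            if (attempt == maxRetries) {
                break;
            }
            // Log the failure so it is never silent, and show what is left behind
            // (for example a leftover .nfsXXXX file in a directory).
            System.err.println("Delete failed (attempt " + (attempt + 1) + "): " + file);
            String[] leftovers = file.list();
            if (leftovers != null) {
                for (String name : leftovers) {
                    System.err.println("  still present: " + name);
                }
            }
            // Assumption: a GC may release unclosed streams that keep the file busy.
            // System.runFinalization() is left out because recent JDKs deprecate it.
            System.gc();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return !file.exists();
    }
}
```

A caller such as undeploy could check the boolean result and refuse to report success while the directory still exists, for example `RetryingDelete.recursiveDelete(configDir, 5, 1000L)`.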

RepositoryConfigurationStore.createNewConfigurationDir(Artifact) shows the 
failing end of the deletion problem, with the dreaded 
ConfigurationAlreadyExistsException("Configuration already exists: " + 
configId) exception.  I think this message is not good.  It should really say 
that the directory already exists.  If the file is not deleted on undeploy, 
this failure 
occurs on a subsequent deploy.  What is really bad is if the user invokes a 
redeploy operation, and the file delete fails on the undeploy.  It is important 
that undeploy not complete until the file goes away.

From other environments, I am not convinced that all file handles and 
references, and particularly open streams, are being closed on some artifacts. 
This will cause the delete to fail.  It may be that the gc() calls are 
cleaning these up, and allowing the deletes to work in my case above.

Another option is that 
RepositoryConfigurationStore.createNewConfigurationDir(Artifact) not throw a 
ConfigurationAlreadyExistsException if the only problem is that an empty 
directory structure exists.  The next line creates the directory structure 
anyway.
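
One way to express that check is sketched below. The class EmptyTreeCheck and the method isAbsentOrEmptyTree are hypothetical names introduced for illustration; how such a check would be wired into createNewConfigurationDir is an assumption, not a description of the existing code.

```java
import java.io.File;

/** Hypothetical helper; not part of RepositoryConfigurationStore. */
public final class EmptyTreeCheck {
    private EmptyTreeCheck() {}

    /**
     * Returns true if dir does not exist, or if it contains nothing but
     * (possibly nested) empty directories. Any regular file, including a
     * leftover .nfsXXXX file, makes the result false.
     */
    public static boolean isAbsentOrEmptyTree(File dir) {
        if (!dir.exists()) {
            return true;
        }
        if (!dir.isDirectory()) {
            return false;
        }
        File[] children = dir.listFiles();
        if (children == null) {
            return false; // unreadable directory: be conservative
        }
        for (File child : children) {
            if (!child.isDirectory() || !isAbsentOrEmptyTree(child)) {
                return false;
            }
        }
        return true;
    }
}
```

Under this sketch, the exception would be thrown only when the check returns false, that is, when real content from an earlier deployment is still present.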

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.