[
https://issues.apache.org/jira/browse/IO-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814081#comment-17814081
]
Elliotte Rusty Harold edited comment on IO-845 at 2/4/24 1:26 PM:
------------------------------------------------------------------
It would be helpful if @eamonnmcmanus could chime in on this. I'm no longer
inside the plex, and can't test what they're actually doing. Ultimately this
depends on whether the linked resource files copybara is copying during tests
are inside the directory being copied or outside of it. I don't know. If
they're inside it, then this will fix their issue. If they're outside of it
they won't.
That said, Google has been relying on undocumented, unintended, and frankly
surprising behavior of the old version of this method. Until IO-807 I don't
think anyone had much thought about or tested how the copyDirectory methods
handled symbolic links. Now that I have thought about it, I can't see the
previous behavior as sensible. Copying 10 links to the same 1 MB file should
not make 10 MB worth of new files, which is what we were doing. And it's also
true that the old behavior had some obvious cycle bugs that are now fixed by
copying links as links.
We could add a flag to this method somehow that does revert to the old
behavior. However it strikes me that actualizing links (i.e. replacing a link
with the bytes it links to) should be a separate operation from copying a
directory. It might be worth adding a FileUtils.actualize() method that does
this. It makes me a little nervous because it's not an atomic operation and
could fail halfway through, leaving things in an inconsistent state, but that's
true of most I/O methods.
was (Author: elharo):
It would be helpful if @eamonnmcmanus could chime in on this. I'm no longer
inside the plex, and can't test what they're actually doing. Ultimately this
depends on whether the linked resource files copybara is copying during tests
are inside the directory being copied or outside of it. I don't know. If
they're inside it, then this will fix their issue. If they're outside of it
they won't.
That said, Google has been relying on undocumented, unintended and frankly
surprising behavior of the old version of this method. Until IO-807 I don't
think anyone had much thought about or tested how the copyDirectory methods
handled symbolic links. Now that I have thought about it, I can't see the
previous behavior as sensible. Copying 10 links to the same file 1 MB file
should not make 10 MB worth of new files, which is what we were doing. And it's
also true that the old behavior had some obvious cycle bugs that are now fixed
by copying links as links.
We could add a flag to this method somehow that does revert to the old
behavior. However it strikes me that actualizing links (i.e. replacing a link
with the bytes it links to) should be a separate operation from copying a
directory. It might be worth adding a FileUtils.actualize() method that does
this. It makes me a little nervous because it's not an atomic operation and
could fail halfway through, leaving things in an inconsistent state, but that's
true of most I/O methods.
> Copying symbolic links. how should FileUtils.copyDirectory. behave?
> -------------------------------------------------------------------
>
> Key: IO-845
> URL: https://issues.apache.org/jira/browse/IO-845
> Project: Commons IO
> Issue Type: Bug
> Reporter: Elliotte Rusty Harold
> Priority: Major
>
> The current 2.15.1 release and earlier versions have bugs in the copying of
> symbolic link cycles using FileUtils.copyDirectory. There are also bugs in
> copying broken symbolic links reported in IO-807. These are fixed at head
> because symbolic links to directories are now copied as symbolic links rather
> than by resolving the files. However this causes problems for some
> non-hypothetical existing code that relied on the old behavior for copying
> directories containing non-cyclical symbolic links.In particular, Google has
> [reported failures in some of their internal
> projects|https://github.com/apache/commons-io/commit/ec4144b01b4107d6b39f5f4d784cf05217ea4dfa#commitcomment-137567006]
> when using the version at head as a result of this. However I don't think
> that the directories Google is copying contain cycles, so they weren't
> hitting the old bugs.
> The question is how to resolve this so that FileUtils.copyDirectory behaves
> reasonably with all possible directories whether cycles are present or not.
> Is there a consistent and reasonable way we can or should copy some links to
> directories as directories and others as actual directories? There are a
> number of cases to consider:
> 1. No symbolic links. Just copy everything and we're good.
> 2. Symbolic links but only to files, not directories. Copy the file or copy
> the link?
> 3. Symbolic links but only to files within the root directory being copied.
> Right now at head we create a link to the old directory in the source. We
> could instead create a link to the new directory in the target.
> 4. Symbolic links to files outside the root directory being copied. Right now
> at head we create a link to the old directory in the source. We should
> probably keep this behavior.
> I think the behavior we want is as follows, and then I think we handle most
> use cases while avoiding cycle problems:
> 1. Start from a root directory, the source.
> 2. Never copy anything that is not contained in that directory to the target.
> 3. When walking the directory tree, if a symbolic link is encountered:
> 3.1 If the symbolic link points outside the source, create a new symbolic
> link to the same target.
> 3.2 If the symbolic link points to a target inside the source, create a new
> symbolic link to the copy of the target.
> However every symbolic link in the source turns into a symbolic link in the
> copy. Cycles do not cause problems because they are not followed. The only
> discrepancy I see in this approach is that a link to a directory outside the
> source that is a link that points back into the source still points back into
> the source, not the target. Thus the copy can point back into the source
> indirectly. Perhaps when copying links that point outside the source we can
> fully resolve them to and repoint as needed.
> I don't know that this actually fixes Google's problem. They sort of want
> files if not directories to be resolved when copying. Maybe that's a
> argument. Maybe that's a separate method like
> FileUtils.actualizeFiles(directory) that walks a file tree and replaces every
> link to a file with the contents of the target file.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)