[jira] [Commented] (IO-271) FileUtils.copyDirectory should be able to handle arbitrary number of files

2011-11-10 Thread Henri Yandell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/IO-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147904#comment-13147904
 ] 

Henri Yandell commented on IO-271:
--

Should this be resolved as WontFix?

 FileUtils.copyDirectory should be able to handle arbitrary number of files
 --

 Key: IO-271
 URL: https://issues.apache.org/jira/browse/IO-271
 Project: Commons IO
  Issue Type: Improvement
  Components: Utilities
Affects Versions: 2.0.1
Reporter: Stephen Kestle
Priority: Minor

 File.listFiles() uses up to a little over twice as much memory as File.list(). 
 The latter should be used in doCopyDirectory when no filter is specified.
 This memory usage is a problem when copying directories containing hundreds 
 of thousands of files.
 I was also considering implementing a file filter (which could be composed 
 with the user-supplied filter) that would batch the copy operation: copy the 
 first 1 (that match), then the next 1, and so on.
 Because File.listFiles() does not return entries in a consistent order 
 between runs, a final file filter would be needed to accept any files that 
 have not been copied successfully.
 I'm primarily concerned with copying into an empty directory (I validate 
 this beforehand). In the general case, where the copy is a merge, the 
 modification dates should only be rewritten in the final run of copies. That 
 way, during batching (and during the final pass for missed files), files do 
 not get copied if they have been modified after the start time. (I presume 
 that I'm reading FileUtils correctly in that it overrides files...)
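A minimal sketch of the "final pass" filter described above. It assumes Java 7+ for java.nio.file.Path. It also assumes that a destination file with the same relative path and the same size counts as already copied. The class name NotYetCopiedFilter is hypothetical and is not part of Commons IO. Directories are always accepted so that a recursive copy still descends into them.

```java
import java.io.File;
import java.io.FileFilter;
import java.nio.file.Path;

// Hypothetical "final pass" filter (not part of Commons IO): accepts source
// files that do not yet have a same-sized counterpart under destDir.
class NotYetCopiedFilter implements FileFilter {
    private final Path srcRoot;
    private final Path destRoot;

    NotYetCopiedFilter(File srcDir, File destDir) {
        this.srcRoot = srcDir.toPath().toAbsolutePath();
        this.destRoot = destDir.toPath().toAbsolutePath();
    }

    @Override
    public boolean accept(File candidate) {
        // Always descend into directories; only regular files are filtered.
        if (candidate.isDirectory()) {
            return true;
        }
        // Map the source file to the same relative location under destRoot.
        Path relative = srcRoot.relativize(candidate.toPath().toAbsolutePath());
        File target = destRoot.resolve(relative).toFile();
        // Assumption: a missing target, or one whose size differs,
        // has not been (fully) copied yet.
        return !target.exists() || target.length() != candidate.length();
    }
}
```

For the last run, such a filter could be passed to FileUtils.copyDirectory(srcDir, destDir, filter).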

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (IO-271) FileUtils.copyDirectory should be able to handle arbitrary number of files

2011-11-10 Thread Sebb (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/IO-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147959#comment-13147959
 ] 

Sebb commented on IO-271:
-

Yes, I think WontFix is appropriate.





[jira] [Commented] (IO-271) FileUtils.copyDirectory should be able to handle arbitrary number of files

2011-05-09 Thread Stephen Kestle (JIRA)

[ 
https://issues.apache.org/jira/browse/IO-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13030697#comment-13030697
 ] 

Stephen Kestle commented on IO-271:
---

Did I mention I'm trying to do an automatic upgrade on a legacy application, 
and this is the first backup of the message archive (initially thought to be 
logs)? (I was using Ant's copy, except that it's horrifically slow and bloated: 
it can use 100% of a CPU while copying no files, and run out of memory in almost 
the same time that copyDirectory takes to finish!)

So yeah, you can close the ticket... However, on Windows and Linux the only 
native operation is {{list()}}, so I see no performance loss in iterating over 
that array at copy time instead of in the {{listFiles()}} method.
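A minimal sketch of iterating over the {{list()}} array at copy time, assuming no filter is in use. Only the String[] is kept for the whole directory. Each File is created just before its entry is handled and can be garbage-collected afterwards. LazyListing and EntryHandler are hypothetical names; EntryHandler stands in for the actual copy step.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper: iterate over list() and create each File lazily.
class LazyListing {
    // Stand-in for the real per-entry copy step.
    interface EntryHandler {
        void handle(File source) throws IOException;
    }

    static int forEachEntry(File srcDir, EntryHandler handler) throws IOException {
        // Only the names are held for the whole directory.
        String[] names = srcDir.list();
        if (names == null) {
            throw new IOException("Failed to list contents of " + srcDir);
        }
        for (String name : names) {
            // Short-lived File: collectable once the handler returns,
            // unless the handler keeps a reference to it.
            handler.handle(new File(srcDir, name));
        }
        return names.length;
    }
}
```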



[jira] [Commented] (IO-271) FileUtils.copyDirectory should be able to handle arbitrary number of files

2011-05-09 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/IO-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13030828#comment-13030828
 ] 

Sebb commented on IO-271:
-

As already noted, listFiles() uses a private File constructor to create the 
File instances. This constructor bypasses the normalisation that the public 
constructors have to perform, so list() + new File() is less efficient than 
listFiles().

By the way, most OSes will have a backup tool which is likely to be 
considerably more efficient than Ant or Commons IO.



[jira] [Commented] (IO-271) FileUtils.copyDirectory should be able to handle arbitrary number of files

2011-05-08 Thread Stephen Kestle (JIRA)

[ 
https://issues.apache.org/jira/browse/IO-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13030470#comment-13030470
 ] 

Stephen Kestle commented on IO-271:
---

Yeah, I tend to agree in the general case. Perhaps an override switch could be 
provided to favour low memory use over speed. If that were done, though, I 
think I'd check memory usage every 100k files and evaluate whether reverting 
to names is necessary.

Of course, hitting this sort of issue when using a filter is even less likely, 
so why not just use an Object array for both {{list()}} and 
{{listFiles(filter)}}? Resolving {{File originFile = filter == null ? new 
File(srcDir, (String) files[i]) : (File) files[i];}} isn't so bad, is it?
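A sketch of the Object[] suggestion. It includes the casts that the one-line version needs, because each element is statically an Object. MixedListing and its methods are hypothetical. The array holds Strings when there is no filter and Files when listFiles(filter) was used.

```java
import java.io.File;
import java.io.FileFilter;

// Hypothetical helper for the Object[] idea: Strings without a filter,
// Files with one.
class MixedListing {
    static Object[] listEntries(File srcDir, FileFilter filter) {
        // Both calls return null if srcDir cannot be listed.
        if (filter == null) {
            return srcDir.list();        // String[] (arrays are covariant)
        }
        return srcDir.listFiles(filter); // File[]
    }

    static File resolve(File srcDir, Object entry, FileFilter filter) {
        // entry is statically an Object, so each branch needs a cast.
        return filter == null ? new File(srcDir, (String) entry) : (File) entry;
    }
}
```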





[jira] [Commented] (IO-271) FileUtils.copyDirectory should be able to handle arbitrary number of files

2011-05-08 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/IO-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13030476#comment-13030476
 ] 

Sebb commented on IO-271:
-

I'm not sure the memory usage checking strategy is appropriate. If you are near 
the limits of memory, creating the original list may well tip you over the 
limit anyway.

Further, for very large directories, even a String[] array may be too much.

As I wrote earlier, the only sure way to fix this is to process the file 
entries one by one, but Java does not seem to provide this.

As already explained, listFiles() is more efficient at creating the File 
entries than list() plus new File(), so I don't think the general case should 
be changed even in the non-filter case.

AFAICT, your use case is very unusual. Given the difficulties that such large 
directories are likely to cause other applications, and the fact that it is not 
possible to support arbitrarily large numbers of files, I would look to see if 
I could reduce the directory size, e.g. by splitting into subdirectories. That 
would probably improve file system performance too.
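One possible way to split a huge flat directory is sketched below. It assumes files can be moved into 256 subdirectories named 00 to ff, chosen from a hash of the file name, and it needs Java 8+ for Math.floorMod. The Buckets class and the naming scheme are hypothetical; any stable mapping from name to subdirectory would do. String.hashCode() is specified by the JDK, so the mapping is the same on every run.

```java
import java.io.File;

// Hypothetical layout: 256 subdirectories "00".."ff" chosen by name hash.
class Buckets {
    static String bucketFor(String fileName) {
        // floorMod keeps the bucket non-negative for negative hash codes.
        int bucket = Math.floorMod(fileName.hashCode(), 256);
        return String.format("%02x", bucket);
    }

    static File targetFor(File root, String fileName) {
        return new File(new File(root, bucketFor(fileName)), fileName);
    }
}
```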



[jira] [Commented] (IO-271) FileUtils.copyDirectory should be able to handle arbitrary number of files

2011-05-07 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/IO-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13030354#comment-13030354
 ] 

Sebb commented on IO-271:
-

If using String[] list() instead of File[] listFiles():
* when using a filter, each String has to be turned into a File.
* the copy stage also requires the String to be turned into a File.

Using String[] does reduce the maximum memory requirement, as the File lifetime 
is very short.
However, in the filtered case it can double the number of File instances that 
need to be created.

Also, the listFiles() methods are more efficient, because they take advantage 
of the fact that the list() entries have already been normalised.

I'm not sure these trade-offs are worth it for the general case.
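The filtered-case trade-off can be made concrete with a sketch of the String[] approach that counts File creation. FilteredNames is a hypothetical class. It creates one File per name so the FileFilter can run, then another per accepted name at copy time. By contrast, listFiles(filter) creates one File per name and keeps the accepted ones.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the filtered String[] approach, counting
// how many File objects get created along the way.
class FilteredNames {
    private int filesCreated;

    int getFilesCreated() {
        return filesCreated;
    }

    private File newFile(File dir, String name) {
        filesCreated++;
        return new File(dir, name);
    }

    List<String> acceptedNames(File dir, FileFilter filter) {
        List<String> accepted = new ArrayList<>();
        String[] names = dir.list();
        if (names == null) {
            return accepted;
        }
        for (String name : names) {
            // First File per entry: FileFilter.accept() needs a File.
            if (filter.accept(newFile(dir, name))) {
                accepted.add(name);
            }
        }
        return accepted;
    }

    // Collected into a list only so the example can be inspected;
    // a real copy would use each File immediately and drop it.
    List<File> sourcesToCopy(File dir, List<String> names) {
        List<File> sources = new ArrayList<>();
        for (String name : names) {
            // Second File for the same entry, rebuilt at copy time.
            sources.add(newFile(dir, name));
        }
        return sources;
    }
}
```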



[jira] [Commented] (IO-271) FileUtils.copyDirectory should be able to handle arbitrary number of files

2011-05-05 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/IO-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029669#comment-13029669
 ] 

Sebb commented on IO-271:
-

Using list() instead of listFiles() would be possible, but would only double 
the size of a directory that could be processed.
The only way to truly fix the problem would be to use a method that provided 
access to the file names one by one, but there does not appear to be a method 
to do this.
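For reference, Java 7 (released after this comment was written) added java.nio.file.Files.newDirectoryStream. It hands out directory entries through an iterator rather than as an array. The sketch below counts regular files one entry at a time; StreamingListing is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: walk a directory one entry at a time (Java 7+).
class StreamingListing {
    static long countRegularFiles(Path dir) throws IOException {
        long count = 0;
        // try-with-resources closes the underlying directory handle.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // A real copy would process each entry here.
                if (Files.isRegularFile(entry)) {
                    count++;
                }
            }
        }
        return count;
    }
}
```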

AFAICT FileUtils does not override anything. In any case, why would it be 
necessary to delay updating the modification date on the target file?

Personally, I don't think this is worth implementing. Users can always 
implement their own filtering to split the transfer into chunks. Alternatively, 
they can make sure that directories don't contain so many files, since that is 
likely to cause problems elsewhere as well.
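A minimal sketch of user-side chunking. It assumes the transfer is split into a fixed number of passes by a hash of the file name, and it needs Java 8+ for Math.floorMod. ChunkFilter is hypothetical. Each instance could be passed to FileUtils.copyDirectory(srcDir, destDir, filter), one pass per chunk.

```java
import java.io.File;
import java.io.FileFilter;

// Hypothetical chunking filter: pass `chunk` of `chunkCount` accepts only the
// regular files whose name hash falls into that chunk.
class ChunkFilter implements FileFilter {
    private final int chunk;
    private final int chunkCount;

    ChunkFilter(int chunk, int chunkCount) {
        if (chunkCount <= 0 || chunk < 0 || chunk >= chunkCount) {
            throw new IllegalArgumentException("need 0 <= chunk < chunkCount");
        }
        this.chunk = chunk;
        this.chunkCount = chunkCount;
    }

    @Override
    public boolean accept(File file) {
        // Directories are always accepted so a recursive copy descends.
        if (file.isDirectory()) {
            return true;
        }
        // Every file name maps to exactly one chunk.
        return Math.floorMod(file.getName().hashCode(), chunkCount) == chunk;
    }
}
```

In the JDK's implementation, File.listFiles(FileFilter) still calls list() internally. Each pass therefore bounds the File[] that is retained, but not the String[] of names.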
