I don't think File instantiation is slower than the AE processing, even with 
the tens of thousands of files in the directory tree that Tim is talking about.

The only filesystem-related work in any new File(..) is a normalize(..) or 
resolve(..) on the passed parameter(s), which should be pure string 
manipulation with no actual I/O calls, native or otherwise. In other words, 
new File(..) should be fast.
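
One way to check this claim is a quick sketch like the one below (the class and method names are hypothetical, and the printed timing depends on the machine and JVM, so no number is claimed here). It builds 50,000 File objects under a directory name that does not exist. The constructor succeeds because it only normalizes and joins strings; only exists() actually asks the filesystem.

```java
import java.io.File;

// Hypothetical demo class: builds many File objects without touching the disk.
class FileConstructionDemo {

    // Creates n File objects under baseDir. The (parent, child) constructor only
    // normalizes and joins the two strings; it performs no filesystem I/O, so it
    // succeeds even though baseDir does not exist.
    static File[] buildFiles(String baseDir, int n) {
        File[] files = new File[n];
        for (int i = 0; i < n; i++) {
            files[i] = new File(baseDir, "note-" + i + ".txt");
        }
        return files;
    }

    public static void main(String[] args) {
        // Hypothetical directory name chosen so that it should not exist.
        String base = "no-such-dir-" + System.nanoTime();

        long start = System.nanoTime();
        File[] files = buildFiles(base, 50_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Built " + files.length + " File objects in " + elapsedMs + " ms");

        // exists() is where real I/O happens: it asks the OS about the path.
        System.out.println("First file exists? " + files[0].exists());
    }
}
```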

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Karthik Sarma
Sent: Tuesday, May 07, 2013 3:26 PM
To: [email protected]
Subject: Re: files vs strings in collection reader

Hmm, without having actually reviewed the code in cTAKES (I'm not on my work 
computer), my understanding is that the "correct" way of doing this is to call 
the listFiles method on the directory File to get an array of Files; this 
should be implemented natively by the JVM and could be faster than 
initializing each File individually.
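
For reference, here is a small sketch of the two listing calls on java.io.File (the class and method names are hypothetical). Both list one level only, not recursively, and both return null rather than throwing when the path is not a readable directory. In OpenJDK, listFiles() is itself built on the name listing and then wraps each name in a new File, so it still constructs one File per entry.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;

class DirectoryListingSketch {

    // listFiles() returns one File per entry in dir (not recursive),
    // or null if dir is not a directory or cannot be read.
    static File[] listAsFiles(File dir) throws IOException {
        File[] entries = dir.listFiles();
        if (entries == null) {
            throw new IOException("Cannot list directory: " + dir);
        }
        return entries;
    }

    // list() returns only the entry names; a File can be built later
    // with new File(dir, name) when an entry is actually needed.
    static String[] listAsNames(File dir) throws IOException {
        String[] names = dir.list();
        if (names == null) {
            throw new IOException("Cannot list directory: " + dir);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway directory so the example runs anywhere.
        File dir = Files.createTempDirectory("listing-demo").toFile();
        dir.deleteOnExit(); // registered first, so it is deleted last
        for (String name : new String[] {"a.txt", "b.txt", "c.txt"}) {
            File f = new File(dir, name);
            f.createNewFile();
            f.deleteOnExit();
        }
        System.out.println("As Files: " + Arrays.toString(listAsFiles(dir)));
        System.out.println("As names: " + Arrays.toString(listAsNames(dir)));
    }
}
```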

--
Karthik Sarma
UCLA Medical Scientist Training Program Class of 20??
Member, UCLA Medical Imaging & Informatics Lab
Member, CA Delegation to the House of Delegates of the American Medical Association
[email protected]
gchat: [email protected]
linkedin: www.linkedin.com/in/ksarma


On Tue, May 7, 2013 at 12:17 PM, Tim Miller <[email protected]> wrote:

> The FilesInDirectoryCollectionReader creates an ArrayList of
> java.io.File objects when it is initialized. For large datasets (~50k
> files) this adds substantial time overhead, and probably memory overhead
> as well. Seems like it would be more efficient to use Strings instead of
> Files there and just open the File object when getNext() is called. It is
> pretty easy to implement; any downside to making this switch?
> Tim
>
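
A minimal sketch of Tim's proposal, independent of UIMA. The class LazyFileQueue and its methods are hypothetical and are not the cTAKES FilesInDirectoryCollectionReader API. It walks the tree once, keeps only path Strings, and creates each File only when that document is read, which is the role getNext() would play in the real reader.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, UIMA-free sketch: at initialization keep only path Strings,
// and create a File for a document only when it is read.
class LazyFileQueue {
    private final List<String> paths = new ArrayList<>();
    private int next = 0;

    LazyFileQueue(File rootDir) {
        collect(rootDir);
    }

    // Walks the tree under dir. A short-lived File is still needed to ask
    // isDirectory(), but only its path String is retained.
    private void collect(File dir) {
        String[] names = dir.list();
        if (names == null) {
            return; // not a directory, or not readable
        }
        for (String name : names) {
            File child = new File(dir, name);
            if (child.isDirectory()) {
                collect(child);
            } else {
                paths.add(child.getPath());
            }
        }
    }

    int size() {
        return paths.size();
    }

    boolean hasNext() {
        return next < paths.size();
    }

    // Plays the role of getNext(): the File is created here, on demand.
    String readNext() throws IOException {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        File file = new File(paths.get(next++));
        byte[] bytes = Files.readAllBytes(file.toPath());
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        LazyFileQueue queue = new LazyFileQueue(new File(args.length > 0 ? args[0] : "."));
        System.out.println("Found " + queue.size() + " files");
    }
}
```

In OpenJDK a java.io.File is a thin wrapper around its normalized path String, so keeping Strings saves roughly one small object per entry; whether that is noticeable at ~50k files is worth measuring rather than assuming.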
