On Fri, 20 May 2011 10:11:13 -0500 Brian Bockelman <[email protected]> wrote:
> On May 20, 2011, at 6:10 AM, Dieter Plaetinck wrote:
>
> > What do you mean clunky?
> > IMHO this is a quite elegant, simple, working solution.
>
> Try giving it to a user; watch them feed it a list of 10,000 files;
> watch the machine swap to death and the disks uselessly thrash.
>
> > Sure this spawns multiple processes, but it beats any
> > api-overcomplications, imho.
>
> Simple doesn't imply scalable, unfortunately.
>
> Brian

True, I assumed that if anyone wants this, they know what they're doing
(i.e. the files could be small and already in the Linux block cache).
Because why would anyone read files in parallel if that causes disk
seeks all over the place? Ideally, you should tune for one sequential
read per disk at a time. In that respect, I definitely agree that some
clever logic in userspace to optimize disk reads (across a bunch of
different possible hardware setups) would be beneficial.

Dieter
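[Editor's note: the "one sequential read per disk at a time" idea discussed above can be sketched as follows. This is a minimal illustration, not code from the thread; `read_sequential` is a hypothetical helper that reads the given files strictly one after another, in large chunks, so a spinning disk serves a single sequential stream instead of seeking between 10,000 concurrent readers.]

```python
def read_sequential(paths, chunk_size=1 << 20):
    """Yield (path, chunk) pairs, reading files strictly one at a time.

    Only one file is open at any moment, and it is consumed in large
    sequential chunks -- the opposite of spawning one reader process
    per file, which forces the disk head to thrash between streams.
    """
    for path in paths:
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                yield path, chunk
```

On a box with several independent disks, one such sequential stream per disk (rather than per file) is the configuration Dieter is suggesting the userspace logic should aim for.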
