RE: C API for Hadoop DFS

Yoram Arnon Thu, 27 Apr 2006 09:47:25 -0700

 
> 
> Yoram Arnon wrote:
> > Perhaps the more common approach of separate calls is better. 
> > getFileStat would return more information per file, possibly 
> > everything fstat returns, future proofing it for a time HDFS supports
> > things like modification time, and will not return a file name,
> > allowing the memory for dfsFileInfo to be allocated by the client, 
> > while dfsReadDir would return a list of just names and types.
> 
> I agree that this is a more elegant API, but it could place 
> considerably more stress on the namenode for computations 
> like 'du', which walks the directory tree, summing file 
> sizes.  I don't think we want to bloat each stat with full 
> block lists, but including stat information in directory 
> listings saves a lot of RPCs.


Good point. So the information returned when listing a directory should
include sizes, and perhaps a bit more info too - whatever is necessary to be
efficient.
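
To make the 'du' case concrete, here is a rough sketch of what a listing entry with sizes might look like. The names dfsDirEntry and sumListing are made up for illustration and are not part of any agreed API. The point is only that a 'du'-style sum over one directory needs no further call per file once the listing carries sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical listing entry: name, type and size, nothing more. */
typedef struct {
    const char *name;   /* entry name, relative to the listed directory */
    int         isDir;  /* nonzero if the entry is a directory */
    int64_t     size;   /* file length in bytes; 0 for directories */
} dfsDirEntry;

/* Sum the file sizes in one directory listing.
 * No per-file stat call is needed because each entry carries its size. */
int64_t sumListing(const dfsDirEntry *entries, size_t count)
{
    int64_t total = 0;

    assert(entries != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (!entries[i].isDir)
            total += entries[i].size;
    }
    return total;
}
```

A full 'du' would recurse into the entries marked as directories, so the cost is one listing call per directory rather than one stat call per file.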

> 
> So perhaps instead we should have separate calls to list a 
> directory and to stat an individual name.  The directory 
> listing could still consist of stat structs, but when you 
> stat a directory you get only a single stat struct.  Does that work?

Yes.
Ideally, a single stat returns a lot of information, but not a name (you
already know the name), making memory allocation trivial, while a list
returns names, but less info. Lists may be long, so each entry should
contain what we need, but no more. A stat returns just one struct, so its
size matters less, but it may be called often, so simplicity matters.
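
As a sketch of why leaving the name out makes allocation trivial: the stat result becomes a fixed-size struct that the caller can put on the stack. Everything below (sketchFileInfo, sketchGetFileStat, the toy table standing in for the namenode, and the modificationTime field) is hypothetical and only illustrates the shape of the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size stat result. There is no name field, so the
 * struct holds no pointers and the caller can allocate it anywhere. */
typedef struct {
    int     isDir;             /* nonzero if the path is a directory */
    int64_t size;              /* file length in bytes */
    int64_t modificationTime;  /* placeholder for a field that may come later */
} sketchFileInfo;

/* Toy in-memory namespace standing in for the namenode. */
static const struct {
    const char    *path;
    sketchFileInfo info;
} kFiles[] = {
    { "/data",       { 1, 0,    0 } },
    { "/data/a.txt", { 0, 1024, 0 } },
};

/* Fill the caller-provided struct for one path.
 * Returns 0 on success, -1 if the path is not found. */
int sketchGetFileStat(const char *path, sketchFileInfo *out)
{
    assert(path != NULL && out != NULL);
    for (size_t i = 0; i < sizeof kFiles / sizeof kFiles[0]; i++) {
        if (strcmp(kFiles[i].path, path) == 0) {
            *out = kFiles[i].info;  /* plain struct copy, nothing to free */
            return 0;
        }
    }
    return -1;
}
```

The caller declares a sketchFileInfo locally, passes its address, and has nothing to free afterwards. A list call would still need library-allocated memory for the names.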

> 
> Doug
> 
>
