Yoram Arnon wrote:
Perhaps the more common approach of separate calls is better. getFileStat would return more information per file, possibly everything fstat returns, which future-proofs it for a time when HDFS supports things like modification time. It would not return a file name, so the memory for dfsFileInfo could be allocated by the client. dfsReadDir would return a list of just names and types.
I agree that this is a more elegant API, but it could place considerably more stress on the namenode for computations like 'du', which walks the directory tree, summing file sizes. If a listing returns only names and types, 'du' needs a separate stat RPC for every file. I don't think we want to bloat each stat with full block lists, but including stat information in directory listings saves those per-file RPCs.
So perhaps instead we should have separate calls to list a directory and to stat an individual name. The directory listing could still consist of stat structs, but when you stat a directory you get only a single stat struct. Does that work?
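To make the RPC trade-off concrete, here is a rough sketch in Python. It is not the HDFS or libhdfs API: FakeNameNode, FileStat, list_dir, list_names, stat and both du functions are hypothetical names invented for illustration, and the "namenode" is just an in-memory dict that counts calls.

```python
from dataclasses import dataclass


@dataclass
class FileStat:
    # Hypothetical stat struct: name, type and size only.
    name: str
    is_dir: bool
    size: int = 0


class FakeNameNode:
    """In-memory stand-in for the namenode; every method call counts as one RPC."""

    def __init__(self, tree):
        self.tree = tree  # directory path -> list of FileStat for its children
        self.rpcs = 0

    def list_dir(self, path):
        # Listing that consists of full stat structs.
        self.rpcs += 1
        return list(self.tree[path])

    def list_names(self, path):
        # Listing of just names and types.
        self.rpcs += 1
        return [(s.name, s.is_dir) for s in self.tree[path]]

    def stat(self, path):
        # Stat of a single name; a directory yields one struct, not its contents.
        self.rpcs += 1
        if path == "/":
            return FileStat("/", True)
        parent, name = path.rsplit("/", 1)
        for s in self.tree[parent or "/"]:
            if s.name == name:
                return s
        raise FileNotFoundError(path)


def join(parent, name):
    return parent.rstrip("/") + "/" + name


def du_with_stats(nn, path):
    # One RPC per directory: sizes arrive with the listing.
    total = 0
    for s in nn.list_dir(path):
        child = join(path, s.name)
        total += du_with_stats(nn, child) if s.is_dir else s.size
    return total


def du_names_only(nn, path):
    # One RPC per directory plus one stat RPC per file.
    total = 0
    for name, is_dir in nn.list_names(path):
        child = join(path, name)
        total += du_names_only(nn, child) if is_dir else nn.stat(child).size
    return total


def sample_tree():
    return {
        "/": [FileStat("a", False, 10), FileStat("d", True)],
        "/d": [FileStat("b", False, 20), FileStat("c", False, 30)],
    }


if __name__ == "__main__":
    for du in (du_with_stats, du_names_only):
        nn = FakeNameNode(sample_tree())
        print(du.__name__, du(nn, "/"), "bytes,", nn.rpcs, "RPCs")
```

On this toy tree of two directories and three files, the stat-carrying listing needs 2 calls and the names-only listing needs 5. The gap grows with the number of files, which is the extra namenode load described above.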
Doug
