Hi Doug,

Thanks for the positive feedback! I agree with you and Eric on the getBlockSize/getHosts/dfsCopy suggestions. Will revise the spec accordingly.

Thanks,
Devaraj

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, April 26, 2006 10:26 PM
To: [email protected]
Subject: Re: C API for Hadoop DFS

Devaraj Das wrote:
> Attached is a draft of the C API specification that some of us (in Yahoo)
> have been thinking about. The specification is closely tied to the API
> exported by Hadoop's FileSystem class.
> Will really appreciate any comments, etc. on the specification.

Overall, this looks great! Thanks for working on this!

> /**
>  * dfsFileLocationInfo
>  * used to get the mapping between file blocks and the hostnames where
>  * they are stored. Due to replication, a file block could be stored on
>  * multiple hosts.
>  */
> typedef struct {
>     char **hostname;
>     int numHosts;
> } dfsFileLocationInfo;
>
> /**
>  * dfsStat
>  * used for getting information about a file/directory
>  */
> typedef struct {
>     tObjectKind mKind; /** file or directory */
>     char *mName; /* the name of the file */
>     tTime mCreationTime;
>     dfsFileLocationInfo *fileLocationInfo; /*the last element
>                                              in the array is NULL*/
>     long mSize; /*the size of the file in bytes */
>     bool replicated; /*whether this file is replicated */
> } dfsFileInfo;
>
> /** return information about a path as a (dynamically allocated) array
>  * of dfsFileInfo.
>  * numEntries is set to the number of elements in the array.
>  * If the path happens to be a file, the array will have just one element.
>  * If the path happens to be a directory, the dfsFileInfo elements in the
>  * array will contain information about the files/sub-dirs within the path.
>  * NULL is returned if the path does not exist or some other error is
>  * encountered. freeDfsFileInfo should be called passing the array and
>  * numEntries when it is no longer needed.
>  */
> dfsFileInfo *dfsGetPathInfo(dfsFS fs, char *path, int *numEntries);

I'm a little confused about the dfsFileLocationInfo. It exposes too much
of the filesystem internals that applications don't require. It's also
expensive to return full block lists with directory listings. Instead, I
think we need the following two functions:

    tOffset getBlockSize(dfsFs fs);
    char** getHosts(dfsFs fs, char* file, tOffset pos);

This would return an array of hosts that contain the specified position
in a file. Does that make sense?

> int dfsCopyFromLocalFile(dfsFs fs, char *src, char *dst);
> int dfsCopyToLocalFile(dfsFs fs, char *src, char *dst);
> int dfsMoveFromLocalFile(dfsFs fs, char *src, char *dst);

These are utility methods that could be implemented by user code, i.e.,
not core methods. That's fine. But perhaps we should add another:

    int dfsCopy(dfsFs fs, char* src, char* dst);

Otherwise lots of applications will end up writing this themselves.

Thanks again,

Doug
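
To make the getBlockSize/getHosts proposal above concrete, here is a rough sketch of the arithmetic an application might do on its side: given the block size and a file size, work out which offsets to pass as pos so that each block is queried once. This is only an illustration under assumptions. The typedef for tOffset and the helper names blockCount, blockStartFor and blockStarts are hypothetical and appear nowhere in the draft, and the calls to the proposed functions are shown only in a comment because the API is not yet specified.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the draft does not show how tOffset is defined;
 * a signed 64-bit integer is used here purely for illustration. */
typedef int64_t tOffset;

/* Hypothetical helper: number of blocks a file of fileSize bytes occupies. */
tOffset blockCount(tOffset fileSize, tOffset blockSize)
{
    assert(blockSize > 0 && fileSize >= 0);
    return (fileSize + blockSize - 1) / blockSize;   /* round up */
}

/* Hypothetical helper: offset of the first byte of the block containing pos. */
tOffset blockStartFor(tOffset pos, tOffset blockSize)
{
    assert(blockSize > 0 && pos >= 0);
    return (pos / blockSize) * blockSize;
}

/* Hypothetical helper: write one starting offset per block into starts[],
 * at most maxStarts of them, and return how many were written. */
int blockStarts(tOffset fileSize, tOffset blockSize,
                tOffset *starts, int maxStarts)
{
    tOffset n = blockCount(fileSize, blockSize);
    int written = 0;

    while (written < maxStarts && (tOffset)written < n) {
        starts[written] = (tOffset)written * blockSize;
        written++;
    }
    return written;
}

/*
 * Intended use, assuming the two proposed functions exist as written:
 *
 *     tOffset bs = getBlockSize(fs);
 *     int n = blockStarts(fileSize, bs, starts, maxStarts);
 *     for (i = 0; i < n; i++)
 *         hosts = getHosts(fs, file, starts[i]);
 */
```

With this split, a directory listing never has to carry block lists; an application that cares about locality asks for hosts only at the positions it actually plans to read.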
