Hi Ian,

> what is the fundamental difference between KFS and hadoop such that 2
> separate projects are required?
Backing up a bit:
 - Hadoop: a map-reduce engine + HDFS
 - KFS: a filesystem

So your question is probably more about the difference between KFS and HDFS. Toby's reply to your mail gives me a bit more credit :-)

> * KFS supports atomic append, Hadoop does not

KFS currently DOES NOT support atomic record append. We designed the system for atomic record append, but we have held off on this feature as others have taken priority. There is support for multiple concurrent writes to a file: the chunkserver that holds the write lease serializes the writes.

> * KFS supports rebalancing, Hadoop does not

Currently, this is the case. Owen replied to one of the posts over the weekend/last week about adding block rebalancing to HDFS in an upcoming Hadoop release.

> * KFS exports a POSIX file interface, Hadoop does not (GFS does not, either)

To be a bit more precise:
 - with HDFS, you open a file for writing once and write sequentially, start->end (sketched at the end of this mail)
 - with KFS, you can open a file for writing multiple times, seek anywhere, and write; KfsClient::Open() supports O_APPEND for POSIX-style appends, not record append

There is also the issue of HDFS writes to a file becoming visible only on close. With KFS:
 - when a client creates a file, the file is immediately part of the fs namespace
 - writes are cached at the client
 - whenever the cache is full or the application calls flush, the data gets pushed out to the chunkservers
 - once data is written to the chunkservers, it is visible

> Maybe KFS can be integrated with Hadoop's MapReduce to make up for the
> current lack of such from Kosmix?

Toby: this *is* done. I have filed a JIRA issue and submitted the code. See HADOOP-1963 (usage sketched below).

Sriram
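To make the HDFS write model concrete, here is a minimal Java sketch against the stock org.apache.hadoop.fs API (the path is a placeholder): the file is opened for writing exactly once, written sequentially, and the bytes become readable by others only on close().

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsWriteOnce {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();  // picks up the cluster config
          FileSystem fs = FileSystem.get(conf);      // the configured HDFS instance

          // Open for writing exactly once; there is no reopen-and-seek.
          FSDataOutputStream out = fs.create(new Path("/tmp/example.log"));

          // Writes proceed sequentially, start -> end.
          out.write("record 1\n".getBytes());
          out.write("record 2\n".getBytes());

          // Only on close do the written bytes become visible to readers.
          out.close();
      }
  }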

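And a rough sketch of using KFS from Hadoop once HADOOP-1963 is in: the KFS client gets wired in as an implementation of Hadoop's FileSystem interface under a kfs:// URI. Treat the fs.kfs.* key names, the KosmosFileSystem class name, and the host/port as assumptions on my part; check the JIRA patch for the exact wiring.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class KfsFromHadoop {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Assumed wiring: map the kfs:// scheme to the glue class and
          // point it at the KFS metaserver (host/port are placeholders).
          conf.set("fs.kfs.impl", "org.apache.hadoop.fs.kfs.KosmosFileSystem");
          conf.set("fs.kfs.metaServerHost", "meta.example.com");
          conf.set("fs.kfs.metaServerPort", "20000");
          conf.set("fs.default.name", "kfs://meta.example.com:20000");

          FileSystem fs = FileSystem.get(conf);  // same FileSystem API as HDFS

          // Per the KFS semantics above, data pushed to the chunkservers
          // is readable on flush, not just on close.
          FSDataOutputStream out = fs.create(new Path("/tmp/kfs-example.log"));
          out.write("hello from kfs\n".getBytes());
          out.flush();  // push the client-side cache out to the chunkservers
          out.close();
      }
  }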