drwho wrote:
I've always wondered if the lack of overwrite / random-write ops means that updates are much faster than on conventional filesystems.
Not really. Since DFS is implemented on top of the ordinary filesystem, it is never any faster at serial access. What it adds is scalability (petabytes in a single namespace), reliability (continuous access to data through disk and host failures), and distributed performance (1000 hosts reading or writing in parallel to the same logical FS).
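To make that concrete, here is a minimal sketch of client access through Hadoop's FileSystem API. The path is made up, and the namenode address is assumed to come from the cluster configuration (the fs.default.name property, later renamed fs.defaultFS):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsClientSketch {
  public static void main(String[] args) throws Exception {
    // Namenode address is taken from the cluster configuration, not hard-coded here.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path p = new Path("/user/drwho/example.txt");   // hypothetical path

    // Writes are streamed and append-style: create, write, close.
    FSDataOutputStream out = fs.create(p);
    out.writeUTF("written by one of many parallel clients");
    out.close();

    // Any host in the cluster can open the same logical path and read it back.
    FSDataInputStream in = fs.open(p);
    System.out.println(in.readUTF());
    in.close();
  }
}

The point is that a thousand such clients can run on a thousand hosts against the same namespace; each individual stream is still limited by ordinary disk and network speed.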
Given that both (DFS, GFS) support a delete op, does that mean that fragmentation will still be a big problem?
Fragmentation should not be a problem, since files are chunked into 128MB blocks, each stored as an ordinary file in a local filesystem: a delete frees whole blocks, and layout within a block is left to the local filesystem.
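As a rough illustration, the chunking is visible from the client side: you can ask for a file's block locations through the FileSystem API. The path below is hypothetical, and dfs.block.size is assumed to be the configuration knob for the per-file block size (later releases call it dfs.blocksize):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLayoutSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Block size is fixed per file at create time; assumed property name.
    conf.setLong("dfs.block.size", 128L * 1024 * 1024);
    FileSystem fs = FileSystem.get(conf);

    Path p = new Path("/user/drwho/big-input.dat");   // hypothetical existing file
    FileStatus stat = fs.getFileStatus(p);

    // Each BlockLocation is one fixed-size chunk, stored as an ordinary file
    // on the local filesystems of the datanodes listed in getHosts().
    BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
    for (BlockLocation b : blocks) {
      System.out.printf("offset=%d length=%d hosts=%s%n",
          b.getOffset(), b.getLength(), String.join(",", b.getHosts()));
    }
  }
}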
Also, would the lack of overwrite / random-write ops make the filesystem less suitable for apps like an online word processor, or even an online spreadsheet / database?
Yes, such applications are probably not appropriate for direct implementation on top of DFS. It would work, but it would not be the best use of resources. Google uses BigTable, layered on top of GFS, to store small items that may be independently updated. Hadoop may someday incorporate something like BigTable.
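To make the layering idea concrete, here is a toy sketch (plain Java against a local temp directory standing in for DFS; this is not BigTable's or Hadoop's actual code) of how a table layer can accept updates to small items while only ever appending to the filesystem underneath: updates go to memory plus an append-only log, and flushes write whole new immutable files rather than overwriting anything.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.TreeMap;

// Toy illustration only: small random updates on top, append-only storage below.
public class AppendOnlyTableSketch {
  private final TreeMap<String, String> memtable = new TreeMap<>();
  private final Path dir;
  private final Path log;
  private int flushes = 0;

  public AppendOnlyTableSketch(Path dir) throws IOException {
    this.dir = Files.createDirectories(dir);
    this.log = dir.resolve("write-ahead.log");
  }

  // An "update" to a small item is recorded by appending, never by overwriting.
  public void put(String key, String value) throws IOException {
    Files.writeString(log, key + "\t" + value + "\n",
        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    memtable.put(key, value);
  }

  // Reads are served from memory here; a real system would also merge in
  // the immutable files written by earlier flushes.
  public String get(String key) {
    return memtable.get(key);
  }

  // A flush writes a whole new sorted, immutable file -- the kind of operation
  // an append-only DFS is good at -- and never touches existing files.
  public void flush() throws IOException {
    StringBuilder sb = new StringBuilder();
    memtable.forEach((k, v) -> sb.append(k).append('\t').append(v).append('\n'));
    Files.writeString(dir.resolve("sorted-" + (flushes++) + ".dat"),
        sb.toString(), StandardOpenOption.CREATE_NEW);
    memtable.clear();
  }

  public static void main(String[] args) throws IOException {
    AppendOnlyTableSketch t =
        new AppendOnlyTableSketch(Files.createTempDirectory("table-sketch"));
    t.put("row1", "v1");
    t.put("row1", "v2");               // second write to the same row: last append wins
    System.out.println(t.get("row1")); // prints v2
    t.flush();
  }
}

Compacting the flushed files back into fewer, larger ones is likewise just sequential reads and whole-file writes, which is why this pattern fits an append-only DFS well.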
Mike Cafarella has discussed this a bit on the hadoop-dev list:

http://www.mail-archive.com/[email protected]/msg01415.html
http://www.mail-archive.com/[email protected]/msg01443.html

Doug
