Eric,
I remember Doug advised somebody on a related issue to use a directory
instead of a file for long-lasting appends.
You can logically divide your output into smaller files and close them
whenever a logical boundary is reached.
The directory can then be treated as a collection of records. Maybe this
will work for you.
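Roughly, the pattern could look like the sketch below. This is only an
illustration of the idea, not an existing Hadoop API: the class name
RollingRecordWriter, the per-file record threshold, and the "part-NNNNN"
file naming are all made up for the example; only the FileSystem /
FSDataOutputStream calls are real.

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  // Treats a directory as one logical, append-only "file": records are
  // written to a series of small files, each closed at a logical boundary.
  public class RollingRecordWriter {
    private final FileSystem fs;
    private final Path dir;              // directory holding the record files
    private final long recordsPerFile;   // logical boundary (could also be bytes or time)
    private FSDataOutputStream out;
    private long recordsInCurrentFile;
    private int fileIndex;

    public RollingRecordWriter(Configuration conf, Path dir, long recordsPerFile)
        throws IOException {
      this.fs = FileSystem.get(conf);
      this.dir = dir;
      this.recordsPerFile = recordsPerFile;
      fs.mkdirs(dir);
    }

    // Append one record; roll over to a new file when the boundary is reached.
    public synchronized void write(byte[] record) throws IOException {
      if (out == null) {
        out = fs.create(new Path(dir, String.format("part-%05d", fileIndex++)));
      }
      out.write(record);
      out.write('\n');
      if (++recordsInCurrentFile >= recordsPerFile) {
        out.close();                     // once closed, the file is complete in DFS
        out = null;
        recordsInCurrentFile = 0;
      }
    }

    public synchronized void close() throws IOException {
      if (out != null) {
        out.close();
      }
    }
  }

Since an input path can point at a directory, a later MapReduce job can
read the whole directory as if it were a single input.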
IMO the concurrent append feature is a high-priority task.
--Konstantin
Doug Cutting wrote:
drwho wrote:
If so, is GFS also suitable only for large, offline, batch
computations?
I wonder how Google is going to use GFS for writely, their online
spreadsheet, or their BigTable (their gigantic relational DB).
Did I say anything about GFS? I don't think so. Also, I said,
"currently" and "primarily", not "forever" and "exclusively". I would
love for DFS to be more suitable for online, incremental stuff, but
we're a ways from that right now. As I said, we're pursuing
reliability, scalability and performance before features like append.
If you'd like to try to implement append w/o disrupting work on
reliability, scalability, and performance, we'd welcome your
contributions. The project direction is determined by contributors.
Note that BigTable is a complex layer on top of GFS that caches and
batches I/O. So, while GFS does implement some features that DFS
still does not (like appends), GFS is probably not used directly by,
e.g., writely. Finally, BigTable is not relational.
Doug
Doug Cutting <[EMAIL PROTECTED]> wrote: <chopped>
DFS is currently primarily used to support large, offline, batch
computations. For example, a log of critical data with tight
transactional requirements is probably an inappropriate use of DFS at
this time. Again, this may change, but that's where we are now.
Doug
Thanks much.
-eric