Your architecture is a bit unusual in that you seem to be proposing that
users get direct access to the Hadoop storage layer.

More common is to have a controller layer that mediates requests to store or
read data.

With that layer of abstraction, you can deal with some of the problems
associated with file updates.  See the recent HBase work, for instance.
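
To make that concrete, here is a minimal sketch of what such a controller
might look like.  The class name, path layout, and byte[] interface are all
hypothetical; only the FileSystem calls are the standard Hadoop API:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hypothetical controller that sits between clients and HDFS.
    // Clients never touch the FileSystem API directly, so the controller
    // is free to map logical keys to paths, enforce quotas, and turn
    // "updates" into whole-file rewrites (HDFS files are write-once).
    public class StorageController {

        private final FileSystem fs;
        private final Path root = new Path("/data");  // assumed layout

        public StorageController(Configuration conf) throws IOException {
            this.fs = FileSystem.get(conf);
        }

        // All writes funnel through here; an "update" is a full rewrite
        // of the file rather than an in-place edit.
        public void store(String key, byte[] value) throws IOException {
            FSDataOutputStream out = fs.create(new Path(root, key), true);
            try {
                out.write(value);
            } finally {
                out.close();
            }
        }

        public byte[] read(String key) throws IOException {
            FSDataInputStream in = fs.open(new Path(root, key));
            try {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return buf.toByteArray();
            } finally {
                in.close();
            }
        }
    }

The point is the choke point, not the particular methods: once everything
goes through one class, you can change the update strategy without touching
any client code.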

Even with that layer of abstraction and the recent massive improvements to
HBase, Hadoop still tends to be much better suited to batch processing than
to real-time support of ad hoc user data reads and writes.  Depending on
the data you have and the update patterns, you might be much happier with a
clustered key-value store like Voldemort or Cassandra.  Voldemort in
particular has very nice capabilities for bulk-loading large amounts of
data from Hadoop into a large store, and it also supports real-time (ish)
random reads and writes.
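
For the real-time side, here is a sketch of what random reads and writes
look like from a Voldemort client.  The store name "userdata" and the
bootstrap URL are assumptions for the example; check the Voldemort docs for
the exact setup on your version:

    import voldemort.client.ClientConfig;
    import voldemort.client.SocketStoreClientFactory;
    import voldemort.client.StoreClient;
    import voldemort.client.StoreClientFactory;
    import voldemort.versioning.Versioned;

    public class VoldemortExample {
        public static void main(String[] args) {
            // Bootstrap URL and store name are assumed for illustration.
            StoreClientFactory factory = new SocketStoreClientFactory(
                    new ClientConfig().setBootstrapUrls("tcp://localhost:6666"));
            StoreClient<String, String> client =
                    factory.getStoreClient("userdata");

            // Random writes and reads happen in (near) real time.
            client.put("user-42", "some profile data");
            Versioned<String> value = client.get("user-42");
            System.out.println(value.getValue());
        }
    }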

On Thu, Jul 23, 2009 at 6:44 AM, Giovanni Tusa <giovan...@gmail.com> wrote:

> Could you also suggest some other useful links, maybe with examples if
> any, on how to implement such a mechanism?
>



-- 
Ted Dunning, CTO
DeepDyve
