Andy Liu wrote:
I'm exploring the possibility of using the Hadoop records framework to
store these document records on disk. Here are my questions:
1. Is this a good application of the Hadoop records framework, keeping in
mind that my goals are speed and scalability? I'm assuming the answer is
yes, especially considering Nutch uses the same approach.
For read-only access, performance should be decent. However, Hadoop's
file structures do not permit incremental updates; rather, they are
primarily designed for batch operations, like MapReduce outputs. If you
need to incrementally update your data, then you might look at something
like BDB, a relational DB, or perhaps experiment with HBase. (HBase is
designed to be a much more scalable, incrementally updateable DB than
BDB or relational DBs, but its implementation is not yet complete.)
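To make the batch-vs-incremental distinction concrete, here is a small Python sketch (not Hadoop code; the file names, record layout, and function names are invented for illustration). It mimics the constraint Hadoop's flat-file formats impose: records are written once in sorted order and never modified in place, so an "update" means merging the old file with a batch of changes into a brand-new file.

```python
# Sketch: batch-oriented storage where an "update" is a full rewrite.
# Records are written once, in sorted key order, and never touched in
# place -- the same constraint Hadoop's file structures impose. All
# names here (write_batch, merge_update, the TSV layout) are invented
# for illustration, not part of any Hadoop API.

def write_batch(path, records):
    """Write records (a dict) as sorted key<TAB>value lines in one pass."""
    with open(path, "w") as f:
        for key in sorted(records):
            f.write(f"{key}\t{records[key]}\n")

def read_all(path):
    """Read the whole file back into a dict."""
    out = {}
    with open(path) as f:
        for line in f:
            key, value = line.rstrip("\n").split("\t", 1)
            out[key] = value
    return out

def merge_update(old_path, new_path, changes):
    """Apply a batch of changes by rewriting the entire file.

    Note the cost is proportional to the whole file, not to the number
    of changes -- which is why BDB or an RDBMS suits workloads that
    update individual records in place.
    """
    merged = read_all(old_path)
    merged.update(changes)
    write_batch(new_path, merged)

if __name__ == "__main__":
    write_batch("docs.v1.tsv", {"doc2": "world", "doc1": "hello"})
    merge_update("docs.v1.tsv", "docs.v2.tsv",
                 {"doc1": "HELLO", "doc3": "new"})
```

For occasional bulk loads (the MapReduce-output case Doug describes) this pattern is fine; it only breaks down when updates are frequent and small.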
Doug