We are very interested in ideas and patches to improve the system's stability.
This is very young software, but we are using it at very large scale and intend to keep enhancing it. We currently have a 2000-node file system with 3TB of raw storage per node and are supporting millions of files.
With careful shepherding we've had no data loss events this year (we only recently reached 2000 nodes). The biggest risks right now are user errors, due to the lack of basic access protections. Some fixes for that are in the works.
On Sep 5, 2007, at 5:56 PM, Jeff Hammerbacher wrote:
We have very similar plans for Hadoop to what C G quotes below, but we've found the stability of HDFS to be quite troublesome. We've corrupted HDFS three different ways in a few weeks: 1) running jstack on the NameNode; 2) loading lots of small files into HDFS, causing it to hang on a MapReduce job and subsequently display corruption on restart; 3) upgrading to a newer version of Hadoop. Thus we are very uncertain about treating HDFS as a reliable long-term data store.
That being said, we're excited about the opportunities created by Hadoop, so we're going to put some time into making it more reliable and creating a utility to archive data out of HDFS for backup purposes.
On 9/5/07, C G <[EMAIL PROTECTED]> wrote:
>
> Our intention is to use HDFS as the core of a large "data repository". We store "raw" data within HDFS on a more-or-less permanent basis, and map/reduce it to produce load files for our data warehouse. We have other plans as well, all centered around storing data on a very long-term basis in HDFS. So you're in good company...
>
> Our plan is for a 64T HDFS repository, with a replication factor of 3 for a ~21T data space.
>
> C G
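[The ~21T figure above follows directly from HDFS's block replication: every block is stored `replication` times, so usable space is roughly raw capacity divided by the replication factor. A minimal sketch of the arithmetic, using the numbers from this thread; the function name is illustrative, not part of any HDFS API:]

```python
def usable_capacity(raw_tb: float, replication: int = 3) -> float:
    """Approximate usable data space given raw capacity and an
    HDFS replication factor (ignores metadata and spare-space
    overhead, which reduce the real figure somewhat)."""
    return raw_tb / replication

# 64 TB raw with 3x replication leaves roughly a third usable,
# matching the ~21T data space quoted above.
print(usable_capacity(64, 3))
```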
>
>
> Dongsheng Wang <[EMAIL PROTECTED]> wrote:
>
> We are looking at using HDFS as a long-term storage solution. We want to use it to store lots of files. The files can be big or small; they are images, videos, etc. We only write the files once, and may read them many times. Sounds like a perfect fit for HDFS.
>
> The concern is that since it's been engineered to support MapReduce, there may be fundamental assumptions that the data being stored by HDFS is transient in nature. Obviously, for our scalable storage solution, zero data loss or corruption is a hard requirement.
>
> Is anybody using HDFS as a long-term storage solution? Interested in any info. Thanks.
>
> - ds