Our intention is to use HDFS as the core of a large "data repository". We
store "raw" data in HDFS on a more-or-less permanent basis and map/reduce
it to produce load files for our data warehouse. We have other plans as well,
all centered around storing data in HDFS for the very long term. So you're
in good company...
Our plan is for a 64 TB HDFS repository with a replication factor of 3, which
gives roughly 21 TB of usable data space.
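The ~21 TB figure is the raw capacity divided by the replication factor, because HDFS stores every block that many times. A minimal Python sketch of that arithmetic follows. The `usable_capacity_tb` helper is hypothetical, and it ignores non-HDFS overhead such as OS files, temp space, and reserved disk space, so treat the result as an upper bound.

```python
# Hypothetical helper: estimate logical (usable) HDFS space from raw disk capacity.
# Assumption: each block is stored `replication` times and there is no other
# overhead (OS, temp, reserved space), so this is an upper bound.

def usable_capacity_tb(raw_tb: float, replication: int) -> float:
    """Return the logical data space in TB that fits in raw_tb of disk."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_tb / replication


if __name__ == "__main__":
    usable = usable_capacity_tb(64, 3)
    print(f"{usable:.1f} TB usable")  # prints "21.3 TB usable"
```

In a real cluster the default replication factor comes from the `dfs.replication` property in the HDFS configuration (default 3), so the same division applies unless individual files are given a different factor.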
C G
Dongsheng Wang <[EMAIL PROTECTED]> wrote:
We are looking at using HDFS as a long-term storage solution. We want to use it
to store lots of files. The files can be large or small: images, videos, etc.
We write each file only once and may read it many times, so HDFS sounds like a
perfect fit.
The concern is that, since it's been engineered to support MapReduce, there may
be fundamental assumptions that the data stored in HDFS is transient in
nature. Obviously, for our scalable storage solution, zero data loss or
corruption is a strict requirement.
Is anybody using HDFS as a long-term storage solution? I'd be interested in any info.
Thanks
- ds