Hello, my company is considering using a DFS (distributed file system) for a project we're currently working on. Since we don't have much experience in this field, I've compiled a list of questions that I hope can guide us to better decisions. I would greatly appreciate anyone's help with these issues.
- How do we handle failure of the single metaserver/namenode? Is there a way to build a zero-downtime solution?
- What are the major differences between KFS and HDFS? Spec-wise they seem similar.
- Our service needs to handle a large number of small files (typically 1-20 MB in size). Is HDFS/KFS appropriate for this?
- Our service requires low-latency access to these files. We're less worried about throughput, since (a) the files are small, so they might be cached by the OS, and in any case each will reside on a single data server (probably without multiple chunks/blocks per file), and (b) we don't have high throughput requirements. We do, however, require low latency (let's say less than 50 ms) before we start receiving data from a file. Will HDFS/KFS deliver those numbers?
- Are there any options for providing data reliability without complete replication (which wastes storage space)? For example, performing "RAID XOR"-type operations between chunks/blocks?
- Are there any other DFSs you'd recommend looking into that might better fit our requirements?

Thanks, Yoav.
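For context on the "RAID XOR" question above: the idea is to store one parity block per stripe of n data blocks instead of full replicas, cutting storage overhead from (replication factor - 1)x down to 1/n while still surviving the loss of any single block in the stripe. A minimal sketch in Python (block sizes and stripe layout are purely illustrative, not taken from any particular DFS):

```python
# Toy illustration of XOR-parity reliability: one parity block per
# stripe of data blocks lets us reconstruct any single lost block.
# Stripe size and block length here are hypothetical, for clarity only.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

# A stripe of three tiny data blocks (real chunks would be MB-sized).
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(stripe)

# Simulate losing block 1: XOR the parity with the surviving blocks
# to reconstruct it.
recovered = xor_blocks([stripe[0], stripe[2], parity])
assert recovered == stripe[1]
```

With a stripe of 3 data blocks plus 1 parity block, the overhead is 33% instead of the 200% of 3x replication, at the cost of tolerating only one failure per stripe and extra work on reconstruction.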
