I am evaluating replacing a homegrown file storage system with HBase. Here are the stats on our current environment:
- Our workload will be all single-record reads and writes
- We have 50TB of data, with each record being 10KB to 10MB in size (average of 300KB), in a single column
- Peak of 60k reads per hour
- Peak of 20k writes per hour
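To put those peaks in per-second terms (back-of-envelope arithmetic only, using the figures above):

```python
# Illustrative conversion of the stats above; no HBase specifics assumed.
total_tb = 50
avg_record_kb = 300
peak_reads_per_hour = 60_000
peak_writes_per_hour = 20_000

reads_per_sec = peak_reads_per_hour / 3600    # ~16.7 reads/sec at peak
writes_per_sec = peak_writes_per_hour / 3600  # ~5.6 writes/sec at peak

# Rough record count: 50 TB divided by the 300 KB average record size.
record_count = (total_tb * 1024**3) / avg_record_kb  # both sides in KB

print(f"{reads_per_sec:.1f} reads/sec, {writes_per_sec:.1f} writes/sec")
print(f"~{record_count / 1e6:.0f} million records")
```

So the request rates are modest; the interesting part is the record sizes and total volume.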

Here are the questions:
What sort of hardware should I be looking at here? Will capacity scale linearly as we add more servers, and for how long?

Will I be able to get access times of 250ms or better with a reasonably sized cluster?

From what I understand, we're looking at a theoretical 64MB block read from disk for any row. In practice, how significant is this, once caching and other optimizations are taken into account?
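To put that worry in numbers (assuming the 64MB figure refers to the HDFS block size; HBase's own HFile blocks default to 64KB, which changes the picture considerably):

```python
# Illustrative read-amplification arithmetic for a 300 KB average record.
avg_record_bytes = 300 * 1024
hdfs_block = 64 * 1024 * 1024   # 64 MB HDFS block size
hfile_block = 64 * 1024         # 64 KB default HBase HFile block size

# If a point read really pulled a full HDFS block off disk:
worst_case_amp = hdfs_block / avg_record_bytes       # ~218x amplification
# Against the 64 KB HFile blocks HBase actually reads:
blocks_per_record = avg_record_bytes / hfile_block   # record spans ~4.7 blocks

print(f"Worst case vs 64 MB blocks: ~{worst_case_amp:.0f}x amplification")
print(f"Vs 64 KB HFile blocks: ~{blocks_per_record:.1f} blocks per record")
```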

We can do sequential writes, but we expect that would cost us the anticipated ~50% reduction in store size from compression. I am also concerned that sequential writes at the end of the table will all end up going to one disk, instead of distributing the load across all servers.
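For what it's worth, the usual way to express the hotspotting concern concretely is key salting: prefixing a monotonically increasing key with a small hash-derived bucket so consecutive writes fan out across regions. This is a generic sketch (the `salted_key` helper and bucket count are illustrative, not anything from HBase's API):

```python
import hashlib

def salted_key(seq_key: str, buckets: int = 16) -> str:
    """Prefix a monotonically increasing key with a hash-derived bucket
    so consecutive writes spread over `buckets` key ranges instead of
    all landing on the tail region."""
    salt = int(hashlib.md5(seq_key.encode()).hexdigest(), 16) % buckets
    return f"{salt:02d}-{seq_key}"

# Consecutive keys fan out across buckets:
for k in ("file-000001", "file-000002", "file-000003"):
    print(salted_key(k))
```

The trade-off is that range scans over the original key order become 16 scans, one per bucket, but for a pure point-read/point-write workload like the one above that may not matter.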

Thanks in advance,
-Jason
