Hi All,

BAM capacity planning is a bit more involved than for other products. I was
working through one such case with Mifan, and the following are the steps we
followed. Please comment.

1. You need to know roughly how much data you have to handle. One way to
get this is from the event rate. Let's say 100 TPS. If each record is 2K,
that is about 200KB per second, which comes to about 18GB per day, about
550GB per month, and about 6.5TB per year.
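
To make the arithmetic in step 1 easy to redo with other rates, here is a
small sketch. The function name and the choice of 2K = 2048 bytes and
decimal GB are my assumptions; the 100 TPS and 2K figures are just the
example values from above.

```python
# Rough raw data volume from event rate and record size.
# Assumptions: 2K means 2048 bytes, GB means 10^9 bytes, and the
# event rate is constant around the clock.

SECONDS_PER_DAY = 24 * 60 * 60


def raw_volume_gb(events_per_sec, record_bytes, days):
    """Raw (unreplicated) data volume in GB accumulated over `days` days."""
    return events_per_sec * record_bytes * SECONDS_PER_DAY * days / 1e9


per_day = raw_volume_gb(100, 2048, 1)
per_month = raw_volume_gb(100, 2048, 30)
per_year = raw_volume_gb(100, 2048, 365)

print(f"{per_day:.1f} GB/day, {per_month:.0f} GB/month, "
      f"{per_year / 1000:.1f} TB/year")
```

The exact results come out slightly below the rounded figures quoted in
step 1, which is fine for planning purposes.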

2. The first question is how long we keep the data. Generally, this should
be about 1 month. You need enough disk to hold the data for that period.
If you have 3 replicas you need 3X that (for the 100 TPS example, about
550GB x 3 = 1.65TB per month of retention), and Cassandra also adds an
overhead on top. See
http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html
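
The disk estimate in step 2 can be sketched as below. The function and
parameter names are hypothetical, and the overhead factor of 1.5 is only a
placeholder I picked for illustration, not a figure taken from the DataStax
page; please substitute whatever the linked guidance gives for your setup.

```python
# Rough disk requirement for the retained data.
# Assumption: overhead is modelled as a single multiplier on top of the
# replicated raw data. The 1.5 used below is a placeholder, not a
# DataStax-recommended value.


def disk_needed_gb(daily_gb, retention_days, replicas, overhead_factor):
    """Total disk in GB across the cluster for the retention window."""
    raw = daily_gb * retention_days      # data kept at any one time
    replicated = raw * replicas          # every replica stores a full copy
    return replicated * overhead_factor  # headroom for storage overhead


total_gb = disk_needed_gb(daily_gb=18, retention_days=30,
                          replicas=3, overhead_factor=1.5)
print(f"about {total_gb / 1000:.1f} TB of disk across the cluster")
```

Dividing the result by the number of Cassandra nodes gives a first guess at
the disk needed per node.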

3. Then think about how much data you will be processing with Hive. If you
process once a day, each run will pull about 18GB. You need to make sure
there is enough computing power. Hadoop sizing is mostly decided by the
number of cores, and for Cassandra we follow the DataStax recommendations.
I think one core can process about 20MB per second (only a guess). See
http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
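
For step 3, a back-of-the-envelope sketch of how long one Hive run would
take, and how many cores a given time window needs. It uses the 20MB per
second per core guess from above and assumes the work scales linearly
across cores, which real Hadoop jobs will not quite do. The function names
are hypothetical.

```python
import math

# Assumptions: ~20 MB/s per core (a guess, as noted above), 1 GB = 1000 MB,
# and perfectly linear scaling across cores with no scheduling overhead.


def batch_seconds(data_gb, cores, mb_per_sec_per_core=20.0):
    """Estimated wall-clock seconds to process `data_gb` on `cores` cores."""
    return data_gb * 1000 / (cores * mb_per_sec_per_core)


def cores_needed(data_gb, window_seconds, mb_per_sec_per_core=20.0):
    """Smallest core count that finishes `data_gb` within `window_seconds`."""
    return math.ceil(data_gb * 1000 / (mb_per_sec_per_core * window_seconds))


one_core = batch_seconds(18, cores=1)      # daily 18GB on a single core
eight_cores = batch_seconds(18, cores=8)   # same batch on 8 cores
for_5_min = cores_needed(18, window_seconds=300)

print(f"1 core: {one_core / 60:.0f} min, 8 cores: {eight_cores:.0f} s, "
      f"cores for a 5 min window: {for_5_min}")
```

So under these assumptions the daily 18GB batch is about 15 minutes of work
for one core. Treat the output as a lower bound on time and add headroom.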

--Srinath






-- 
============================
Srinath Perera, Ph.D.
   http://people.apache.org/~hemapani/
   http://srinathsview.blogspot.com/
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
