Hi All, BAM capacity planning is a bit more involved than for other products. I worked through one such case with Mifan, and the following are the steps we followed. Please comment.
1. You need to know roughly how much data you have to handle. One way to get this is from the event rate. Let's say 100 TPS. If each record is 2K, that is about 18GB per day, about 550GB per month, and about 6.5TB per year.

2. The first question is how long we keep the data. Generally, this should be something like 1 month. You need enough disk to hold that data. If you have 3 replicas you need 3X that, and Cassandra also adds an overhead ( http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html ).

3. Then think about how much data you will be processing with Hive. If you process once a day, each run will pull about 18GB. You need to make sure there is enough computing power. Hadoop sizing is mostly decided by the number of cores, and for Cassandra we follow the DataStax recommendations. I think one core can process about 20MB per second (only a guess). See http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/ .

--Srinath

--
============================
Srinath Perera, Ph.D.
http://people.apache.org/~hemapani/
http://srinathsview.blogspot.com/
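The arithmetic in steps 1 to 3 can be sketched roughly as below. This is only a back-of-the-envelope illustration, not a sizing tool: the function names are made up for this example, the Cassandra overhead factor of 1.5 and the 8 cores are placeholder assumptions (take the real overhead from the DataStax page), and the 20MB per second per core figure is the guess from step 3.

```python
# Back-of-the-envelope BAM sizing sketch.
# Every input below is an assumption; replace it with your own numbers.

SECONDS_PER_DAY = 24 * 60 * 60
GB = 10**9  # decimal gigabyte, in bytes


def daily_volume_gb(events_per_second, event_size_bytes):
    """Step 1: raw data collected per day, in GB."""
    return events_per_second * event_size_bytes * SECONDS_PER_DAY / GB


def disk_needed_gb(daily_gb, retention_days, replicas, overhead_factor):
    """Step 2: disk for the retention window, times replicas and overhead."""
    return daily_gb * retention_days * replicas * overhead_factor


def batch_minutes(batch_gb, cores, mb_per_core_per_second):
    """Step 3: rough time for one Hive run, assuming it scales with cores."""
    seconds = batch_gb * 1000 / (cores * mb_per_core_per_second)
    return seconds / 60


# 100 TPS with 2K (2048 byte) records, as in step 1.
daily = daily_volume_gb(100, 2048)

# 1 month retention and 3 replicas, as in step 2.
# overhead_factor=1.5 is a placeholder assumption, not a Cassandra figure.
disk = disk_needed_gb(daily, retention_days=30, replicas=3, overhead_factor=1.5)

# One run per day over one day of data; 8 cores is a hypothetical cluster,
# 20 MB/s per core is the guess from step 3.
minutes = batch_minutes(daily, cores=8, mb_per_core_per_second=20)

print(f"per day:  {daily:.1f} GB")
print(f"per year: {daily * 365 / 1000:.1f} TB")
print(f"disk:     {disk:.0f} GB")
print(f"hive run: {minutes:.1f} minutes")
```

Changing the retention period, the replica count, or the per-core throughput guess shows quickly which of the three inputs dominates the hardware requirement.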
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
