Hi,

We are trying to decouple our Reporting DB from OLTP and urgently need help 
assessing whether the proposed solution is feasible for production.

Use Case: Currently, our OLTP and Reporting workloads share the same 
application and DB. Some CFs (column families) are used for both OLTP and 
Reporting, while others are used solely for Reporting. Every business 
transaction synchronously updates the main OLTP CF and asynchronously updates 
the other Reporting CFs.

Problem Statement:
1. Decouple Reporting and OLTP so that Reporting load cannot impact OLTP 
performance.
2. Reporting and OLTP modules must scale independently.
3. The OLTP client should not update all Reporting CFs. We generate Data 
Records on the file system/shared disk. Reporting should use these Records to 
build the Reporting DB.
4. Small customers may run OLTP and Reporting on the same 3-node cluster. 
Bigger customers can be given the option of dedicated OLTP and Reporting 
nodes. So a standard hardware box should be usable for all 3 deployment types 
(OLTP, Reporting, or OLTP+Reporting).

Note: Reporting is ad hoc, may involve full table scans, and does not involve 
analytics. Data size is large: 2TB (OLTP+Reporting) per node.

Hardware: Standard deployment is a 3-node cluster, each node having 24 cores, 
64GB RAM, and 6 x 400GB SSDs in RAID5.
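
For reference, a rough sketch of the nominal disk arithmetic for one node, 
comparing the current RAID5 layout with the 4+2 JBOD split described in the 
proposed solution. It assumes decimal GB and ignores filesystem and 
controller overhead; the function names are only illustrative.

```python
# Nominal per-node capacity; ignores filesystem and controller overhead.
DISKS = 6
DISK_GB = 400


def raid5_usable_gb(disks: int, disk_gb: int) -> int:
    """RAID5 spends one disk's worth of space on parity."""
    return (disks - 1) * disk_gb


def jbod_raw_gb(disks: int, disk_gb: int) -> int:
    """JBOD exposes every disk in full, with no parity."""
    return disks * disk_gb


raid5_gb = raid5_usable_gb(DISKS, DISK_GB)  # current layout
oltp_gb = jbod_raw_gb(4, DISK_GB)           # proposed OLTP share (4 disks)
reporting_gb = jbod_raw_gb(2, DISK_GB)      # proposed Reporting share (2 disks)

print(f"RAID5 usable:   {raid5_gb} GB")
print(f"JBOD OLTP:      {oltp_gb} GB")
print(f"JBOD Reporting: {reporting_gb} GB")
```

These are nominal figures to set against the 2TB per-node data size 
mentioned in the note above.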

Proposed Solution:
1. Split OLTP and Reporting clients into two application components.
2. For small deployments where more than 3 nodes are not required:
    A. Install 2 Cassandra instances on each node, one for OLTP and the other 
for Reporting.
    B. To distribute I/O load 2:1, remove RAID5 (as Cassandra offers 
replication) and assign 4 disks as JBOD to OLTP and 2 disks to Reporting.
    C. RAM is abundant and often under-utilized, so assign 8GB to each of the 
2 Cassandra instances.
    D. To make sure that Reporting cannot overload the CPU, tune 
concurrent_reads and concurrent_writes.
    The OLTP client will only write to the OLTP DB and generate the Data 
Records. The Reporting client will poll the file system and populate the 
Reporting DB in the required format.
3. Larger customers can have Reporting clients and DB on dedicated physical 
nodes with all resources.
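
To make the polling step concrete, here is a minimal sketch of what the 
Reporting client loop could look like. Everything in it is an assumption for 
illustration: the directory layout, the `.rec` suffix, the one-record-per-line 
format, and the `load_record` callback (which stands in for the actual writes 
to the Reporting DB). It also assumes the OLTP side writes each file under a 
temporary name and renames it to `*.rec` only when complete, so half-written 
files are never picked up.

```python
import shutil
import time
from pathlib import Path
from typing import Callable


def poll_once(incoming: Path, done: Path,
              load_record: Callable[[str], None]) -> int:
    """Load every *.rec file in `incoming`, then move it to `done`.

    Returns the number of files processed. A file is moved only after all
    of its lines were handed to `load_record`, so a crash mid-file leaves
    it in place to be retried (its records may then be loaded twice).
    """
    done.mkdir(parents=True, exist_ok=True)
    count = 0
    # Sorted order gives a stable, name-based processing sequence.
    for path in sorted(incoming.glob("*.rec")):
        with path.open() as fh:
            for line in fh:
                line = line.strip()
                if line:  # skip blank lines
                    load_record(line)
        shutil.move(str(path), str(done / path.name))
        count += 1
    return count


def run_forever(incoming: Path, done: Path,
                load_record: Callable[[str], None],
                interval_s: float = 5.0) -> None:
    """Poll the directory at a fixed interval (hypothetical default: 5s)."""
    while True:
        poll_once(incoming, done, load_record)
        time.sleep(interval_s)
```

Because a retried file can replay records, this sketch only makes sense if 
the Reporting writes are idempotent, which plain Cassandra upserts on the 
same key generally are.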

Key Questions:
Is it OK to run 2 Cassandra instances on one node in a production system and 
limit CPU usage, disk I/O and RAM as suggested above?
Is there any other solution to the problem statement above?



Thanks
Anuj
