Hello folks, I am working on an assignment that needs to load data from Cassandra and put it into Hive. Hive must be true to the source (Cassandra), meaning all the data in Cassandra should be captured in Hive (in batch or near real time).
I am looking for a generic, high-level solution that will work for multiple Cassandra clusters, since we need to load data from different Cassandra clusters into the Hive store. Currently one customer runs SOLR in parallel with Cassandra; they export the data (in CSV format) from SOLR once a day and move it to Hive. But we can't impose this setup on all the other customers. The Cassandra clusters and the Hive clusters are hosted in different AWS accounts.

The data has to be loaded in two parts:
1. Historical load
2. Incremental changes

What is the best way to proceed with this?