Hello folks,

I am working on an assignment that needs to load data from Cassandra into
Hive.
Hive must stay true to the source (Cassandra). That is, all the data in
Cassandra should be captured in Hive (batch or near real time).

I am looking for a generic, high-level solution that will work for multiple
Cassandra clusters, since we need to load data from different Cassandra
clusters into the Hive store.

Currently the customer runs Solr in parallel with Cassandra, exports the
data (in CSV format) from Solr once a day, and moves it to Hive. But we
can't impose this approach on all the other customers.

Cassandra clusters and Hive clusters are hosted in different AWS accounts.

Data to be loaded:
1. Historical load
2. Loading incremental changes

What is the best way to proceed with this?
