Hi All,

I am trying to capture user activity on a real estate portal.

I am using a combination of RabbitMQ and Spark Streaming: all events are
pushed to RabbitMQ, and Spark Streaming then consumes them in 1-second
micro-batches.
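To show what a 1-second micro-batch means here, below is a toy, Spark-free sketch in plain Python. It is an assumption-laden illustration, not Spark or RabbitMQ code: events are (arrival time in seconds, payload) pairs, and the event names are hypothetical. Each event lands in the batch for the 1-second interval it arrived in.

```python
from collections import defaultdict


def micro_batches(events, interval_s=1.0):
    """Group (arrival_time_s, payload) pairs into fixed-length batches.

    A simplified picture of how a streaming engine with a fixed batch
    interval cuts a continuous stream into small batches. Each event is
    assigned to the interval index it arrived in.
    """
    batches = defaultdict(list)
    for arrival_s, payload in events:
        batches[int(arrival_s // interval_s)].append(payload)
    # Return (batch_start_s, payloads) pairs in time order.
    return [(idx * interval_s, batches[idx]) for idx in sorted(batches)]


if __name__ == "__main__":
    # Hypothetical user-activity events: (seconds since start, event name)
    events = [
        (0.10, "view:listing/42"),
        (0.75, "search:2bhk"),
        (1.20, "view:listing/7"),
        (2.90, "contact:agent/3"),
    ]
    for start, payloads in micro_batches(events):
        print(start, payloads)
```

This prints three batches starting at 0.0, 1.0 and 2.0. One simplification: intervals with no events produce no batch here, whereas Spark Streaming schedules a (possibly empty) batch for every interval.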

Later on, I plan to store the consumed data for analytics or for
near-real-time recommendations.

Where should I store this data? One option is to keep it in Spark itself (as
RDDs) so that people can query it with Spark SQL for analytics or real-time
recommendations. The volume is not huge at the moment: about 10 GB per day.
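To put "10 GB per day" in perspective for the storage decision, here is a small back-of-the-envelope calculation. It assumes decimal units (1 GB = 10^9 bytes), a perfectly even load over the day, a 1-second batch interval and a hypothetical one-year retention; real traffic will have peaks well above the average, and the raw total excludes replication and compression.

```python
SECONDS_PER_DAY = 86_400


def volume_estimates(gb_per_day, batch_interval_s=1.0, retention_days=365):
    """Average throughput and raw retained volume for a given daily ingest.

    Assumes decimal units and an evenly spread load; replication and
    compression are ignored.
    """
    bytes_per_s = gb_per_day * 10**9 / SECONDS_PER_DAY
    return {
        "avg_kb_per_second": bytes_per_s / 1000,
        "avg_kb_per_batch": bytes_per_s * batch_interval_s / 1000,
        "tb_retained": gb_per_day * retention_days / 1000,
    }


if __name__ == "__main__":
    for name, value in volume_estimates(10).items():
        print(f"{name}: {value:.2f}")
```

For 10 GB/day this works out to roughly 116 KB per second (so about 116 KB per 1-second batch on average) and about 3.65 TB of raw data per year, which is the number to keep in mind when deciding whether in-memory RDDs alone are a realistic long-term store.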

Another alternative would be HBase or Cassandra. Which of the two would be
better?
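Whichever of the two is chosen, the key design usually matters more than the product, because both are optimised for reads along the primary key. As a hypothetical example, assume the main query is "the most recent activity of one user". HBase keeps rows sorted lexicographically by row key, so a common pattern is a composite key of user id plus a reversed, zero-padded timestamp, which makes a user's newest events sort first. The names below (activity_row_key, the "#" separator) are illustrative assumptions; in Cassandra the same ordering is usually expressed with a clustering column declared WITH CLUSTERING ORDER BY (event_time DESC) instead of a hand-built key.

```python
# Mirrors Java's Long.MAX_VALUE, a common base for reversed timestamps.
MAX_TS_MS = 2**63 - 1


def activity_row_key(user_id, event_ts_ms):
    """Build a hypothetical HBase-style row key: <user_id>#<reverse_ts>.

    The user id prefix keeps one user's events together; the reversed,
    zero-padded (19-digit) timestamp makes newer events sort first under
    lexicographic ordering.
    """
    reverse_ts = MAX_TS_MS - event_ts_ms
    return f"{user_id}#{reverse_ts:019d}"


if __name__ == "__main__":
    keys = [activity_row_key("u123", ts) for ts in (1000, 2000, 3000)]
    # Sorted order lists the event at ts=3000 first.
    for key in sorted(keys):
        print(key)
```

When scanning by prefix, include the separator ("u1#" rather than "u1") so that one user's scan does not also match ids such as "u12".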

Any suggestions?


Also, for this use case, should I use an existing big data platform such as
Hortonworks, or can I deploy a standalone Spark cluster?
