Re: Re: Need help in setting up spark cluster

2015-07-23 Thread fightf...@163.com
Hi there,

For your analytics and real-time recommendation requirements, I would 
recommend using Spark SQL with the Hive Thrift server to store and process 
your Spark Streaming data. The Thrift server runs as a long-running 
application, so it is quite feasible for it to consume data periodically and 
serve your analytical requirements. 

On the other hand, HBase or Cassandra would also be sufficient, and I think you 
may want to integrate Spark SQL with HBase or Cassandra for your data 
ingestion. You could deploy a CDH or HDP platform to support your production 
environment. I suggest first deploying a Spark standalone cluster to run some 
integration tests; you can also consider running Spark on YARN for later 
development use cases. 

Best,
Sun.



fightf...@163.com
 
From: Jeetendra Gangele
Date: 2015-07-23 13:39
To: user
Subject: Re: Need help in setting up spark cluster
Can anybody help here?

On 22 July 2015 at 10:38, Jeetendra Gangele gangele...@gmail.com wrote:
Hi All, 

I am trying to capture user activity for a real estate portal.

I am using a combination of RabbitMQ and Spark Streaming: I push all events to 
RabbitMQ and then consume them with Spark Streaming in 1-second micro-batches.

Later on, I plan to store the consumed data for analytics or near-real-time 
recommendations.

Where should I store this data? Should I keep it in Spark RDDs so that people 
can query it with Spark SQL for analytics or real-time recommendations? The 
data is not huge; currently it is 10 GB per day.

Another alternative would be either HBase or Cassandra. Which one would be 
better?

Any suggestions?


Also, for this use case, should I use an existing big data platform like 
Hortonworks, or can I deploy a standalone Spark cluster? 






Re: Re: Need help in setting up spark cluster

2015-07-23 Thread Jeetendra Gangele
Thanks for the reply and your valuable suggestions.

I have 10 GB of data generated every day, which I need to write to my
database. The data is schema-based, but the schema changes frequently, so
consider it unstructured. Sometimes I may have to serve 1 write/sec with
4 m1.xlarge machines. Will Spark SQL with the Hive Thrift server be good
enough for this?
As I understand it, Spark SQL works on SchemaRDDs. Won't there be a
problem when the schema changes?
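
To put the 10 GB per day figure in perspective, here is a rough back-of-the-envelope sketch of the average ingest rate. It assumes 1 GB = 10^9 bytes, traffic spread evenly over the day, and an even split across the 4 machines; none of these assumptions come from the thread, and real traffic will have peaks. The helper name average_rate is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(bytes_per_day, machines):
    """Return (cluster-wide bytes/sec, bytes/sec per machine).

    Assumes load is spread evenly over the day and across machines.
    """
    per_second = bytes_per_day / SECONDS_PER_DAY
    return per_second, per_second / machines


# 10 GB per day (taking 1 GB = 10**9 bytes) across 4 machines
total, per_machine = average_rate(10 * 10**9, 4)
print(f"{total / 1e3:.1f} kB/s total, {per_machine / 1e3:.1f} kB/s per machine")
```

Under these assumptions the average comes out at roughly 116 kB/s for the whole cluster, so sizing is more likely to be driven by peak load and query latency than by raw daily volume.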

Also, I have complex queries for real-time analytics, such as AND queries
across multiple fields, for example:

list all users who bought flats in Mumbai in the last 30 minutes
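
For illustration, the query above can be expressed as a multi-field AND filter in SQL. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for a SQL engine such as Spark SQL; the events table, its columns (user_id, action, property_type, city, event_time), the sample rows, and the recent_buyers helper are all hypothetical and not taken from the thread.

```python
import sqlite3
from datetime import datetime, timedelta


def recent_buyers(conn, city, now, window_minutes=30):
    """Users who bought a flat in `city` within the last `window_minutes`."""
    # Timestamps are stored as ISO-8601 strings in one fixed format,
    # so string comparison orders them chronologically.
    cutoff = (now - timedelta(minutes=window_minutes)).isoformat()
    rows = conn.execute(
        "SELECT DISTINCT user_id FROM events "
        "WHERE action = 'buy' AND property_type = 'flat' "
        "AND city = ? AND event_time >= ? "
        "ORDER BY user_id",
        (city, cutoff),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "user_id TEXT, action TEXT, property_type TEXT, city TEXT, event_time TEXT)"
)

# Hypothetical sample events relative to a fixed "now"
now = datetime(2015, 7, 23, 12, 0, 0)
sample = [
    ("u1", "buy", "flat", "mumbai", (now - timedelta(minutes=5)).isoformat()),
    ("u2", "buy", "flat", "mumbai", (now - timedelta(minutes=45)).isoformat()),
    ("u3", "view", "flat", "mumbai", (now - timedelta(minutes=1)).isoformat()),
    ("u4", "buy", "flat", "pune", (now - timedelta(minutes=2)).isoformat()),
    ("u5", "buy", "flat", "mumbai", (now - timedelta(minutes=29)).isoformat()),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", sample)

print(recent_buyers(conn, "mumbai", now))
```

Only u1 and u5 match here: u2 falls outside the 30-minute window, u3 is a view rather than a purchase, and u4 is in a different city. The same WHERE clause shape would carry over to any SQL front end, though how well it performs depends on the storage layer behind it.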


If I use HBase or Cassandra, I need to set up a NoSQL cluster, so there would
be two clusters: one for Spark and another for NoSQL. Isn't it better to start
with HDP?




Re: Need help in setting up spark cluster

2015-07-22 Thread Jeetendra Gangele
Can anybody help here?
