Re: Re: OLAP query using spark dataframe with cassandra

[email protected] Mon, 09 Nov 2015 18:02:57 -0800

Hi,

In my experience, I would recommend option 3), Apache Kylin, for your
requirements.


This is a suggestion based on the open-source world. 

As for Cassandra, I accept your point that it needs dedicated support staff.
But the community is very open and convenient for getting a prompt response.



[email protected]
 
From: tsh
Date: 2015-11-10 02:56
To: [email protected]; user; dev
Subject: Re: OLAP query using spark dataframe with cassandra
Hi,

I'm in the same position right now: we are going to implement something like 
OLAP BI + Machine Learning explorations on the same cluster.
Well, the question is quite ambivalent: on the one hand, we have terabytes of
versatile data and the necessity to build something like cubes (Hive and Hive
on HBase are unsatisfactory). On the other hand, our users are accustomed to
Tableau + Vertica.
So, right now I consider the following choices:
1) Platfora (not free, I don't know price right now) + Spark
2) AtScale + Tableau (not free, I don't know price right now) + Spark
3) Apache Kylin (young project?) + Spark on YARN + Kafka + Flume + some storage
4) Apache Phoenix + Apache HBase + Mondrian + Spark on YARN + Kafka + Flume
(has anybody used it in production?)
5) Spark + Tableau  (cubes?)

For myself, I decided not to dive into Mesos. Cassandra is hard to configure;
you'll have to dedicate a special employee to support it.

I'll be glad to hear other ideas and proposals, as we are at the beginning of
the process too.

Sincerely yours, Tim Shenkao

On 11/09/2015 09:46 AM, [email protected] wrote:
Hi, 

Thanks for the suggestion. Actually, we are now evaluating and stress-testing
Spark SQL on Cassandra while trying to define our business models. FWIW, the
solution mentioned here is different from a traditional OLAP cube engine,
right? So we are hesitating over the general direction to take for our OLAP
architecture.

And we are happy to hear more use cases from this community.

Best,
Sun. 



[email protected]
 
From: Jörn Franke
Date: 2015-11-09 14:40
To: [email protected]
CC: user; dev
Subject: Re: OLAP query using spark dataframe with cassandra

Is there any distributor supporting these software components in combination? 
If not, and your core business is not software, then you may want to look for
something else, because it might not make sense to build up internal know-how
in all of these areas.

In any case, it all depends heavily on your data and queries. You will have to
do your own experiments.

On 09 Nov 2015, at 07:02, "[email protected]" <[email protected]> wrote:

Hi, community

We are especially interested in this kind of integration, following some
slides from [1]. SMACK (Spark + Mesos + Akka + Cassandra + Kafka) seems to be
a good implementation of the lambda architecture in the open-source world,
especially for non-Hadoop cluster environments. As far as we can see, the
advantages are:

1. The feasibility and scalability of the Spark DataFrame API, which also
complements Apache Cassandra's native CQL well.

2. Both streaming and batch processing are available across the whole stack,
which is cool.

3. We can achieve both compactness and usability for Spark with Cassandra,
including seamless integration with job scheduling and resource management.

Our only concern is OLAP query performance, which is mainly affected by the
frequent aggregation across large tables that grow daily, for both Spark SQL
and Cassandra. I can see that the use case in [1] uses FiloDB to achieve
columnar storage and query performance, but we know nothing more about it.

The question is: has anyone had such a use case so far, especially in a
production environment? We would be interested in your architecture for
designing this OLAP engine using Spark + Cassandra. How do you think this
scenario compares with a traditional OLAP cube design, such as Apache Kylin or
Pentaho Mondrian?
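
To make the comparison concrete, here is a minimal sketch in plain Python (not
Spark, Cassandra, or Kylin code) of the two styles being weighed: aggregating
on demand by scanning all facts at query time, versus precomputing a cube of
totals and folding each day's new rows into it. The fact rows, the dimension
names, and every function name below are hypothetical and exist only for
illustration; a real cube engine does considerably more than this.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (day, region, product, amount). Invented data.
FACTS = [
    ("2015-11-08", "EU", "A", 10.0),
    ("2015-11-08", "US", "A", 5.0),
    ("2015-11-09", "EU", "B", 7.5),
    ("2015-11-09", "EU", "A", 2.5),
]
DIMENSIONS = ("day", "region", "product")


def aggregate_on_demand(rows, group_by):
    """Scan every fact row at query time (the ad hoc query style)."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def build_cube(rows):
    """Precompute totals for every subset of the dimensions."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for group_by in combinations(DIMENSIONS, size):
            cube[group_by] = aggregate_on_demand(rows, group_by)
    return cube


def merge_daily_increment(cube, new_rows):
    """Fold one day's new rows into the cube without rescanning old facts."""
    delta = build_cube(new_rows)
    for group_by, totals in delta.items():
        target = cube.setdefault(group_by, {})
        for key, value in totals.items():
            target[key] = target.get(key, 0.0) + value
    return cube


cube = build_cube(FACTS)
# A cube query is a dictionary lookup instead of a full scan.
print(cube[("region",)])
```

The trade-off this is meant to show: the cube answers a query with a lookup,
but it stores one table of totals per combination of dimensions, so its size
grows quickly as dimensions are added, while the on-demand style stores
nothing extra and pays for a scan on every query.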

Best Regards,

Sun.


[1] http://www.slideshare.net/planetcassandra/cassandra-summit-2014-interactive-olap-queries-using-apache-cassandra-and-spark



[email protected]
