Hi,
I'm in the same position right now: we are going to implement something
like OLAP BI + Machine Learning explorations on the same cluster.
Well, the question is a two-sided one: on the one hand, we have terabytes
of heterogeneous data and need something like cubes (Hive
and Hive on HBase are unsatisfactory). On the other hand, our users are
accustomed to Tableau + Vertica.
So right now I'm considering the following options:
1) Platfora (not free, I don't know the price right now) + Spark
2) AtScale + Tableau (not free, I don't know the price right now) + Spark
3) Apache Kylin (a young project?) + Spark on YARN + Kafka + Flume + some
storage
4) Apache Phoenix + Apache HBase + Mondrian + Spark on YARN + Kafka +
Flume (has anybody used it in production?)
5) Spark + Tableau (cubes?)
For my part, I've decided not to dive into Mesos. Cassandra is hard to
configure; you'll have to dedicate a special employee to support it.
I'd be glad to hear other ideas and suggestions, as we are at the
beginning of the process too.
Sincerely yours, Tim Shenkao
On 11/09/2015 09:46 AM, fightf...@163.com wrote:
Hi,
Thanks for the suggestion. We are currently evaluating and stress-testing
Spark SQL on Cassandra while
trying to define our business models. FWIW, the solution mentioned here is
different from a traditional OLAP
cube engine, right? So we are unsure which general direction to take
for our OLAP architecture.
We would be happy to hear about more use cases from this community.
Best,
Sun.
------------------------------------------------------------------------
fightf...@163.com
*From:* Jörn Franke <jornfra...@gmail.com>
*Date:* 2015-11-09 14:40
*To:* fightf...@163.com
*CC:* user <u...@spark.apache.org>; dev <dev@spark.apache.org>
*Subject:* Re: OLAP query using spark dataframe with cassandra
Is there any distributor supporting these software components in
combination? If not, and your core business is not software, then you
may want to look for something else, because it might not make
sense to build up internal know-how in all of these areas.
In any case, it all depends highly on your data and queries. You
will have to do your own experiments.
On 09 Nov 2015, at 07:02, fightf...@163.com wrote:
Hi, community,
We are especially interested in this feature integration,
based on some slides from [1]. The
SMACK stack (Spark + Mesos + Akka + Cassandra + Kafka)
seems a good implementation of the lambda architecture in the
open-source world, especially for non-Hadoop cluster
environments. As we can see,
the advantages are:
1. The practicality and scalability of the Spark DataFrame API, which
also complements Apache Cassandra's native CQL features
well.
2. Both streaming and batch processing are available using the
whole stack, which is cool.
3. We can achieve both compatibility and usability for Spark with
Cassandra, including seamless integration with job scheduling
and resource management.
Our only concern is OLAP query performance, which suffers
mainly from frequent aggregations over large tables that
grow daily, for
both Spark SQL and Cassandra. I can see that the use case in [1]
uses FiloDB to achieve columnar storage and query
performance, but we have no further
knowledge of it.
Our question is: does anyone have such a use case right now, especially
in a production environment? We would be interested in your
architecture for designing this
OLAP engine using Spark + Cassandra. How do you think this
scenario compares with a traditional OLAP cube
design, like Apache Kylin or
Pentaho Mondrian?
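To make the comparison concrete, here is a minimal pure-Python sketch of what "cube design" means, assuming a single additive SUM measure and in-memory rows. The idea is to precompute the aggregate for every combination of dimensions once, so queries become lookups instead of scans. The function names `build_cube` and `merge_cubes` are hypothetical and are not Kylin's or Mondrian's API; the sample data is made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dims, measure):
    """Precompute SUM(measure) for every subset of dims (a full data cube).

    Returns {group_by_dims: {dim_values: total}}; the key () holds the grand total.
    """
    cube = {}
    for size in range(len(dims) + 1):
        for group in combinations(dims, size):
            cells = defaultdict(float)
            for row in rows:
                # One cell per distinct combination of the grouped dimension values.
                cells[tuple(row[d] for d in group)] += row[measure]
            cube[group] = dict(cells)
    return cube


def merge_cubes(old, new):
    """Fold a cube built from new rows (e.g. one day's data) into an existing cube.

    Valid only because SUM is distributive: sum(a + b) == sum(a) + sum(b).
    """
    merged = {}
    for group in old.keys() | new.keys():
        cells = dict(old.get(group, {}))
        for key, value in new.get(group, {}).items():
            cells[key] = cells.get(key, 0.0) + value
        merged[group] = cells
    return merged


if __name__ == "__main__":
    dims = ["day", "region", "product"]
    history = [
        {"day": "2015-11-08", "region": "east", "product": "a", "sales": 10.0},
        {"day": "2015-11-08", "region": "west", "product": "a", "sales": 5.0},
        {"day": "2015-11-09", "region": "east", "product": "b", "sales": 7.0},
    ]
    cube = build_cube(history, dims, "sales")
    print(cube[("region",)])  # {('east',): 17.0, ('west',): 5.0}

    # A new day arrives: aggregate only its rows, then merge into the cube.
    today = [{"day": "2015-11-10", "region": "west", "product": "b", "sales": 3.0}]
    cube = merge_cubes(cube, build_cube(today, dims, "sales"))
    print(cube[("region",)])  # {('east',): 17.0, ('west',): 8.0}
    print(cube[()])  # {(): 25.0}
```

The trade-off is storage for speed: n dimensions give 2^n groupings to materialize, but a table that grows daily only needs the new day aggregated and merged. Non-distributive measures such as exact distinct counts cannot be merged this simply, which is one place where on-the-fly aggregation in Spark SQL may still be needed.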
Best Regards,
Sun.
[1] http://www.slideshare.net/planetcassandra/cassandra-summit-2014-interactive-olap-queries-using-apache-cassandra-and-spark
------------------------------------------------------------------------
fightf...@163.com