Re: OLAP query using spark dataframe with cassandra

tsh Mon, 09 Nov 2015 10:57:19 -0800

Hi,

I'm in the same position right now: we are going to implement something
like OLAP BI + Machine Learning explorations on the same cluster.
Well, the question is quite ambivalent: on the one hand, we have terabytes
of versatile data and the need to build something like cubes (Hive
and Hive on HBase are unsatisfactory). On the other hand, our users are
accustomed to Tableau + Vertica.

So, right now I consider the following choices:
1) Platfora (not free, I don't know the price right now) + Spark
2) AtScale + Tableau (not free, I don't know the price right now) + Spark
3) Apache Kylin (young project?) + Spark on YARN + Kafka + Flume + some
storage
4) Apache Phoenix + Apache HBase + Mondrian + Spark on YARN + Kafka +
Flume (has anybody used it in production?)
5) Spark + Tableau (cubes?)

For myself, I decided not to dive into Mesos. Cassandra is hard to
configure; you'll have to dedicate a special employee to support it.

I'll be glad to hear other ideas & proposals, as we are at the
beginning of the process too.


Sincerely yours, Tim Shenkao

On 11/09/2015 09:46 AM, [email protected] wrote:

Hi,

Thanks for the suggestion. Actually we are now evaluating and
stress-testing Spark SQL on Cassandra while trying to define business
models. FWIW, the solution mentioned here is different from a
traditional OLAP cube engine, right? So we are hesitating over the
general direction of our OLAP architecture.


And we are happy to hear more use cases from this community.

Best,
Sun.

------------------------------------------------------------------------
[email protected]

    *From:* Jörn Franke <mailto:[email protected]>
    *Date:* 2015-11-09 14:40
    *To:* [email protected] <mailto:[email protected]>
    *CC:* user <mailto:[email protected]>; dev
    <mailto:[email protected]>
    *Subject:* Re: OLAP query using spark dataframe with cassandra

    Is there any distributor supporting these software components in
    combination? If not, and your core business is not software, then
    you may want to look for something else, because it might not make
    sense to build up internal know-how in all of these areas.

    In any case, it all depends heavily on your data and queries. You
    will have to do your own experiments.

    On 09 Nov 2015, at 07:02, "[email protected]
    <mailto:[email protected]>" <[email protected]
    <mailto:[email protected]>> wrote:

    Hi, community

    We are especially interested in this feature integration, based
    on some slides from [1]. SMACK (Spark + Mesos + Akka + Cassandra +
    Kafka) seems a good implementation of the lambda architecture in
    the open-source world, especially for non-Hadoop cluster
    environments. As we can see, the advantages are:

    1 the feasibility and scalability of the Spark DataFrame API, which
    also complements Apache Cassandra's native CQL features well.

    2 both streaming and batch processing are available across the
    whole stack, cool.

    3 we can achieve both compactness and usability for Spark with
    Cassandra, including seamless integration with job scheduling
    and resource management.

    Our only concern is OLAP query performance, which is mainly
    affected by frequent aggregation work across large tables that
    grow daily, for both Spark SQL and Cassandra. I can see that the
    use case in [1] uses FiloDB to achieve columnar storage and query
    performance, but we know nothing more about it.

    The question is: has anyone had such a use case so far, especially
    in a production environment? We would be interested in your
    architecture for designing this OLAP engine using Spark +
    Cassandra. How do you think this scenario compares with a
    traditional OLAP cube design, like Apache Kylin or Pentaho
    Mondrian?
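
For readers weighing the two approaches: a cube engine precomputes aggregates for every combination of dimensions, whereas a query-time engine scans and aggregates the base table on each request. The following is only a minimal sketch of the cube idea in plain Python, not how Kylin, Mondrian, or Spark actually implement it. The function name `build_cube`, the column names, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Precompute SUM(measure) for every subset of `dimensions`.

    Returns a dict mapping a tuple of dimension names to a dict that maps
    a tuple of dimension values to the summed measure.
    """
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


# Hypothetical sample data standing in for a large fact table.
rows = [
    {"day": "2015-11-08", "region": "EU", "amount": 10.0},
    {"day": "2015-11-08", "region": "US", "amount": 5.0},
    {"day": "2015-11-09", "region": "EU", "amount": 7.0},
]

cube = build_cube(rows, ("day", "region"), "amount")

# A query is now a dictionary lookup instead of a scan over `rows`.
print(cube[("region",)][("EU",)])
print(cube[()][()])
```

The trade-off this illustrates is that the number of precomputed groupings doubles with each added dimension, and the cube has to be refreshed as new daily data arrives, while query-time aggregation pays the scan cost on every request instead.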

    Best Regards,

    Sun.


    [1] http://www.slideshare.net/planetcassandra/cassandra-summit-2014-interactive-olap-queries-using-apache-cassandra-and-spark

    ------------------------------------------------------------------------
    [email protected] <mailto:[email protected]>
