I'll decline to continue the commentary on Spark, as again this probably
belongs on another list, other than to say that micro-batching is an
intentional design tradeoff that has notable benefits for the same use cases
you're referring to, and that while you may disagree with those tradeoffs,
it's a bit harsh to dismiss as "basic" something that was chosen and provides
some improvements over, say, the Storm model.
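
For anyone following along, here is a rough sketch of what a time-based
sliding window looks like when it is built on top of micro-batches. This is
plain Python, not Spark code; the function name, parameters, and sample data
are made up for illustration. The assumption it encodes is that the window
length and the slide interval are both whole multiples of the batch interval,
so window edges can only fall on batch boundaries.

```python
from collections import deque


def sliding_window_counts(batches, window_batches, slide_batches):
    """Yield (batch_index, event_count) for each full window over micro-batches.

    batches: iterable of lists, one list of events per batch interval.
    window_batches: window length, expressed in batch intervals.
    slide_batches: slide interval, expressed in batch intervals.
    """
    recent = deque(maxlen=window_batches)  # holds only the newest batches
    for i, batch in enumerate(batches, start=1):
        recent.append(batch)
        # Emit once per slide interval, starting when the first window is full.
        if i >= window_batches and (i - window_batches) % slide_batches == 0:
            yield i, sum(len(b) for b in recent)


# Hypothetical input: six batch intervals with a varying number of events.
batches = [["a"], ["b", "c"], [], ["d"], ["e", "f", "g"], ["h"]]
results = list(sliding_window_counts(batches, window_batches=3, slide_batches=2))
print(results)
```

With a 3-batch window sliding every 2 batches, counts come out after batches
3 and 5. The granularity of the window is the batch interval, which is the
tradeoff under discussion: shrinking the batch interval gives finer windows
at the cost of more per-batch overhead.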

On Thu, Dec 18, 2014 at 7:13 AM, Peter Lin <wool...@gmail.com> wrote:
>
>
> One of the most common types of use case in stream processing is sliding
> windows based on time or count. Based on my understanding of the Spark
> architecture and Spark Streaming, it does not provide the same
> functionality. One can fake it by setting Spark Streaming to really small
> micro-batches, but that's not the same.
>
> If the use case fits that model, then using Spark is fine. For other kinds
> of use cases, Spark may not be a good fit. Some people store all events
> before analyzing them, which works for some use cases. For other use cases
> like trading systems, store-before-analysis isn't feasible or practical.
> Other use cases like command and control also don't fit the
> store-before-analysis model.
>
> Try to avoid putting the cart in front of the horse. Picking a tool before
> you have a clear understanding of the problem is a good recipe for disaster.
>
> On Thu, Dec 18, 2014 at 8:04 AM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>
>> Since Ajay is already using Spark, the Spark Cassandra Connector really
>> gets them where they want to be pretty easily (joins, etc.):
>> https://github.com/datastax/spark-cassandra-connector
>>
>> As far as Spark Streaming having "basic support", I'd challenge that
>> assertion (namely, Storm has a number of problems with delivery guarantees
>> that Spark basically solves). However, this isn't a Spark mailing list, and
>> perhaps this conversation is better had there.
>>
>> If the question is "Is Cassandra used in real time analytics cases with
>> Spark?" the answer is absolutely yes (and with Storm, for that matter). If
>> the question is "Can you do your analytics queries on Cassandra while you
>> have Spark sitting there doing nothing?" then of course the answer is no,
>> but that'd be a bizarre question, since they already have Spark in use.
>>
>> On Thu, Dec 18, 2014 at 6:52 AM, Peter Lin <wool...@gmail.com> wrote:
>>>
>>> That depends on what you mean by real-time analytics.
>>>
>>> For things like continuous data streams, neither is an appropriate
>>> platform for doing analytics. They're good for storing the results (aka
>>> output) of the streaming analytics. I would suggest that before you decide
>>> Cassandra vs HBase, you first figure out exactly what kind of analytics
>>> you need to do. Start with prototyping and look at what kind of queries
>>> and patterns you need to support.
>>>
>>> Neither HBase nor Cassandra is good for complex patterns that do joins
>>> or cross joins (aka MDX), so using either one you have to re-invent stuff.
>>>
>>> Most of the event processing and stream processing products out there
>>> also don't support joins or cross joins very well, so any solution is
>>> going to need several different components. Typically, stream processing
>>> does filtering, which feeds another system that does simple joins. The
>>> output of the second step can then go to another system that does
>>> MDX-style queries.
>>>
>>> Spark Streaming has basic support, but it's not as mature and
>>> feature-rich as other stream processing products.
>>>
>>> On Wed, Dec 17, 2014 at 11:20 PM, Ajay <ajay.ga...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Can Cassandra be used for, or is it a good fit for, real-time analytics?
>>>> I went through a couple of benchmarks of Cassandra vs HBase (most of
>>>> them done 3 years ago), and they mentioned that Cassandra is designed
>>>> for intensive writes and has higher read latency than HBase. In our
>>>> case, we will have both writes and reads, with more reads (say 40%
>>>> writes and 60% reads). We are planning to use Spark as the in-memory
>>>> computation engine.
>>>>
>>>> Thanks
>>>> Ajay
>>>>
>>>
>>
>> --
>>
>> [image: datastax_logo.png] <http://www.datastax.com/>
>>
>> Ryan Svihla
>>
>> Solution Architect
>>
>> [image: twitter.png] <https://twitter.com/foundev> [image: linkedin.png]
>> <http://www.linkedin.com/pub/ryan-svihla/12/621/727/>
>>
>> DataStax is the fastest, most scalable distributed database technology,
>> delivering Apache Cassandra to the world’s most innovative enterprises.
>> Datastax is built to be agile, always-on, and predictably scalable to any
>> size. With more than 500 customers in 45 countries, DataStax is the
>> database technology and transactional backbone of choice for the worlds
>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>
>>

-- 

Ryan Svihla

Solution Architect

