Re: Apache Spark and Graphx for Real Time Analytics

Nick Pentreath Tue, 08 Apr 2014 13:24:06 -0700

GraphX, like Spark, will not typically be "real-time" (where by "real-time"
here I assume you mean on the order of tens to hundreds of milliseconds, up
to a few seconds).

Spark can in some cases approach the upper boundary of this definition (a
second or two, possibly less) when data is cached in memory and the
computation is not "too heavy", while Spark Streaming may be able to get
closer to the middle-to-upper part of this range under similar conditions,
especially when aggregating over relatively small windows.

However, for this use case (while I haven't used GraphX yet) I would say
something like Titan (https://github.com/thinkaurelius/titan/wiki) or a
similar OLTP graph DB may be what you're after. But this depends on what
kind of graph traversal you need.
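
To make "what kind of traversal" a bit more concrete, here is a minimal
sketch in plain Python (not Titan or GraphX code; the function name and the
toy graph are made up for illustration) of a local, bounded-depth traversal
from a single start vertex. The assumption is that this is the OLTP-style
pattern: each query touches only a small neighborhood rather than the whole
graph.

```python
from collections import deque


def neighbors_within(adjacency, start, max_hops):
    """Breadth-first search from `start`, limited to `max_hops` edges.

    Returns a dict mapping each reached vertex to its hop distance.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if seen[vertex] == max_hops:
            continue  # hop limit reached, do not expand this vertex
        for nxt in adjacency.get(vertex, ()):
            if nxt not in seen:
                seen[nxt] = seen[vertex] + 1
                queue.append(nxt)
    return seen


# Hypothetical toy graph stored as an adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}

result = neighbors_within(graph, "a", 2)
print(result)
```

If instead every query has to visit most of the graph (PageRank-style
computation, for example), that is closer to the batch workload GraphX is
aimed at, and the latency caveats above apply.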

On Tue, Apr 8, 2014 at 10:02 PM, love2dishtech <love2disht...@gmail.com> wrote:

> Hi,
>
> Is GraphX on top of Apache Spark able to process large-scale distributed
> graph traversal and computation in real time? What is the query execution
> engine that distributes the query on top of GraphX and Apache Spark?
> My typical use case is a large-scale distributed graph traversal in real
> time, with billions of nodes.
>
> Thanks,
> Love.
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Apache-Spark-and-Graphx-for-Real-Time-Analytics-tp6261.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
