I definitely sympathize with the desire to have a graph database as part of
the architecture, but I concur with Ali; scalable graph databases don't have
the best reputation.  I have resisted pushing for one so far because of
concerns about implementation stability.  I think we should tread very
carefully and really consider whether we need a full graph database and
whether the use cases justify introducing something with unknown stability
and performance.
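
As a rough way to frame that question, here is a minimal sketch (plain
Python, no graph database) of the "who is talking to whom / top talkers" use
case raised further down the thread. Only the ip_src_addr and ip_dst_addr
field names come from the thread; the timestamp and bytes fields, the sample
events, and the function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample events. Only ip_src_addr / ip_dst_addr are field names
# taken from the thread; "timestamp" and "bytes" are assumed for illustration.
EVENTS = [
    {"ip_src_addr": "10.0.0.1", "ip_dst_addr": "10.0.0.2", "timestamp": 100, "bytes": 500},
    {"ip_src_addr": "10.0.0.1", "ip_dst_addr": "10.0.0.2", "timestamp": 150, "bytes": 300},
    {"ip_src_addr": "10.0.0.1", "ip_dst_addr": "10.0.0.3", "timestamp": 200, "bytes": 100},
    {"ip_src_addr": "10.0.0.2", "ip_dst_addr": "10.0.0.3", "timestamp": 250, "bytes": 2000},
    {"ip_src_addr": "10.0.0.3", "ip_dst_addr": "10.0.0.1", "timestamp": 400, "bytes": 50},
]


def build_edges(events, start=None, end=None, min_bytes=0):
    """Aggregate events into directed (src, dst) edges.

    Events outside the half-open window [start, end) or below min_bytes
    are skipped. Each edge carries a connection count and a byte total.
    """
    edges = defaultdict(lambda: {"count": 0, "bytes": 0})
    for event in events:
        ts = event["timestamp"]
        if start is not None and ts < start:
            continue
        if end is not None and ts >= end:
            continue
        if event["bytes"] < min_bytes:
            continue
        key = (event["ip_src_addr"], event["ip_dst_addr"])
        edges[key]["count"] += 1
        edges[key]["bytes"] += event["bytes"]
    return dict(edges)


def top_talkers(edges, n=3):
    """Rank source addresses by total bytes sent across all their edges."""
    totals = Counter()
    for (src, _dst), stats in edges.items():
        totals[src] += stats["bytes"]
    return totals.most_common(n)
```

This only covers one-hop aggregation on a single machine. It says nothing
about multi-hop traversals or billions of messages per day, which is where
a real graph store (and its stability) would have to be evaluated.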


On Wed, May 24, 2017 at 11:05 PM, Ali Nazemian <alinazem...@gmail.com>
wrote:

> Agreed on having a separate discussion/proposal. Having a graph database
> from the design perspective is one thing and having a stable and
> high-performance implementation of it is another thing. I have used
> different graph databases for multiple projects so far. They look very good
> on paper, but we should be careful about the implementation.
>
> The good point about using Titan for this purpose is that it comes with a
> native TinkerPop implementation, which would be helpful for OLAP using
> Spark directly, out of the box. However, there were lots of issues
> regarding the stability of Titan (we worked on making it stable for
> 8 months!). I am not sure whether they have been fixed as part of
> JanusGraph. I know Atlas team members are involved in JanusGraph
> development. The fact that they are using HBase as a backend would also be
> helpful, so we may want to share the conversation with them and draw on
> their experience.
>
> Anyway, I was wondering whether anybody has done anything regarding this,
> so that I can align with that work and avoid any rework.
>
> Cheers,
> Ali
>
> On Thu, May 25, 2017 at 4:21 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > We should have a discussion or a proposal on what should go in the graph
> > vs. what should go in other stores.
> >
> >
> > On May 24, 2017 at 14:09:59, zeo...@gmail.com (zeo...@gmail.com) wrote:
> >
> > I would be very interested in a graph db that could leverage the
> > ip_src_addr and ip_dst_addr fields in a broad sense (who is talking to
> > whom, visualize top talkers, etc.). To be really useful it would need
> > the ability to apply filters (IPs, ports, connection durations, bytes
> > transferred, etc.) and to narrow down to certain time-based windows. I
> > probably have an environment where I could test this at semi-scale (a
> > couple billion messages per day) and flesh out some of the performance
> > concerns if this turns into something. I could do that even if it was
> > very early in development, as I frequently rebuild that environment from
> > scratch for testing things.
> >
> > Jon
> >
> > On Wed, May 24, 2017 at 12:46 PM Nick Allen <n...@nickallen.org> wrote:
> >
> > > I think the addition of a graph capability would be very powerful. I
> > > know many who would love the idea, but I know of no implementations
> > > that have occurred.
> > >
> > > It might be good to discuss in the community specific use cases that
> > > would be enabled by a graph database. That might help to flesh out the
> > > technical aspects of it.
> > >
> > >
> > >
> > >
> > >
> > > On Wed, May 24, 2017 at 10:08 AM, Ali Nazemian <alinazem...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > We are going to design and develop an asset database for Metron. For
> > > > this purpose, I have been thinking of a graph schema model that maps
> > > > assets as Nodes and relations as Edges. This can be extended to the
> > > > event level, to capture relations between events and assets as well
> > > > as event-to-event relations. Regarding technology, I was thinking of
> > > > using Titan Graph Database (probably JanusGraph) with HBase and
> > > > Elasticsearch/Solr as backends. However, there might be a performance
> > > > issue with this decision if we want to use lots of Composite Indices.
> > > > The problem we would face is that Titan creates a separate column
> > > > family for each Composite Index, which HBase does not handle well.
> > > > Basically, it would be better to use Cassandra for this purpose.
> > > >
> > > > I would like to understand what work has already been done regarding
> > > > this problem and what the roadmap will be, so I can make sure we
> > > > follow the same strategy.
> > > >
> > > > Regards,
> > > > Ali
> > > >
> > >
> > --
> >
> > Jon
> >
>
>
>
> --
> A.Nazemian
>
