I definitely sympathize with the desire to have a graph database as part of the architecture, but I concur with Ali: scalable graph databases don't have the best reputation. I have resisted pushing for one so far because of concerns about the stability of the implementation. I think we should tread very carefully and consider whether we really need a full graph database, and whether the use cases justify introducing something with largely unknown stability and performance.
On Wed, May 24, 2017 at 11:05 PM, Ali Nazemian <alinazem...@gmail.com> wrote:

> Agreed on having a separate discussion/proposal. Having a graph database
> from the design perspective is one thing and having a stable and
> high-performance implementation of it is another thing. I have used
> different graph databases for multiple projects so far. It is very good on
> paper, but we should be careful about the implementation.
>
> The good point about using Titan for this purpose is it comes with a native
> ThinkerPop implementation that will be helpful in OLAP using Spark directly
> that we can use them out of the box. However, there were lots of issues
> regarding the stability of Titan (we were working on making that stable for
> 8 months!). I am not sure they have been fixed or not as a part of
> JanusGraph. I know Atlas team members are involved in JanusGraph
> development. The fact that they are using HBase as a backend would also be
> helpful, so we may need to share the conversation with them and use some of
> their experiences.
>
> Anyway, I was wondering anybody has done anything regarding this or not so
> I need to be aligned with that work and avoid any re-work.
>
> Cheers,
> Ali
>
> On Thu, May 25, 2017 at 4:21 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > We should have a discussion or a proposal on what should go in the graph
> > vs. what should go in other stores.
> >
> > On May 24, 2017 at 14:09:59, zeo...@gmail.com (zeo...@gmail.com) wrote:
> >
> > I would be very interested in a graph db that could leverage the
> > ip_src_addr and ip_dst_addr fields in a broad sense (who is talking to
> > who, visualize top talkers, etc.). In order to be very useful it would
> > need to have the ability to apply filters (IPs, ports, connection
> > durations, bytes transferred, etc.) and to narrow down certain
> > time-based windows. I probably have an environment where I could test
> > this at semi-scale (a couple billion messages per day) and flesh out
> > some of the performance concerns if this turns into something. Even if
> > it was very early in development, as I frequently rebuild that
> > environment from scratch for testing things.
> >
> > Jon
> >
> > On Wed, May 24, 2017 at 12:46 PM Nick Allen <n...@nickallen.org> wrote:
> >
> > > I think the addition of a graph capability would be very powerful. I
> > > know many who would love the idea, but I know of no implementations
> > > that have occurred.
> > >
> > > It might be good to discuss in the community specific use cases that
> > > would be enabled by a graph database. That might help to flesh out the
> > > technical aspects of it.
> > >
> > > On Wed, May 24, 2017 at 10:08 AM, Ali Nazemian <alinazem...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > We are going to design and develop an asset database for Metron. For
> > > > this purpose, I have been thinking of a graph schema model to map
> > > > assets as Nodes and provide relations as Edges. This can be extended
> > > > to event level to have a particular relation to assets as well as an
> > > > event to event relation. Regarding technology, I was thinking of
> > > > using Titan Graph Database (probably JanusGraph) and using HBase and
> > > > Elasticsearch/Solr as backends. However, there might be a
> > > > performance issue regarding this decision if we want to use lots of
> > > > Composite Indices. The problem we will be facing would be the fact
> > > > that Titan creates separate column family for each Composite Index
> > > > which HBase is not very good for it. Basically, it would be better
> > > > to use Cassandra for this purpose.
> > > >
> > > > I would like to understand what work have been done already
> > > > regarding this problem and what the roadmap will be, so I can make
> > > > sure we will follow the same strategy.
> > > >
> > > > Regards,
> > > > Ali
> > >
> > --
> >
> > Jon
>
> --
> A.Nazemian