Agreed on having a separate discussion/proposal. Having a graph database from the design perspective is one thing; having a stable, high-performance implementation of it is another. I have used different graph databases on multiple projects so far. The approach looks very good on paper, but we should be careful about the implementation.
The good point about using Titan for this purpose is that it comes with a native TinkerPop implementation, which is helpful for OLAP because we can use Spark directly, out of the box. However, there were lots of issues regarding the stability of Titan (we were working on making it stable for 8 months!). I am not sure whether they have been fixed as part of JanusGraph. I know Atlas team members are involved in JanusGraph development. The fact that they are using HBase as a backend would also be helpful, so we may need to share the conversation with them and draw on some of their experience.

Anyway, I was wondering whether anybody has already done anything on this, so that I can align with that work and avoid any rework.

Cheers,
Ali

On Thu, May 25, 2017 at 4:21 AM, Otto Fowler <[email protected]> wrote:

> We should have a discussion or a proposal on what should go in the graph
> vs. what should go in other stores.
>
>
> On May 24, 2017 at 14:09:59, [email protected] ([email protected]) wrote:
>
> I would be very interested in a graph db that could leverage the
> ip_src_addr and ip_dst_addr fields in a broad sense (who is talking to who,
> visualize top talkers, etc.). In order to be very useful it would need to
> have the ability to apply filters (IPs, ports, connection durations, bytes
> transferred, etc.) and to narrow down certain time-based windows. I
> probably have an environment where I could test this at semi-scale (a
> couple billion messages per day) and flesh out some of the performance
> concerns if this turns into something. Even if it was very early in
> development, as I frequently rebuild that environment from scratch for
> testing things.
>
> Jon
>
> On Wed, May 24, 2017 at 12:46 PM Nick Allen <[email protected]> wrote:
>
> > I think the addition of a graph capability would be very powerful. I know
> > many who would love the idea, but I know of no implementations that have
> > occurred.
> >
> > It might be good to discuss in the community specific use cases that would
> > be enabled by a graph database. That might help to flesh out the technical
> > aspects of it.
> >
> > On Wed, May 24, 2017 at 10:08 AM, Ali Nazemian <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > We are going to design and develop an asset database for Metron. For this
> > > purpose, I have been thinking of a graph schema model to map assets as
> > > Nodes and provide relations as Edges. This can be extended to event level
> > > to have a particular relation to assets as well as an event to event
> > > relation. Regarding technology, I was thinking of using Titan Graph
> > > Database (probably JanusGraph) and using HBase and Elasticsearch/Solr as
> > > backends. However, there might be a performance issue regarding this
> > > decision if we want to use lots of Composite Indices. The problem we will
> > > be facing would be the fact that Titan creates separate column family for
> > > each Composite Index which HBase is not very good for it. Basically, it
> > > would be better to use Cassandra for this purpose.
> > >
> > > I would like to understand what work have been done already regarding this
> > > problem and what the roadmap will be, so I can make sure we will follow the
> > > same strategy.
> > >
> > > Regards,
> > > Ali
> > >
> >
> > --
> > Jon
>

--
A.Nazemian
