A node with millions of relationships is going to slow down any traversal that comes across it. In 2.1 it will only slow down traversals of that particular relationship type and direction, which may or may not improve one's situation depending on the queries you want to run.
"Users Who Viewed This Also Viewed That" is a recommendation query, whereas "Total Views in Time Period" is an OLAP type query. I would use event sourcing and event processing to compute both, and store the result - along with the sequence id of the most recent event the results are based upon - in whatever store is most convenient for querying. The recommendations could very well be stored in a graph database, where products could have relationships to other recommended products, for instance. OLAP type data could be stored in a relational database (this can be made to work pretty well if you know how to optimise) or a special purpose OLAP database - something that can do range scans over the time component. Of course you could also store this as summary nodes, or something like that, in a graph database. It depends on what your needs are. And should you change your mind, then the event sourcing approach will allow you to populate any database you might need in the future. Event sourcing (along with that saved event sequence id I mentioned earlier) also solves your worry about keeping summary data updated - just apply all events that are newer than what the summary data is based upon, at regular intervals. -- Chris Vest System Engineer, Neo Technology [ skype: mr.chrisvest, twitter: chvest ] On 11 Mar 2014, at 21:16, Evan Grantham-Brown <[email protected]> wrote: > I'm working on an application with a Neo4j back end (current version > 2.0.0-RC1, but we will likely update it periodically to stay current), and am > debating how to handle usage logging. > > The situation: We want to keep track of who viewed what nodes, and how many > times, in order to do things like "Most Viewed" and "Users Who Viewed This > Also Viewed That" and so forth. The naive way to do this is to have a node > for each user and create a HAS_VIEWED relationship each time the user views > another node, with all the data you'd usually log; date, time, et cetera. 
> However, I'm concerned about the possible performance hit. A popular
> node--one that gets linked from our home page, say--could well end up with
> millions of HAS_VIEWED relationships, albeit most of them would come from the
> "Anonymous" user. How is that likely to affect performance on queries? If I
> want to do something like calculate the total number of views on a node
> within the last 2 weeks, is that going to cause problems?
>
> The other option I'm considering is to use the graph database to store a
> single HAS_VIEWED relationship between a user and a node, with a few bits of
> summary data (last date viewed and total number of views, say) and use a
> relational database to keep track of the individual visits. This has the
> advantage of maintaining the essential relationship information in the graph,
> while using the relational database for what it's good at: Managing large
> collections of identically formatted records that need to be aggregated,
> sliced, and diced. However, this has some drawbacks, most particularly that
> we will have to find ways to update the summary data as it changes over time.
>
> Any thoughts? What approach would you take?
>
> Thanks!
>
> Evan
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
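
For what it's worth, here is a rough sketch of the event sourcing idea from the reply, in plain Python with in-memory structures standing in for the real stores. Everything named here (ViewEvent, ViewCounts, AlsoViewed, catch_up) is made up for illustration and is not part of Neo4j or any library. It assumes view events are appended to a log in sequence order, and that each summary remembers the sequence id of the last event it has applied.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class ViewEvent:
    """One entry in the append-only log (hypothetical shape)."""
    seq: int    # sequence id, strictly increasing along the log
    user: str
    node: str
    day: date


@dataclass
class ViewCounts:
    """OLAP-style summary: number of views per (node, day)."""
    last_seq: int = 0
    per_day: dict = field(default_factory=lambda: defaultdict(int))

    def apply(self, event):
        self.per_day[(event.node, event.day)] += 1

    def total(self, node, start, end):
        # Range scan over the time component, inclusive on both ends.
        return sum(count for (n, d), count in self.per_day.items()
                   if n == node and start <= d <= end)


@dataclass
class AlsoViewed:
    """Recommendation summary: how often two nodes share a viewer."""
    last_seq: int = 0
    seen: dict = field(default_factory=lambda: defaultdict(set))
    pairs: dict = field(default_factory=lambda: defaultdict(int))

    def apply(self, event):
        viewed = self.seen[event.user]
        if event.node in viewed:
            return  # repeat views add nothing to co-view counts
        for other in viewed:
            self.pairs[(event.node, other)] += 1
            self.pairs[(other, event.node)] += 1
        viewed.add(event.node)

    def recommend(self, node, limit=3):
        scored = [(count, b) for (a, b), count in self.pairs.items() if a == node]
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [b for _, b in scored[:limit]]


def catch_up(summary, log):
    """Apply only the events newer than what the summary is based upon."""
    applied = 0
    for event in log:
        if event.seq > summary.last_seq:
            summary.apply(event)
            summary.last_seq = event.seq
            applied += 1
    return applied
```

Running catch_up at regular intervals keeps each summary current without reprocessing old events, and a brand new summary (or a different database) can be populated by replaying the log from the start. In a real system last_seq would have to be saved in the same store as the summary it describes, ideally in the same transaction.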
