I'm working on an application with a Neo4j back end (current version 
2.0.0-RC1, but we will likely update it periodically to stay current), and 
am debating how to handle usage logging.

The situation: We want to keep track of who viewed what nodes, and how many 
times, in order to do things like "Most Viewed" and "Users Who Viewed This 
Also Viewed That" and so forth. The naive way to do this is to have a node 
for each user and create a HAS_VIEWED relationship each time the user views 
another node, with all the data you'd usually log: date, time, et cetera. 
However, I'm concerned about the possible performance hit. A popular 
node (one that gets linked from our home page, say) could well end up with 
millions of HAS_VIEWED relationships, although most of them would come from 
the "Anonymous" user. How is that likely to affect performance on queries? 
If I want to do something like calculate the total number of views on a 
node within the last 2 weeks, is that going to cause problems?
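
To make the concern concrete, here is a rough Python sketch of the naive model. It is not Neo4j code: the dictionary is only an in-memory stand-in for the graph, and the names (has_viewed, record_view, views_since) and the sample node id are made up for illustration. The point is that, without some index on the timestamp, the two-week count has to look at every HAS_VIEWED relationship on the node.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical in-memory stand-in for the graph (an assumption, not Neo4j):
# node id -> list of HAS_VIEWED relationships with the logged data.
has_viewed = defaultdict(list)

def record_view(user_id, node_id, when):
    """Naive model: one HAS_VIEWED relationship per individual view."""
    has_viewed[node_id].append({"user": user_id, "at": when})

def views_since(node_id, now, window=timedelta(days=14)):
    """Count recent views by examining every relationship on the node."""
    cutoff = now - window
    return sum(1 for rel in has_viewed[node_id] if rel["at"] >= cutoff)

# Made-up sample data: one old view and two recent ones.
now = datetime(2013, 12, 1, 12, 0)
record_view("anonymous", "home-feature", now - timedelta(days=30))
record_view("anonymous", "home-feature", now - timedelta(days=3))
record_view("evan", "home-feature", now - timedelta(days=1))

recent = views_since("home-feature", now)
print(recent)
```

With millions of relationships on one node, that scan is the cost I'm worried about.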

The other option I'm considering is to use the graph database to store a 
single HAS_VIEWED relationship between a user and a node, with a few bits 
of summary data (last date viewed and total number of views, say) and use a 
relational database to keep track of the individual visits. This has the 
advantage of maintaining the essential relationship information in the 
graph, while using the relational database for what it's good at: managing 
large collections of identically formatted records that need to be 
aggregated, sliced, and diced. However, this has some drawbacks, most 
notably that we will have to find a way to keep the summary data in the 
graph up to date as new visits are recorded.
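
A minimal sketch of how this second option might fit together, in Python. The assumptions are mine: SQLite stands in for whatever relational database we end up with, a plain dictionary stands in for the single HAS_VIEWED relationship per user and node in the graph, and the table, column, and function names are invented for illustration.

```python
import sqlite3

# Relational side (assumed schema): one row per individual visit.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE visits (user_id TEXT, node_id TEXT, viewed_at TEXT)")

# Graph side (stand-in): (user_id, node_id) -> properties of the single
# HAS_VIEWED relationship holding the summary data.
summary = {}

def record_view(user_id, node_id, viewed_at):
    """Log the visit, then update the summary relationship to match."""
    db.execute(
        "INSERT INTO visits (user_id, node_id, viewed_at) VALUES (?, ?, ?)",
        (user_id, node_id, viewed_at),
    )
    db.commit()
    rel = summary.setdefault(
        (user_id, node_id), {"total_views": 0, "last_viewed": viewed_at}
    )
    rel["total_views"] += 1
    # ISO-8601 timestamps in the same format sort correctly as strings.
    rel["last_viewed"] = max(rel["last_viewed"], viewed_at)

def views_between(node_id, start, end):
    """Count visits to a node with start <= viewed_at < end."""
    row = db.execute(
        "SELECT COUNT(*) FROM visits "
        "WHERE node_id = ? AND viewed_at >= ? AND viewed_at < ?",
        (node_id, start, end),
    ).fetchone()
    return row[0]

# Made-up sample data.
record_view("anonymous", "n1", "2013-11-01T10:00:00")
record_view("anonymous", "n1", "2013-11-28T09:30:00")
record_view("evan", "n1", "2013-11-29T15:00:00")

last_two_weeks = views_between("n1", "2013-11-17T00:00:00", "2013-12-01T00:00:00")
print(last_two_weeks, summary[("anonymous", "n1")])
```

Here the two writes happen in the same function, which is the simplest way to keep them in step, but nothing makes them atomic across the two stores. That gap is the drawback mentioned above.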

Any thoughts? What approach would you take?

Thanks!

Evan
