Hi all,
I have a website where users can refer other users, who in turn can refer
other users. This can be a fairly large tree/graph structure (as deep as
1000 or more). We keep track of the revenue each user has earned us
(from ads). I'm trying to calculate a user's worth by finding all the
revenue they brought us via referrals, referrals of referrals, etc., since
without that user the rest of the users in their graph potentially would
not have existed.
I've done this with Nested Sets in MySQL, but it was proving challenging.
The issue I had was that it was (relatively) faster to create the tree from
scratch each time, around 2 hours, than to add to an existing tree. Adding
new users from, say, yesterday takes much longer (more than the 2 hours
needed to build it from scratch again). The reason is that each insertion
does a rgt=rgt+2 where rgt>referrer's rgt (same with lft), which means
changing 1000+ records already in the database. It was painfully slow.
When building it from scratch, we simply iterate over each user's friends,
friends of friends, etc., so each insert only changes records at the top
end of the lft/rgt range, which is much faster than a new insert at an
earlier point in the data.
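
To illustrate the cost difference described above, here is a minimal Python sketch of a nested-set insert. It is not my actual MySQL code: the in-memory dict and the names insert_child and build_flat_tree are made up for illustration, and it assumes each new user is appended as the referrer's last child (so it uses rgt >= the referrer's rgt, which also moves the referrer's own rgt).

```python
# Hypothetical in-memory stand-in for the MySQL table: user_id -> [lft, rgt].
def insert_child(tree, parent_id, child_id):
    """Append child_id as the last child of parent_id; return rows changed."""
    parent_rgt = tree[parent_id][1]
    touched = 0
    for bounds in tree.values():
        changed = False
        if bounds[0] > parent_rgt:    # lft = lft + 2 WHERE lft > parent_rgt
            bounds[0] += 2
            changed = True
        if bounds[1] >= parent_rgt:   # rgt = rgt + 2 WHERE rgt >= parent_rgt
            bounds[1] += 2
            changed = True
        if changed:
            touched += 1
    # The new leaf takes the two slots that were just opened up.
    tree[child_id] = [parent_rgt, parent_rgt + 1]
    return touched


def build_flat_tree(n_children):
    """Root user 1 with n_children direct referrals, added in id order."""
    tree = {1: [1, 2]}
    for child_id in range(2, n_children + 2):
        insert_child(tree, 1, child_id)
    return tree


tree = build_flat_tree(1000)
cheap = insert_child(tree, 1001, 5000)   # under the most recently added user
costly = insert_child(tree, 2, 5001)     # under the earliest added user
print(cheap, costly)
```

In this toy tree the insert under the newest user changes only a couple of rows, while the insert under the earliest user changes nearly every row, which is the same pattern I see with the incremental updates.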
Anyway, it works, but the run time keeps going up: 2 hours has now become
3 hours, and it's a lot of data.
I've been playing around with switching this to Neo4j, but I'm having a lot
of trouble and need some guidance. I've assigned userId 1 as my system
user, so if we wanted to add up the entire system's revenue we could start
the traversal from 1.
I'm using batch-import. Here's an example of my nodes.csv (note that I've
changed some of it; these are not actual values):
userId:int:userIds referUserId:int:referUserIds eventDate:string revenue:float lastTransactionTime:string
1 2014-01-25 0.00
115 1 2014-01-25 8.31
122 122 2014-01-25 2.45
123 1 2014-01-25 1.25
132 115 2014-01-25 7.53
133 115 2014-01-25 3.39
134 133 2014-01-25 10.69
135 134 2014-01-25 1.00
136 134 2014-01-25 0.69
137 134 2014-01-25 0.39
138 137 2014-01-25 1.29
139 137 2014-01-25 1.19
140 137 2014-01-25 1.09
Here's an example of my rels.csv:
userId:int:userIds userId:int:userIds type
1 115 referred
115 122 referred
1 123 referred
122 132 referred
122 133 referred
133 134 referred
134 135 referred
134 136 referred
134 137 referred
137 138 referred
137 139 referred
137 140 referred
I have two questions:
1) Which way should the relationship go? From 115 to 1 (115 was
referred_by 1), from 1 to 115 (1 referred 115), or should I have both
relationships?
2) Should I start from userId 1 as the system user? Is that what's causing
it to loop?
Here's a query I've tried without success:
START user=node:userIds(userId='115')
MATCH (user)-[:referred*]-(friend)
WITH sum(friend.revenue) as revenues
RETURN revenues
It returns 30.96. However, if you add up the referrals and referrals of
referrals of 115, it should come to 29.71. It's as if it's adding all of
them, even 123, which was referred by 1; that seems to account for the 1.25
difference. I've tried :referred*..2, but that only returns 2 levels of
data (14.62).
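
To sanity-check those three numbers outside Neo4j, here is a small plain-Python sketch over the sample data above. The function name reachable_revenue and its parameters are made up for illustration, and it only models "sum the revenue of every distinct user reachable from the start user"; it is not a claim about how Cypher evaluates the pattern.

```python
from collections import defaultdict

# Revenue per userId, copied from the nodes.csv sample.
revenue = {
    1: 0.00, 115: 8.31, 122: 2.45, 123: 1.25, 132: 7.53, 133: 3.39,
    134: 10.69, 135: 1.00, 136: 0.69, 137: 0.39, 138: 1.29, 139: 1.19,
    140: 1.09,
}

# (referrer, referred user) pairs, copied from the rels.csv sample.
referred = [
    (1, 115), (115, 122), (1, 123), (122, 132), (122, 133), (133, 134),
    (134, 135), (134, 136), (134, 137), (137, 138), (137, 139), (137, 140),
]


def reachable_revenue(start, directed, max_depth=None):
    """Sum revenue of users reachable from start (start itself excluded)."""
    neighbours = defaultdict(set)
    for referrer, user in referred:
        neighbours[referrer].add(user)
        if not directed:
            # Also walk "backwards" from a user to whoever referred them.
            neighbours[user].add(referrer)
    seen = {start}
    frontier = [start]
    depth = 0
    total = 0.0
    # Breadth-first walk, one referral level per iteration.
    while frontier and (max_depth is None or depth < max_depth):
        next_frontier = []
        for node in frontier:
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    total += revenue[other]
                    next_frontier.append(other)
        frontier = next_frontier
        depth += 1
    return round(total, 2)


both_ways = reachable_revenue(115, directed=False)
two_levels = reachable_revenue(115, directed=False, max_depth=2)
downstream = reachable_revenue(115, directed=True)
print(both_ways, two_levels, downstream)
```

In this sketch, walking the referred pairs in both directions reproduces the 30.96 and 14.62 I get from Neo4j, while following them only from referrer to referred user gives the 29.71 I expect, so my query may be traversing the relationships in both directions.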
Anyway I need some help here.
Thanks,
Steve