> The moon is pretty ominous looking in Hamburg tonight, which means
> that it's time for "contrary Ethan" to come out of hiding.
Hi there, "contrary Ethan" ;-)

> Really all I want to do is get anyone who might implement this to
> think twice about using tracking to do this. In fact, we probably need
> to take a really hard look at the tracking approach in general. Unless
> I'm mistaken, every message created is distributed to every user actor
> in the system to check if it matches a tracker for that user.

This is correct. So let's take a really hard look at the current implementation and the alternative.

> Can anyone else verify that? If this is how it's really set up, then
> it's going to become a major problem in systems with large numbers of
> users and I'll need to open a Jira item for it and we'll figure out
> what to do. I'm not sure how to do it better right now, but if that's
> how it works then it's an issue.

If it's an issue, then it seems the only way to solve it would be to disable tracking completely, just like Twitter did back in the day when tracking was available via IM.

Let me also note that tracking is far more powerful than just searching for a hashtag. There is a limited number of hashtags you can find in a message, but in practice you can construct an unlimited number of matchers on a message, because they use the same filters as actions. This makes it harder to find optimization shortcuts for determining which of the existing tracking matchers match a given message.

> With regards to hashtag following, I think the right way to do this is
> to set up another actor like a User actor or (to be created)
> Conversation actor and when someone follows a hashtag the actor will
> put the message on that user's timeline.
>
> Note that these conversation and hashtag actors only need to be
> started up when someone follows a conversation or hashtag, and there
> would be only one actor object per conversation/hashtag as opposed to
> one per user following (as in the case of tracking).
> This approach is also pretty efficient because we can look at a
> message and know exactly which conversation or hashtag actor we need
> to forward it to without querying 1000s of actors to see if they are
> interested in it.

Hm, I'm not sure how this would be more efficient. Instead of having an actor per user that checks that user's track matchers, you'll have an actor per track matcher that checks which users it applies to.

The subtle difference is whether you iterate over unique users or over unique track matchers. With the UserActor-first approach you might have duplicate matchers; in other words, you might evaluate the same tracking matcher again for another user. However, as soon as one of a user's matchers matches, you stop matching and append the message to that user's timeline.

In contrast, with the TrackingActor-first approach you might have duplicate users: one message might match the same user several times through different tracking matchers. To ensure uniqueness, one option is to scan the mailbox every time to check whether the message is already there. Another option is to build a set of users to send the message to, which guarantees that each user is included only once; this is roughly equivalent in complexity to sorting the list of recipients.

Another point to keep in mind: in a typical scenario, do you expect more users or more hashtags?

Finally, hashtag/conversation actors will not eliminate the need for user actors, so they will simply add to the total actor count.

There is no getting around the fact that if we want tracking, each and every message has to be checked against each and every user's track criteria. Inverting the order doesn't reduce the overall complexity much. Changing the current implementation might help if there are an order of magnitude more users than track searches.
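To make the comparison a bit more concrete, here is a minimal Python sketch of the two orderings. It is not the actual implementation: actors and mailboxes are left out entirely, and the hypothetical `Contains` class stands in for real track matchers, which can be arbitrary filters. It assumes matchers are hashable, so that identical ones can be shared between users; `user_first`, `index_by_matcher` and `track_first` are illustrative names only. The sketch counts how many matcher evaluations each ordering needs per message.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Contains:
    """Hypothetical track matcher: matches messages containing a given word.

    Real track matchers can be arbitrary filters; this stand-in is frozen
    (hashable) so identical matchers can be grouped in index_by_matcher.
    """
    word: str

    def matches(self, text):
        return self.word in text.lower().split()


def user_first(text, tracks):
    """UserActor-first ordering (what the thread describes as current).

    Every user checks its own matchers and stops at the first match, so a
    user is added at most once, but a matcher shared by several users is
    evaluated once per user. Returns (recipients, matcher_evaluations).
    """
    recipients, evaluations = [], 0
    for user, matchers in tracks.items():
        for matcher in matchers:
            evaluations += 1
            if matcher.matches(text):
                recipients.append(user)
                break  # one match is enough for this user's timeline
    return recipients, evaluations


def index_by_matcher(tracks):
    """Build the inverted index once: unique matcher -> set of users."""
    users_by_matcher = defaultdict(set)
    for user, matchers in tracks.items():
        for matcher in matchers:
            users_by_matcher[matcher].add(user)
    return users_by_matcher


def track_first(text, users_by_matcher):
    """TrackingActor-first ordering (the inverted one).

    Each unique matcher is evaluated once, but a user can be reached through
    several matchers, so recipients go into a set to remove duplicates.
    Returns (recipients, matcher_evaluations).
    """
    recipients, evaluations = set(), 0
    for matcher, users in users_by_matcher.items():
        evaluations += 1
        if matcher.matches(text):
            recipients |= users  # set union: each user appears only once
    return recipients, evaluations


# Hypothetical example data: user -> list of tracked searches.
tracks = {
    "alice": [Contains("scala"), Contains("lift")],
    "bob": [Contains("scala")],
    "carol": [Contains("actors"), Contains("scala"), Contains("lift")],
}
index = index_by_matcher(tracks)

for text in ("Lift actors in Scala", "Hamburg moon tonight"):
    by_user, n_user = user_first(text, tracks)
    by_track, n_track = track_first(text, index)
    assert set(by_user) == by_track  # both orderings reach the same users
    print(f"{text!r}: recipients={sorted(by_track)}, "
          f"evaluations user_first={n_user} track_first={n_track}")
# 'Lift actors in Scala': recipients=['alice', 'bob', 'carol'], evaluations user_first=3 track_first=3
# 'Hamburg moon tonight': recipients=[], evaluations user_first=6 track_first=3
```

In this toy data, the inverted ordering only saves work on the non-matching message, because three users share the same `scala` matcher; on the matching message, short-circuiting makes the user-first ordering just as cheap. Either way, every distinct track criterion is still evaluated against every message, and the inverted version additionally needs the deduplication set.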
However, since a user can have many tracked searches, it seems that, at least in the worst case, the model you're suggesting will create many more actors and send many more messages.

I'm not sure I explained this clearly enough; I can give more specific examples and/or diagrams. Eventually it would be worth simulating both models to see which one scales in practice.

Vassil
