So notifications, dashboards, walls, timelines - these things have been spoken about recently.
I'd love to see LP gain a really fantastic story in these areas, and I think the plumbing is about 1 week's yak shaving + 1 week's actual development if done as a service using django (or a leaner still framework like flask, web.py, webob or even just wsgiref) + postgresql.

It seems to me that there are a few core questions like 'how much history do we support' and 'if someone leaves a team do notifications from that team disappear from their timeline' which can be answered pessimistically ('all history') without making the problem particularly hard to solve.

Ah, but which problem, you may be asking? Well, I humbly suggest that there are two common problems that need solving to implement customisable notifications, dashboards/walls, and timelines:
 - 'who is interested in a given event'
 - 'who was told about a given event'
(Note that the tense is very deliberate here).

Determining who gets sent mail about a bug, merge proposal, branch, package upload etc is all a case of determining who is interested in that event. Determining whose timelines an event should turn up in is determining who /was/ told about it. Determining if an event is relevant to a project overview is also the same problem [structural subscriptions can be considered a multiplier of events relevant to the project overview].

While Cassandra or other NoSQL DBs offer massive scalability for per-user data structures, I don't think we need this to solve this core problem.

Imagine the following API (a strawman, I don't claim it's right :)):
---------
notify(subject_template, body_template, summary_template, event_tags, topics, subject_tags, participants)
    """Notify subscribers about an event.

    Here the three templates are hopefully obvious.
    event_tags is a set that would have things like 'bug', 'project', 'branch'.
    topics is a set of SOA object ids(*).
    subject_tags is a set of tags on the object itself. E.g. a bug's tags would go in subject_tags.
    participants is a list of subscribables which are direct participants in the event (and thus should be notified even if the data in LP doesn't show them as interested).
    """

subscribe(recipient, subscription_tags, event_tags, exclude_event_tags, topics, exclude_topics, subject_tags, exclude_subject_tags)
    """Subscribe an object to an event.

    The foo and exclude_foo sets provide filtering for the subscription - an entry in foo requires that entry; an entry in exclude_foo will reject notification if the entry is matched.
    The subscription_tags set allows for the subscription to be categorised. For instance, the implicit subscription of a bug assignee to the bug would be represented by a subscription(bug object id, ['assignee'], None, None, None, None, None, None).
    """

subscriptions(object id):
    """Return the subscription ids that affect object id."""

unsubscribe(object id, subscription id)
    """Drops a subscription."""

events(subscribers, topics, batch_endpoint):
    """Return the events for subscribers, topics from batch_endpoint.

    :param subscribers: None or a list of subscribers.
    :param topics: None or a list of topics.
    If subscribers or topics are supplied, events are limited to those in common to both.
    """
------------

I think this API would be sufficient to (efficiently):
 - replace structural subscriptions
 - replace bug subscriptions
 - replace package upload notifications
 - provide RSS feeds for per-user notifications
 - provide RSS feeds for object changes
 - provide per user timelines

This API would call back to LP to perform subscription expansion and then structural subscriptions would be one per team in the service. (Or we could maintain an expanded cache in the API, but I don't think that's needed).

Implementation wise, we need to determine what queries are needed. I'm framing this as a /temporal/ service - it depends on knowing the state of team expansions etc after they happen.
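To make the foo / exclude_foo filtering in the strawman subscribe() call a bit more concrete, here is a minimal Python sketch of answering 'who is interested in a given event'. It is only an illustration: the Subscription class, matches() and interested() are hypothetical names, and it assumes 'an entry in foo requires that entry' means every listed entry must be present on the event. Team expansion (the call back to LP) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """Hypothetical record mirroring the strawman subscribe() arguments."""
    recipient: str
    subscription_tags: frozenset = frozenset()
    event_tags: frozenset = frozenset()
    exclude_event_tags: frozenset = frozenset()
    topics: frozenset = frozenset()
    exclude_topics: frozenset = frozenset()
    subject_tags: frozenset = frozenset()
    exclude_subject_tags: frozenset = frozenset()


def _passes(required, excluded, actual):
    # Assumption: every required entry must be present on the event,
    # and any single excluded entry being present rejects it.
    return required <= actual and not (excluded & actual)


def matches(sub, event_tags, topics, subject_tags):
    """True if the event passes all three filter pairs of the subscription."""
    return (
        _passes(sub.event_tags, sub.exclude_event_tags, event_tags)
        and _passes(sub.topics, sub.exclude_topics, topics)
        and _passes(sub.subject_tags, sub.exclude_subject_tags, subject_tags)
    )


def interested(subscriptions, event_tags, topics, subject_tags, participants=()):
    """Return the recipients to notify: direct participants plus matching subscribers."""
    event_tags = frozenset(event_tags)
    topics = frozenset(topics)
    subject_tags = frozenset(subject_tags)
    recipients = set(participants)
    for sub in subscriptions:
        if matches(sub, event_tags, topics, subject_tags):
            recipients.add(sub.recipient)
    return recipients
```

A notify() implementation could call something like interested() and then record the result, which is what turns 'who is interested' into 'who was told'.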
Performance wise, a holy grail would be being able to deliver low-ms responses from memory, and <1s responses from disk. This depends on very high selectivity on queries.

A fact table like the following:
    subscriber, event, date
would trivially provide highly efficient queries to deliver a timeline (given a supporting event table with the summary, ... tag metadata etc).

Similarly
    topic, event, date
will deliver events relevant to a given object in LP (a bug, a project, a project group) very efficiently.

We could either run with two separate fact tables, or one fact table with nullable subscriber|topic.

Walls and dashboards are AIUI a combination of topic timeline + topic TODO queues (e.g. merge proposals to review etc). This can be efficiently served by querying for one pageful of each such thing, asking for the timeline similarly and then combining in the appserver.

Sending of notifications becomes just an API call and super performance isn't required.

Querying who is subscribed to an object needs to be fast however. A fact table:
    topic, event_tag, exclude_event_tag, subscriber
can give subscription lists for topics very easily, still relying on in-LP expansion of teams. (A topic being e.g. 'bugs for project foo' - a structural subscription).

I'm estimating a week to do fiddling around like extracting our schema management code for reuse (needed for slony deploys etc), then a week to put together a basic implementation of this schema with a simple private json API for it. (Actually I think a bare bones thing is a couple of days... but double and double again :P)

It's my hope this email can be a template for folk interested in bootstrapping better notifications.

*: object ids are something we haven't pinned down yet, but one typical form is <type>:<row id> - e.g. Person:1234.
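As a rough illustration of the 'one fact table with nullable subscriber|topic' option described above, here is a small Python sketch. It uses the standard library's sqlite3 in memory purely as a stand-in for postgresql; the table names (event, fact), the index names and the helper functions are all hypothetical, and dates are stored as ISO strings only to keep the example short.

```python
import sqlite3

# Assumed schema: one supporting event table plus a single fact table in
# which exactly one of subscriber/topic is set per row.
SCHEMA = """
CREATE TABLE event (
    id INTEGER PRIMARY KEY,
    summary TEXT NOT NULL
);
CREATE TABLE fact (
    subscriber TEXT,
    topic TEXT,
    event INTEGER NOT NULL REFERENCES event(id),
    date TEXT NOT NULL
);
CREATE INDEX fact_subscriber_date ON fact (subscriber, date);
CREATE INDEX fact_topic_date ON fact (topic, date);
"""


def connect():
    """Open an in-memory database with the sketch schema loaded."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record(conn, summary, date, subscribers=(), topics=()):
    """Store an event and one fact row per subscriber told and per topic touched."""
    event_id = conn.execute(
        "INSERT INTO event (summary) VALUES (?)", (summary,)
    ).lastrowid
    rows = [(s, None, event_id, date) for s in subscribers]
    rows += [(None, t, event_id, date) for t in topics]
    conn.executemany(
        "INSERT INTO fact (subscriber, topic, event, date) VALUES (?, ?, ?, ?)",
        rows,
    )
    return event_id


def _page(conn, column, value, limit):
    # column is only ever one of the two literals passed in below.
    query = (
        "SELECT event.summary FROM fact JOIN event ON event.id = fact.event "
        f"WHERE fact.{column} = ? ORDER BY fact.date DESC LIMIT ?"
    )
    return [row[0] for row in conn.execute(query, (value, limit))]


def timeline(conn, subscriber, limit=50):
    """Newest-first page of events a subscriber was told about."""
    return _page(conn, "subscriber", subscriber, limit)


def topic_events(conn, topic, limit=50):
    """Newest-first page of events relevant to one object in LP."""
    return _page(conn, "topic", topic, limit)
```

A wall or dashboard page could then be assembled in the appserver from one timeline() or topic_events() page plus whatever TODO-queue queries apply.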