Re: Statefulness and algorithms for social networks/graphs

Markus Kohler Mon, 30 Nov 2009 14:55:08 -0800

Hi David,
Thanks a lot!
This makes a lot of sense to me.


Regards,
Markus

"The best way to predict the future is to invent it" -- Alan Kay


On Mon, Nov 30, 2009 at 11:24 PM, David Pollak <[email protected]> wrote:

> On Mon, Nov 30, 2009 at 2:00 PM, Markus Kohler <[email protected]> wrote:
>
> >
> > > So, that means that each year, there will be 36,000M (36B) mailbox
> > > entries.
> > >
> >
> >
> > I don't understand why we would need to store all entries in a cache
> > instead of only keeping the last n entries for each user, based on some
> > heuristic such as the last 3 days or so.  I would expect the probability
> > that a user wants to see a message to decrease exponentially with the
> > message's age.  For example, the probability that someone wants to see
> > the 1,000th-newest message in his timeline is probably almost zero.
> >
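Markus's "keep only the last n entries per user" idea can be illustrated with a per-user bounded buffer. This is a minimal Python sketch under stated assumptions, not ESME code: the class and method names (BoundedTimelines, deliver, newest) are hypothetical, the bound is a count rather than an age (a "last 3 days" policy would need a timestamp per entry), and message ids stand in for full messages.

```python
from collections import defaultdict, deque


class BoundedTimelines:
    """Hypothetical sketch: keep only the newest `max_entries` message ids per user."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        # deque(maxlen=n) silently discards the oldest id when a new one is appended
        self._timelines = defaultdict(lambda: deque(maxlen=self.max_entries))

    def deliver(self, user_id: str, message_id: int) -> None:
        """Append a message id to the user's timeline, evicting the oldest if full."""
        self._timelines[user_id].append(message_id)

    def newest(self, user_id: str, n: int) -> list:
        """Return up to n of the user's newest message ids, newest first."""
        timeline = self._timelines.get(user_id, ())  # .get avoids creating empty entries
        return list(reversed(timeline))[:n]
```

David's reply below argues that a cap around 1,000 is likely too small in practice; the sketch only shows the mechanism, not a recommended bound.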
>
> Some people mine their timelines for information.  I agree that some aging
> policy is necessary, as 36B entries will consume a lot of storage in RAM or
> on disk, but the last 1,000 is likely too few based on what I have seen of
> actual user behavior.
>
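To put a number on "a lot of storage", here is a back-of-the-envelope estimate. The 16 bytes per entry is an assumption chosen to be deliberately minimal (one 64-bit user id plus one 64-bit message id, with no timestamp, row overhead, or index), so real storage would be larger.

```latex
36 \times 10^{9}\ \tfrac{\text{entries}}{\text{year}}
  \times 16\ \tfrac{\text{bytes}}{\text{entry}}
  = 5.76 \times 10^{11}\ \tfrac{\text{bytes}}{\text{year}}
  \approx 576\ \text{GB per year}
```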
> In terms of an aging policy in an RDBMS, the cost of aging out old entries
> is likely to be an index scan or something on that order (DELETE FROM
> mailbox WHERE date < xxx, or a user-by-user DELETE FROM mailbox WHERE id IN
> (SELECT the entries beyond the newest 1,000 in that user's mailbox)).
>
>
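The two aging policies David mentions can be sketched with Python's built-in sqlite3 module. The schema (a mailbox table with id, user_id, message_id and an integer date column) is an assumption for illustration, not ESME's actual schema, and SQLite stands in for whatever RDBMS would really be used. The index on date should let the age-based DELETE use an index range scan rather than a full table scan.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a hypothetical in-memory mailbox table with supporting indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mailbox (id INTEGER PRIMARY KEY, user_id TEXT NOT NULL, "
        "message_id INTEGER NOT NULL, date INTEGER NOT NULL)"
    )
    conn.execute("CREATE INDEX mailbox_date ON mailbox(date)")
    conn.execute("CREATE INDEX mailbox_user_date ON mailbox(user_id, date)")
    return conn


def age_out_by_date(conn: sqlite3.Connection, cutoff: int) -> int:
    """Policy 1: delete every mailbox entry older than `cutoff`; returns rows deleted."""
    cur = conn.execute("DELETE FROM mailbox WHERE date < ?", (cutoff,))
    return cur.rowcount


def age_out_by_count(conn: sqlite3.Connection, user_id: str, keep: int) -> int:
    """Policy 2: for one user, delete everything except the newest `keep` entries."""
    cur = conn.execute(
        "DELETE FROM mailbox WHERE user_id = ? AND id NOT IN ("
        " SELECT id FROM mailbox WHERE user_id = ?"
        " ORDER BY date DESC, id DESC LIMIT ?)",
        (user_id, user_id, keep),
    )
    return cur.rowcount
```

The date-based policy is a single statement over the whole table, while the count-based policy has to be run user by user, which matches the cost difference implied above.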
> >
> > > During peak load, we will need to prioritize which Users are processing
> > > messages/actions such that the system retains responsiveness and can
> > > drain the load.  Put another way, knowing which Users have associated
> > > long-lived sessions allows us to prioritize the message processing for
> > > those Users.  We allow more threads to drain the message queues for those
> > > Users while providing fewer threads for session-less Users.  Yeah, we
> > > could prioritize on other heuristics, but a long-lived session is dead
> > > simple and will cost us 5K bytes per logged-in user.  Not a huge cost, and
> > > lots of benefit.
> > >
> > >
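The idea of giving Users with long-lived sessions more processing capacity than session-less Users can be approximated with weighted round-robin over two work queues. This is a swapped-in, single-threaded simplification rather than the thread allocation described above: instead of sizing two thread pools, it hands out processing slots in a fixed ratio. The function name, the queue contents and the 3:1 default weight are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Iterator


def drain_order(online: Deque[str], offline: Deque[str], online_weight: int = 3) -> Iterator[str]:
    """Hypothetical scheduler: for every slot given to a session-less User,
    up to `online_weight` slots go to Users with a live session."""
    while online or offline:
        # Serve up to online_weight work items from Users with long-lived sessions
        for _ in range(online_weight):
            if not online:
                break
            yield online.popleft()
        # Then serve one item from session-less Users so they are never starved
        if offline:
            yield offline.popleft()
```

With a real thread pool, the same 3:1 ratio would correspond to roughly three quarters of the workers draining queues for online Users.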
> > I have no issue with some session state; 5K is really low, so this is not
> > a problem.  I don't see why it has to be in the session's state, because
> > you could just as well use the fact that a user is online as guidance,
> > even if the state were stored somewhere outside the session.  I guess it
> > wouldn't make a difference, and storing it in the session looks natural.
> >
>
> The state itself is not in the session.  The session is the indicator that
> the user is online.  The session contains a listener that is attached to the
> User.  The only real state that resides in the session is the state
> necessary to batch up any messages that the User has forwarded to the
> listener in between the HTTP polling requests.  If there is an HTML front
> end, state about that front end will reside in the session as well, but
> that's a different issue.
>
>
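The listener described here, which batches up the messages the User forwards between HTTP long-poll requests, might look roughly like the following. It is a minimal Python sketch built on threading.Condition; the class and method names (SessionListener, on_message, poll) are hypothetical and do not correspond to Lift's or ESME's actual API.

```python
import threading
from typing import List


class SessionListener:
    """Hypothetical per-session listener: the User pushes messages in,
    and each HTTP long-poll request drains whatever has accumulated."""

    def __init__(self) -> None:
        self._pending: List[str] = []
        self._cond = threading.Condition()

    def on_message(self, message: str) -> None:
        # Called when the User forwards a message to this session
        with self._cond:
            self._pending.append(message)
            self._cond.notify_all()

    def poll(self, timeout: float) -> List[str]:
        """Block until at least one message is pending or `timeout` seconds pass,
        then return the whole batch (possibly empty) and clear the buffer."""
        with self._cond:
            self._cond.wait_for(lambda: self._pending, timeout=timeout)
            batch, self._pending = self._pending, []
            return batch
```

The only state held per session is the pending batch, which is consistent with the point above that the User's real state lives elsewhere.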
> >
> >
> > > So, the fact that long polling over a long-lived session is more
> > > efficient than repeated polling with short-lived sessions, together with
> > > the upcoming need for message prioritization, indicates that long-lived
> > > sessions are the right design choice.
> > >
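One way to see the efficiency argument is to count HTTP requests per user per hour. Assume, for illustration only, a short-polling client that polls every p seconds, and a long-poll request that the server holds open until a message arrives or a timeout of T seconds expires, with r message deliveries per hour. Each long-poll request ends either with at least one message or after T seconds, which gives the bound below; it ignores reconnect latency, and it leaves out the per-request session setup that short-lived sessions add on top.

```latex
N_{\text{short}} = \frac{3600}{p}, \qquad
N_{\text{long}} \le r + \frac{3600}{T}
```

For example, with p = 5 s the short-polling client makes 720 requests per hour even when nothing happens, whereas with T = 60 s and r = 10 the long-polling client makes at most 70.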
> > > Also, I hope that the above discussion makes it clear why I am insistent
> > > on message-oriented APIs rather than document/REST-oriented APIs.  ESME's
> > > design is not traditional, and there are fewer tools helping us get the
> > > implementation right.  On the other hand, implementing ESME on top of a
> > > relational/REST model cannot be done.  Let's keep our design consistent
> > > from the APIs back.
> > >
> > >
> > I'm really not religious about REST, but I would assume that in an
> > Enterprise context it could be a requirement to send someone else a link
> > pointing to a specific, potentially old message in a certain Pool.
>
>
>
> Yes.  That's perfectly reasonable.  That message is like a static file on
> disk.  Once it's written, it remains unchanged until it's deleted.  This is
> an ideal application of a REST-style approach.  That's why I've advocated
> for a "message-based" approach first, but a REST/static approach when the
> message-based approach doesn't make sense.  What I am opposed to is a "try
> to make everything fit the REST model" approach to API design.
>
>
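Because a written message never changes, a REST-style lookup by id can let clients and intermediaries cache it aggressively. A minimal Python sketch under assumptions: the GET /messages/<id> route, the Message fields and the get_message helper are hypothetical; the ETag is a truncated SHA-256 of the rendered body; and Cache-Control "immutable" is the HTTP extension from RFC 8246. "private" is used on the assumption that enterprise messages are access-controlled.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Message:
    # frozen=True: once written, a message cannot be mutated, only deleted from the store
    message_id: int
    pool: str
    author: str
    text: str


def get_message(store: Dict[int, Message], message_id: int) -> Tuple[int, Dict[str, str], str]:
    """Hypothetical handler for GET /messages/<id>: returns (status, headers, body)."""
    msg = store.get(message_id)
    if msg is None:
        return 404, {}, "not found"
    body = f"{msg.author}@{msg.pool}: {msg.text}"
    # Same body always yields the same ETag, so conditional requests can be answered cheaply
    etag = '"' + hashlib.sha256(body.encode("utf-8")).hexdigest()[:16] + '"'
    headers = {
        "ETag": etag,
        # The content never changes, so a cache may keep it for its whole lifetime
        "Cache-Control": "private, max-age=31536000, immutable",
    }
    return 200, headers, body
```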
> > That sounds to me like a requirement for some kind of REST API.
> > Would it be costly in your model to get message no. X (plus the n older
> > messages) in a user's timeline?
> >
>
> A message will exist outside of a timeline.  There exists a cache of
> recently accessed messages.  Sometimes there will be a historic message
> that is referenced and that will be materialized from backing store and
> rendered.  It will likely fall out of cache if it's historical and not
> accessed again.
>
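This description (messages live outside timelines, a cache holds recently accessed ones, historical messages are materialized from backing store and later evicted) maps naturally onto an LRU cache. It also suggests an answer to Markus's "message X plus n older messages" question: if a timeline is just a newest-first list of message ids, a page is a slice of it resolved through the cache. A minimal Python sketch; LRU is an assumed eviction policy, and MessageCache, timeline_page and the load callback are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, List


class MessageCache:
    """Hypothetical LRU cache of recently accessed messages in front of a backing store."""

    def __init__(self, capacity: int, load: Callable[[int], str]) -> None:
        self.capacity = capacity
        self._load = load  # materializes a message from the backing store
        self._entries = OrderedDict()  # message_id -> message, least recently used first
        self.loads = 0  # number of backing-store reads, useful for observing cache misses

    def get(self, message_id: int) -> str:
        if message_id in self._entries:
            self._entries.move_to_end(message_id)  # mark as most recently used
            return self._entries[message_id]
        self.loads += 1
        message = self._load(message_id)
        self._entries[message_id] = message
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used message
        return message


def timeline_page(timeline: List[int], cache: MessageCache, index: int, n: int) -> List[str]:
    """Message at position `index` of a newest-first timeline plus the n older ones after it."""
    return [cache.get(mid) for mid in timeline[index:index + n + 1]]
```

A historical message that is fetched once and never again is evicted as newer lookups push it to the least-recently-used end.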
> Thanks,
>
> David
>
>
> >
> > Regards,
> > Markus
> >
> >
> >
> > > Thanks,
> > >
> > > David
> > >
> > > --
> > > Lift, the simply functional web framework http://liftweb.net
> > > Beginning Scala http://www.apress.com/book/view/1430219890
> > > Follow me: http://twitter.com/dpp
> > > Surf the harmonics
> > >
> >
>
>
>
>
