On Sun, Mar 14, 2010 at 2:59 PM, Les Mikesell <[email protected]> wrote:
> Adam Lee wrote:
>
>> well, it depends on what you mean by scalability... i'm personally of
>> the opinion that traditional sessions should be avoided if you want to
>> truly scale.
>>
>
> And yet, everyone wants dynamic pages custom-generated to the user's
> preferences. So how do you reconcile that? You can help things a bit by
> splitting pages into iframe/image components that do/don't need sessions,
> and you can make the client do more of the work by sending back values in
> cookies instead of just the session key, but I'm not sure how far you can
> go.
>
Well, I guess it depends on your definition of "session." Obviously, you
need to account for user preferences and such, but I don't consider those
"session" data since they are consistent across any session that the user
instantiates.

Probably the easiest way to build a "stateless"/shared-nothing web
application, and what we've done to scale, is to store user authentication
data and the like in an encrypted cookie. Any other session-like data
(geolocation from IP lookup, language preference, etc.) can be set in
separate cookies. Since cookies are sent with every request, you can verify
that the user is who they say they are and work out the data needed to
build their page using only these cookies, without looking anything up in
any sort of centralized session cache.
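
To make the encrypted-cookie idea concrete, here is a minimal sketch, not the code we actually run. It assumes AES-GCM (so the cookie is both encrypted and tamper-evident), a key shared by all web servers, and a made-up payload layout of "expiry|username". The class name AuthCookie and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: seals and opens an authentication cookie value. */
final class AuthCookie {
    private static final int IV_BYTES = 12;   // recommended nonce size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    private final SecretKeySpec key;

    /** rawKey must be 16, 24 or 32 bytes and shared by every web server. */
    AuthCookie(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    /** Encrypts "expiry|userName" and returns a cookie-safe string. */
    String seal(String userName, long expiresAtMillis) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh nonce for every cookie
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] plain = (expiresAtMillis + "|" + userName).getBytes(StandardCharsets.UTF_8);
            byte[] sealed = cipher.doFinal(plain); // ciphertext followed by the tag
            byte[] out = ByteBuffer.allocate(iv.length + sealed.length).put(iv).put(sealed).array();
            return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the user name, or null if the cookie is malformed, tampered with or expired. */
    String open(String cookieValue, long nowMillis) {
        try {
            byte[] in = Base64.getUrlDecoder().decode(cookieValue);
            if (in.length <= IV_BYTES) {
                return null;
            }
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            // doFinal throws if the tag does not match, i.e. on any modification.
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            String[] parts = new String(plain, StandardCharsets.UTF_8).split("\\|", 2);
            if (parts.length != 2) {
                return null;
            }
            long expiresAt = Long.parseLong(parts[0]);
            return nowMillis < expiresAt ? parts[1] : null;
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return null; // bad tag, bad base64 or bad expiry field
        }
    }
}
```

Any server holding the key can call open() on the incoming cookie and know who the user is without touching a session store. Key rotation and revocation are left out of this sketch.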

Data that is needed to authenticate a request or to display a message on a
subsequent page view (the kind of thing that would be stored in the Flash in
Rails, as I understand it) can be encoded into a cryptographically secure
"token" that is passed along to the following request.
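
One way such a token could be built is to sign the message with an HMAC so the next request can trust it without any server-side storage. This is only a sketch under that assumption: the FlashToken name, the "payload.signature" layout and the choice of HMAC-SHA256 are illustrative, not a description of the tokens we issue.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: carries a short message to the next request in a signed token. */
final class FlashToken {
    private final SecretKeySpec key;

    /** secret must be non-empty and shared by every web server. */
    FlashToken(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    /** Returns base64url(message) + "." + base64url(HMAC of message). */
    String issue(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        return enc.encodeToString(payload) + "." + enc.encodeToString(sign(payload));
    }

    /** Returns the original message, or null if the token is malformed or was altered. */
    String verify(String token) {
        int dot = token.indexOf('.');
        if (dot < 0) {
            return null;
        }
        try {
            byte[] payload = Base64.getUrlDecoder().decode(token.substring(0, dot));
            byte[] claimed = Base64.getUrlDecoder().decode(token.substring(dot + 1));
            // MessageDigest.isEqual avoids leaking the mismatch position through timing.
            return MessageDigest.isEqual(sign(payload), claimed)
                    ? new String(payload, StandardCharsets.UTF_8)
                    : null;
        } catch (IllegalArgumentException e) {
            return null; // not valid base64
        }
    }

    private byte[] sign(byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that this token is signed but not encrypted, so the client can read the message, and nothing here stops a token from being replayed. An expiry or one-time nonce inside the payload would be the usual next step.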

User preferences and settings, on the other hand, are not really session
data, as I said above. I've already described how we store this data in a
few previous posts on this thread, but I'll give a basic overview here for
the sake of completeness...

Our central datastore for users is still (unfortunately) a database (MySQL),
but this is essentially only used for writes. All user data is also written
to TokyoTyrant, which is our primary persistent datastore for reads, and is
replicated exactly in memcached.
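
The write path described above could be sketched roughly like this. The three stores are represented by plain BiConsumer callbacks rather than real MySQL, TokyoTyrant or memcached clients, and the write order and the UserWriter name are assumptions made for illustration.

```java
import java.util.function.BiConsumer;

/** Hypothetical write fan-out: one logical save touches all three stores. */
final class UserWriter {
    private final BiConsumer<String, String> database;    // stand-in for MySQL
    private final BiConsumer<String, String> tokyoTyrant; // stand-in for TokyoTyrant
    private final BiConsumer<String, String> memcached;   // stand-in for memcached

    UserWriter(BiConsumer<String, String> database,
               BiConsumer<String, String> tokyoTyrant,
               BiConsumer<String, String> memcached) {
        this.database = database;
        this.tokyoTyrant = tokyoTyrant;
        this.memcached = memcached;
    }

    /** Writes the database first; if that throws, the read stores are left untouched. */
    void save(String key, String value) {
        database.accept(key, value);
        tokyoTyrant.accept(key, value);
        memcached.accept(key, value);
    }
}
```

With this shape, page rendering never needs to read from the database at all. What to do when the second or third write fails is deliberately not covered here.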

Since not all user data is needed for every page view, we've broken the user
data into what we call "user chunks," which roughly correspond to what would
be DB tables or separate objects in a traditional ORM. We built a service
that will get you the data you want for a specific user or set of users by
taking name(s) and a bitmask for what chunks you want. So, for example, if
I wanted to load the basic user data, active photo and profile data for the
user "admin," I'd just have to do something like this:

    RoUserCache.get("admin", USER | ACTIVE_PHOTO | PROFILE);
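
For anyone unfamiliar with the bitmask trick, here is a small sketch of how chunk flags like those could expand into per-chunk cache keys. The flag names come from the call above, but the bit values, the "chunk:user" key format and the UserChunks class are assumptions, not the real RoUserCache internals.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical chunk flags: one bit per kind of user data. */
final class UserChunks {
    static final int USER = 1 << 0;
    static final int ACTIVE_PHOTO = 1 << 1;
    static final int PROFILE = 1 << 2;

    // Index i holds the key prefix for the chunk whose flag is (1 << i).
    private static final String[] PREFIXES = {"user", "active_photo", "profile"};

    private UserChunks() {
    }

    /** Expands a user name and a chunk mask into one cache key per requested chunk. */
    static List<String> keysFor(String userName, int mask) {
        List<String> keys = new ArrayList<>();
        for (int bit = 0; bit < PREFIXES.length; bit++) {
            if ((mask & (1 << bit)) != 0) {
                keys.add(PREFIXES[bit] + ":" + userName);
            }
        }
        return keys;
    }
}
```

Because each chunk is a separate key, a page that only needs the active photo asks for one small value instead of the whole user record, and the caller expresses that with a single int.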

The beauty of this is that the cache is smart: it batches all of the
requests from a thread into bulk gets, it does as much as possible
asynchronously, and it tries to get data from memcached first, falling back
to TokyoTyrant only on a miss. TokyoTyrant and memcached are both great at
doing bulk gets, so this is pretty fast and, since they both speak the same
protocol (memcached), it wasn't terribly difficult to build. Doing it
asynchronously also absorbs most of the latency: we kick off these loads as
early in the page-building process as possible, so the data tends to be
there by the time the page tries to use it.
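
The memcached-first, TokyoTyrant-second lookup can be sketched as below. Both tiers are modeled as functions that take a collection of keys and return whatever they have, standing in for real bulk gets. The TieredBulkReader name and its methods are hypothetical, and the per-thread request batching mentioned above is left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical two-tier reader: bulk get from the fast tier, then bulk get the misses. */
final class TieredBulkReader {
    private final Function<Collection<String>, Map<String, String>> fast;    // memcached stand-in
    private final Function<Collection<String>, Map<String, String>> durable; // TokyoTyrant stand-in

    TieredBulkReader(Function<Collection<String>, Map<String, String>> fast,
                     Function<Collection<String>, Map<String, String>> durable) {
        this.fast = fast;
        this.durable = durable;
    }

    /** Wraps an in-memory map as a bulk-get function, for demos and tests. */
    static Function<Collection<String>, Map<String, String>> backedBy(Map<String, String> data) {
        return keys -> {
            Map<String, String> hits = new HashMap<>();
            for (String key : keys) {
                String value = data.get(key);
                if (value != null) {
                    hits.put(key, value);
                }
            }
            return hits;
        };
    }

    /** One bulk get against the fast tier, then one bulk get for whatever it missed. */
    Map<String, String> getAll(Collection<String> keys) {
        Map<String, String> found = new HashMap<>(fast.apply(keys));
        List<String> missing = new ArrayList<>();
        for (String key : keys) {
            if (!found.containsKey(key)) {
                missing.add(key);
            }
        }
        if (!missing.isEmpty()) {
            found.putAll(durable.apply(missing));
        }
        return found;
    }

    /** Starts the load on a background thread so page building can continue meanwhile. */
    CompletableFuture<Map<String, String>> getAllAsync(Collection<String> keys) {
        return CompletableFuture.supplyAsync(() -> getAll(keys));
    }
}
```

A page would call getAllAsync() as early as it knows which users it needs and only join() the future at the point where the data is rendered, which is where the latency gets hidden.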

Anyway, I've strayed a bit from the topic at hand, but I guess I felt I
should elaborate on what I meant...
--
awl