Hi devs!

Yesterday on IRC the question came up again of how to find out which user sessions are active on CouchDB. As we all know, CouchDB knows nothing about that, not even how many active cookies are around. So here is an idea for how to fix that.

Problem
=======

CouchDB has many ways to authenticate users: basic auth, cookies, proxy auth, OAuth, Facebook (via plugin), etc. Only cookie auth holds any sort of state, and in any case we cannot rely on information that comes from outside, only on what CouchDB itself can trust. So we need to keep the session list internally.

Proposal
========

To track user sessions we need some sort of table with the fields user_id and last_activity_time. However, a session must expire after some time, so this table has to be cleaned of records where last_activity_time + timeout < now. Doing that with ets tables alone is hard, because something has to sweep out the stale records. However, we could use Erlang processes instead.

When an authenticated user sends a first request to CouchDB, a supervisor checks whether the user_id exists in its private ets table. If it does not, a new Erlang process is spawned and its PID is recorded against the user_id in the ets table. This process (the session) holds three values as its state: userCtx, timestamp (last_activity_time) and timeout.

When the authenticated user sends another request to CouchDB, the supervisor again checks whether the user_id exists in the table. If it does, the supervisor looks up the process PID and updates the timestamp held by that process.

Meanwhile the process is blocked in a receive ... after timeout loop. If no update comes from the supervisor, it dies by timeout, which means the session has expired. When a child dies, the supervisor removes the related record from its ets table.

When a user explicitly logs out, the session process dies as well, before it reaches the timeout.

CouchDB provides an admin-only resource, something like /_active_sessions, which collects userCtx and timestamp from the session processes that are still alive.

Caveats
=======

Such a session list isn't persistent. If the supervisor dies, all its children (the sessions) die too. Well, that is not a big problem, if it is a problem at all. We are not going to make the list durable anyway, because we cannot promise that this information is precise in the first place, given the stateless auth methods.
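
To make the proposal concrete, here is a minimal sketch in plain Erlang. Everything in it is an assumption for illustration and not CouchDB code: the module name session_sketch, the message shapes, the map used for userCtx (with a name key as the user_id), and a bare process standing in for the supervisor. A real implementation would more likely be a gen_server inside CouchDB's supervision tree.

```erlang
-module(session_sketch).
-export([start/1, touch/2, logout/2, active/1]).

%% Start the registry (stands in for the supervisor).
%% Timeout is the session lifetime in milliseconds.
start(Timeout) ->
    spawn(fun() ->
        process_flag(trap_exit, true),            % turn child exits into messages
        Tab = ets:new(sessions, [private, set]),  % rows are {UserId, Pid}
        registry_loop(Tab, Timeout)
    end).

%% Called for every authenticated request.
touch(Registry, UserCtx) -> Registry ! {touch, UserCtx}, ok.

%% Called on explicit logout.
logout(Registry, UserId) -> Registry ! {logout, UserId}, ok.

%% What an admin-only /_active_sessions handler could call.
active(Registry) ->
    Ref = make_ref(),
    Registry ! {active, self(), Ref},
    receive {Ref, Sessions} -> Sessions after 5000 -> timeout end.

registry_loop(Tab, Timeout) ->
    receive
        {touch, UserCtx} ->
            UserId = maps:get(name, UserCtx),
            case ets:lookup(Tab, UserId) of
                [{_, Pid}] ->
                    Pid ! touch;                  % known user: refresh timestamp
                [] ->
                    Pid = spawn_link(fun() ->
                        session_loop(UserCtx, now_ms(), Timeout)
                    end),
                    ets:insert(Tab, {UserId, Pid})
            end;
        {logout, UserId} ->
            case ets:lookup(Tab, UserId) of
                [{_, Pid}] -> Pid ! logout, ets:delete(Tab, UserId);
                [] -> ok
            end;
        {'EXIT', Pid, _Reason} ->
            ets:match_delete(Tab, {'_', Pid});    % session expired or crashed
        {active, From, Ref} ->
            Infos = [I || {_, Pid} <- ets:tab2list(Tab),
                          I <- [session_info(Pid)], I =/= dead],
            From ! {Ref, Infos};
        _Other ->
            ok                                    % drop late or unknown messages
    end,
    registry_loop(Tab, Timeout).

%% Ask one session for its state; give up quickly if it has just died.
session_info(Pid) ->
    Ref = make_ref(),
    Pid ! {info, self(), Ref},
    receive {Ref, Info} -> Info after 100 -> dead end.

%% The session itself: state is userCtx, last activity time and timeout.
session_loop(UserCtx, LastActivity, Timeout) ->
    %% Wait only for what is left of the timeout, so that info
    %% requests do not prolong the session.
    Remaining = max(0, LastActivity + Timeout - now_ms()),
    receive
        touch ->
            session_loop(UserCtx, now_ms(), Timeout);
        {info, From, Ref} ->
            From ! {Ref, {UserCtx, LastActivity}},
            session_loop(UserCtx, LastActivity, Timeout);
        logout ->
            ok
    after Remaining ->
        ok                                        % no activity: session expires
    end.

%% Wall-clock milliseconds, so the timestamp is readable in reports.
%% A monotonic clock would be safer for the expiry arithmetic.
now_ms() -> erlang:system_time(millisecond).
```

With this sketch, calling session_sketch:touch(R, #{name => <<"alice">>, roles => []}) on a registry R started with session_sketch:start(200) makes session_sketch:active(R) return one {UserCtx, Timestamp} pair, and an empty list once the 200 ms have passed without another touch. Because the sessions are linked to the registry, they also die with it, which is exactly the non-persistence described above.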

A session process could die while a client with an authenticated user still holds an open connection (for example, by listening to the changes feed in continuous mode). We need to think a little more about how to link a continuous connection to the session process so that the session does not die before the connection is closed.

If CouchDB serves a billion users, this solution might not scale well. However, if that is your case, you are probably riding the wave already and this is not the biggest problem you care about (:

Work for CouchDB 2.0
====================

For clustered CouchDB this feature is extended so that the /_active_sessions resource additionally returns the name of the node where the user session was registered. There could be multiple nodes for one user. Some aggregation might be needed here to reduce duplication.

Possible Extension
==================

We could also make a session more sensitive to the source the user authenticated from (IP address, HTTP headers, etc.), and warn about ETOOMANY different authentication locations or limit their number. That could greatly improve security.

We could go further and record not just the fact that a user has logged in, but also the last N actions they made: which documents they read, which they updated, etc. This information would also be stored in the session process. Such a feature would greatly help in auditing the current server state.

And so on...

Epilog
======

That's how I see it. Thoughts? Criticism?

--
,,,^..^,,,
