On Thu, Nov 18, 2010 at 10:28 AM, Robert Collins <robe...@robertcollins.net> wrote:
> The notation I'm going to use is this:
> 'foo' : the literal value foo.
> foo: a variable representing foo
> ...: Repeated things.
> + prefixing a column name : 'has a secondary index'
> (Thing) : this row is sorted on Thing. For instance
> 'Address':value(timestamp) - sorted on the timestamp.
>
> ColumnFamily(aka Table): CF|SCF (ColumnFamily or SuperColumnFamily)
>
> Row-Key : [+]ColumnName:(value)
>
> Remember too that every concrete column - essentially a stored cell -
> has a timestamp on it.

Have you seen any way of diagramming systems? I'm finding this and the
Riptano slides pretty unreadable, even for these toy examples.

> All in all I'm very glad Gary and I were here for the face time with
> Matthew from Riptano - we're in a very good position now in terms of
> understanding what it would take, and whether we'd want to, use
> Cassandra in some capacity going forward.
>
> For the million dollar question though - I think we probably want to
> use Cassandra for some stub systems (e.g. oauth, sessions, oopses,
> memcache replacement), as it has a much better scaling and schema
> evolution story than postgresql - but the lack of transactions and
> fundamentally different design approach needed mean that while
> Cassandras performance and scaling are very attractive, we'd be nuts
> to try and use hook it into Launchpad until our layering is sorted out
> - we'd need a dedicated layer where we could abstract out the overall
> operation vs the transaction/update logic.

oauth - The issue in PostgreSQL is the nonce handling. This would be in
memcache now, except that we rely on atomic commits to avoid race
conditions in the replay avoidance code. Cassandra will hit the same
issue. For nonce handling, I think memcache is a better fit: volatile
storage is fine, so keep it fast and avoid all that disk activity, and
if a nonce is consumed, other clients need to know that immediately
(rather than waiting for the information to replicate around).

sessions - Seems a decent fit.
I'm not sure if the existing setup is a problem that needs solving though.

oopses - Probably a better fit than PostgreSQL. We can start with the
reporting side of things if that is a problem. If we can generate the
reports we need, then we can get systems submitting directly to the DB
or via Rabbit.

memcache - Using memcache is essentially free because of its
limitations. I don't think Cassandra is a suitable replacement for our
current volatile-data-only usage of memcache. There have been some
things we decided memcache was not suitable for that Cassandra could be
a better fit for.

Is it suitable for replacing the bulk of the Librarian?

Disaster recovery will be an issue. We need things in place before we
put any data we care about into it.

Staging and qa systems will be interesting. I'm not sure how things
could be integrated. I guess we would need to build a staging Cassandra
database from a snapshot taken after the PostgreSQL dump was taken,
with missing data being ok because of 'eventually consistent'.

I don't see a win in replacing small systems that are not in trouble.
We may just as easily avoid the trouble by redesigning for PG or
memcache as by redesigning for Cassandra. Adding Cassandra introduces a
lot of moving parts - too much overhead for the toy systems.

If we want to use it, I'd want to see it used for a big system that
could do with a performance boost. For example: publishing history in
soyuz, Branch/BranchRevision/Revision in codehosting,
*Message/Message/MessageChunk, LibraryFileAlias/LibraryFileContent,
full text search, karma.

-- 
Stuart Bishop <stu...@stuartbishop.net>
http://www.stuartbishop.net/