Thanks - I've added Mnesia to my list of things to check. And nginx does sound so much better than Pound or HAProxy, both of which I've tried to use in the past, with little success under load.
I would love to be able to use something like Google App Engine. I think the stage we're all currently at, with manually configured virtual machines, is temporary - in the future our apps won't have a clue what they're running on; they'll use an API like Google App Engine's.

However, I don't think we can use it for Mapumental. We use GDAL (http://gdal.org/) as a C library for rendering the tiles, and our own C++ code for public transport route finding (see https://secure.mysociety.org/cvstrac/rlog?f=mysociety/iso/bin/fastplan-coopt.cpp). Neither can be run on Google App Engine.

Francis

On Mon, Aug 17, 2009 at 09:44:43AM +0100, Seb Bacon wrote:
> Hi Francis,
>
> I was talking with someone at work about Mnesia, which sounds like
> it's worth considering. It is distributed among N nodes, so it's good
> for problems that require good cache locality, i.e. that do a lot with
> the data (because all data is on every node and replicates everywhere
> quickly). For some types of data set that breaks down quite soon, of
> course (you pretty much want the dataset to be no larger than RAM,
> e.g. up to 64 GB). Mnesia takes care of replicating changes all
> around, of failed nodes, and of netsplits and syncing back from them,
> etc.
>
> I don't know much about MongoDB or CouchDB. Maybe you have to manage
> syncing yourself in the application layer, but they probably scale
> much further (depending on what you do in your application). But you
> could also have smaller clusters of Mnesia nodes, with application
> code replicating between them and copying frequently requested
> buckets to more of the clusters, or something like that. Another,
> global Mnesia could hold the routing information (which bucket is
> where).
>
> So a combination might also make sense: Mnesia for the routing
> information on broker nodes, and CouchDB or Memcached or MongoDB on
> the storage nodes with the large blobs of tile and other precomputed
> data.
> So your application servers would pick a broker node at random, ask it
> where some blob is, and pass the blob through from the storage node to
> the client. The brokers could also increment per-object access counters
> and run some async jobs to have frequently accessed objects copied to
> more storage nodes, etc.
>
> Instead of NFS for distributing tiles, you could consider a web
> service running off an httpd server like nginx.
>
> Another possibility for the entire infrastructure is Google App
> Engine, which utilises BigTable for fast, distributed data indexing
> and querying, and serves apps from a Python or Java runtime. There is
> a queue API, a memcached API, a simple image manipulation API, and a
> very good pricing model, which works out considerably cheaper than AWS
> for all the models I've considered. For example, CPU time is
> theoretically billed at the same rate in AWS and GAE, but in GAE you
> pay only for real CPU time, whereas in AWS you pay for instance uptime.
> Of course, the price you pay for the cheapness and free scaling in GAE
> is lack of control, lack of customer service, and no choice of
> where the data is stored (but I don't think Mapumental has data
> privacy concerns...?). The flip side of the lack of control is that
> the complexity is constrained. Personally I'm impressed by GAE and
> will be continuing to use it on new projects where I can, but I've not
> used it on a massively resource-intensive job yet. The only part of a
> GAE app that isn't easily portable to a new architecture is the
> datastore access, which can be abstracted away easily enough, so you
> could always choose to migrate from GAE to AWS at a later date.
>
> Seb
>
> 2009/8/14 Francis Irving <[email protected]>:
> > Mapumental is a website which shows contour maps of public transport
> > travel times, house prices and other data. It's in closed beta.
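Seb's broker suggestion quoted above (application servers ask a randomly chosen broker where a blob lives, fetch it from that storage node, and brokers copy frequently read objects to more nodes) can be sketched as a small, single-process Python model. This is only an illustration under loud assumptions: plain dicts stand in for the Mnesia routing table and for the CouchDB/Memcached/MongoDB storage nodes, and the class names, `HOT_THRESHOLD`, the tile key and the "copy to one more node" rule are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical rule: a blob read this many times through one broker since the
# last replication pass is copied to one more storage node.
HOT_THRESHOLD = 3


class StorageNode:
    """Stand-in for a CouchDB/Memcached/MongoDB node holding large blobs."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}  # blob key -> bytes


class Broker:
    """Stand-in for a broker node; `routes` plays the part of the Mnesia routing table."""

    def __init__(self, routes, nodes):
        self.routes = routes  # blob key -> names of nodes holding it (shared by all brokers)
        self.nodes = nodes    # node name -> StorageNode
        self.access_counts = Counter()

    def locate(self, key):
        """Return one storage node holding `key`, counting the access."""
        self.access_counts[key] += 1
        return self.nodes[random.choice(self.routes[key])]

    def replicate_hot_objects(self):
        """The 'async job': copy frequently read blobs to one extra storage node."""
        for key, count in self.access_counts.items():
            holders = self.routes[key]
            spare = [name for name in self.nodes if name not in holders]
            if count >= HOT_THRESHOLD and spare:
                self.nodes[spare[0]].blobs[key] = self.nodes[holders[0]].blobs[key]
                holders.append(spare[0])
        self.access_counts.clear()


def fetch(brokers, key):
    """An application server's view: ask a random broker, then pass the blob through."""
    broker = random.choice(brokers)
    return broker.locate(key).blobs[key]


if __name__ == "__main__":
    nodes = {name: StorageNode(name) for name in ("s1", "s2", "s3")}
    key = "tile/house-prices/12/2046/1362"  # hypothetical tile key
    nodes["s1"].blobs[key] = b"...png bytes..."
    routes = {key: ["s1"]}
    brokers = [Broker(routes, nodes) for _ in range(2)]

    for _ in range(6):
        fetch(brokers, key)
    for broker in brokers:
        broker.replicate_hot_objects()
    print(routes[key])  # now held on more than one node
```

In a real deployment the routing table would be the replicated store (Mnesia, in Seb's proposal) and `replicate_hot_objects` would run as a background job; here both are in-process objects just so the data flow is visible.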
> >
> > http://mapumental.channel4.com/
> >
> > It uses lots of CPU running the transport route finding for each
> > postcode, and rendering the tiles as they are served.
> >
> > Before we can openly release it, we need to make it scale easily
> > (say, on Amazon Web Services).
> >
> > Currently it is using:
> > * A PostgreSQL database to store the points behind the static datasets
> >   such as scenicness and house prices.
> > * Binary files on NFS to store the generated datasets of travel times.
> >   PostgreSQL was too slow, and used too much memory, to load in the
> >   large number of rows that would be required (300,000 for each
> >   user-entered postcode).
> > * A rendered tile cache, containing PNG files on the NFS filesystem.
> > * PostgreSQL for queueing the jobs for the transport route finder.
> >
> > We now want to:
> > * make the site scale easily (on Amazon Web Services),
> > * make it easy to add more data sets.
> >
> > We had problems with NFS, so I need something to replace the binary
> > files on NFS and the tile cache. It might also be prudent to use
> > something easier to scale than a PostgreSQL database, although I
> > suspect the load on it would be low, so perhaps it isn't a problem.
> >
> > So the new version of Mapumental that I'm currently planning has to
> > store:
> > a) a cache of rendered tiles (some generated fairly rarely and
> >    accessed frequently, e.g. house prices; some not accessed often
> >    compared to how often they are generated, e.g. public transport
> >    routes)
> > b) coordinates and values of arbitrary point datasets (e.g.
> >    school quality, asthma air quality, wind speed, route by
> >    car to a particular postcode etc. etc.)
> >
> > I'm looking for good, open source alternatives to NFS and PostgreSQL
> > to do this. Distributed data stores and queueing systems.
> >
> > What should I look at? What can I trust?
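Francis's point that PostgreSQL was too slow to load 300,000 rows per postcode is easier to see with a flat binary layout, where a whole travel-time dataset is one sequential write and one sequential read. A minimal Python sketch, assuming (hypothetically) that every point has a fixed index shared across datasets, that travel times fit in unsigned 16-bit minutes, and that `0xFFFF` marks "no route"; the real Mapumental file format isn't described in this thread.

```python
import os
import tempfile
from array import array

NUM_POINTS = 300_000   # rows per user-entered postcode, as quoted in the email
UNREACHABLE = 0xFFFF   # hypothetical sentinel meaning "no route found"
TYPECODE = "H"         # unsigned short: minutes, stored in native byte order


def write_travel_times(path, minutes):
    """Write one travel time per point, in a fixed point-index order."""
    with open(path, "wb") as f:
        array(TYPECODE, minutes).tofile(f)


def read_travel_times(path):
    """Load a whole dataset with one sequential read."""
    data = array(TYPECODE)
    with open(path, "rb") as f:
        data.fromfile(f, os.path.getsize(path) // data.itemsize)
    return data


def travel_time(path, index):
    """Seek straight to one point's value without loading the rest of the file."""
    item = array(TYPECODE)
    with open(path, "rb") as f:
        f.seek(index * item.itemsize)
        item.fromfile(f, 1)
    return item[0]


if __name__ == "__main__":
    times = [i % 240 for i in range(NUM_POINTS)]  # made-up values for the demo
    times[42] = UNREACHABLE
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "travel_times.bin")
        write_travel_times(path, times)
        print(os.path.getsize(path), "bytes")
        print(travel_time(path, 42) == UNREACHABLE)
        print(read_travel_times(path) == array(TYPECODE, times))
```

With 2-byte values (the usual size of `unsigned short`), one postcode's dataset is 300,000 × 2 = 600,000 bytes. Because the sketch uses native byte order, files should be read on the same kind of machine that wrote them; a shared format would want a fixed byte order.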
> >
> > I've already surveyed the field, and have my own ideas about what to
> > do, but would be interested if anyone here has some experience or
> > views on any of the obvious technologies.
> >
> > I'd like it to be stable and mature, and realistically it would
> > already be in a Debian package.
> >
> > Francis
> >
> > _______________________________________________
> > Mailing list [email protected]
> > Archive, settings, or unsubscribe:
> > https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
>
> --
> skype: seb.bacon
> mobile: 07790 939224
