2009/8/17 Francis Irving <[email protected]>:
> However, I don't think we can use it for Mapumental. We use
> GDAL (http://gdal.org/) as a C library for rendering the tiles,
> and our own C++ code for public transport route finding (see
> https://secure.mysociety.org/cvstrac/rlog?f=mysociety/iso/bin/fastplan-coopt.cpp)
>
> Neither can be run on Google App Engine.

I suppose it wouldn't make sense to expose them as a web service in a
different infrastructure?

Seb

> On Mon, Aug 17, 2009 at 09:44:43AM +0100, Seb Bacon wrote:
>> Hi Francis,
>>
>> I was talking with someone at work about Mnesia, which sounds like
>> it's worth considering. It is distributed among N nodes, so it's good
>> for problems that require good cache locality, i.e. do a lot with the
>> data (because all data is on every node and replicates everywhere
>> quickly). For some types of data sets that breaks down quite soon, of
>> course (you pretty much only want a dataset that fits in RAM, e.g. up
>> to 64 GB). Mnesia takes care of replicating changes to every node,
>> and of failed nodes, netsplits and resyncing after them.
>>
>> I don't know much about MongoDB or CouchDB. Maybe you have to manage
>> syncing yourself on the application layer, but they probably scale
>> much further (depending on what you do in your application). But you
>> could also have smaller clusters of Mnesia nodes, with application
>> code replicating between them and placing frequently requested
>> buckets on more of the clusters, plus another, global Mnesia to hold
>> routing information (which bucket lives where).
>>
>> So a combination might also make sense: Mnesia for the routing
>> information on broker nodes, and CouchDB or Memcached or MongoDB on
>> the storage nodes with the large blobs of tile and other precomputed
>> data. So your application servers would pick a broker node at random,
>> ask it where some blob is and pass through the blob from the storage
>> node to the client. The brokers could also increment per-object
>> access counters and run some async jobs to have frequently accessed
>> objects copied to more storage nodes etc.
>>
>> Instead of NFS for distributing tiles, you could consider a web
>> service running off an httpd server like nginx.
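
To make the broker idea in the quoted paragraphs above concrete, here is a minimal in-memory sketch. It is not taken from Mapumental or from anyone's actual design: `StorageNode`, `Broker`, `fetch` and `replicate_hot` are hypothetical names, a plain dict stands in for CouchDB/MongoDB/Memcached, and a single `Broker` object stands in for the routing table that Mnesia would replicate across several broker nodes.

```python
import random
from collections import Counter


class StorageNode:
    """Hypothetical storage node; a dict stands in for the real blob store."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}

    def put(self, key, data):
        self.blobs[key] = data

    def get(self, key):
        return self.blobs[key]


class Broker:
    """Hypothetical broker: routing table plus per-object access counters."""

    def __init__(self, nodes, seed=None):
        self.nodes = {node.name: node for node in nodes}
        self.routes = {}       # blob key -> set of node names holding it
        self.hits = Counter()  # blob key -> number of lookups so far
        self.rng = random.Random(seed)

    def store(self, key, data):
        """Place a new blob on one randomly chosen storage node."""
        name = self.rng.choice(sorted(self.nodes))
        self.nodes[name].put(key, data)
        self.routes[key] = {name}

    def locate(self, key):
        """Return one node that holds `key` and count the access."""
        holders = sorted(self.routes[key])  # KeyError if the blob is unknown
        self.hits[key] += 1
        return self.nodes[self.rng.choice(holders)]

    def replicate_hot(self, threshold):
        """Copy each blob looked up at least `threshold` times to one more node."""
        for key, count in self.hits.items():
            if count < threshold:
                continue
            spare = sorted(set(self.nodes) - self.routes[key])
            if not spare:
                continue  # already present on every node
            source = self.nodes[sorted(self.routes[key])[0]]
            self.nodes[spare[0]].put(key, source.get(key))
            self.routes[key].add(spare[0])


def fetch(broker, key):
    """Application server: ask the broker where the blob is, then pass it through."""
    return broker.locate(key).get(key)
```

In a real deployment `replicate_hot` would run as the async job mentioned above, and the application server would pick one of several brokers at random before calling `fetch`; both are left out here to keep the sketch short.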
>>
>> Another possibility for the entire infrastructure is Google App
>> Engine, which utilises BigTable for fast, distributed data indexing
>> and querying, and serves apps from a Python or Java runtime. There is
>> a queue API, a memcached API, a simple image manipulation API, and a
>> very good pricing model, which works out considerably cheaper than AWS
>> for all models I've considered; for example, CPU time is theoretically
>> billed at the same rate in AWS and GAE, but in GAE you just pay for
>> real CPU time, compared with AWS where you pay for instance uptime.
>> Of course, the price you pay for the cheapness and free scaling in GAE
>> is lack of control, lack of customer service, and no choice of
>> where the data is stored (but I don't think Mapumental has data
>> privacy concerns...?). The flip side to the lack of control is that
>> the complexity is constrained. Personally I'm impressed by GAE and
>> will be continuing to use it on new projects where I can, but I've not
>> used it on a massively resource-intensive job yet. The only part of a
>> GAE app that isn't easily portable to a new architecture is the
>> datastore access, which can be abstracted away easily enough, so you
>> could always choose to migrate from GAE to AWS at a later date.
>>
>> Seb
>>
>> 2009/8/14 Francis Irving <[email protected]>:
>> > Mapumental is a website which shows contour maps of public transport
>> > travel times, house prices and other data. It's in closed beta.
>> >
>> > http://mapumental.channel4.com/
>> >
>> > It uses lots of CPU running the transport route finding for each
>> > postcode, and rendering the tiles as they are served.
>> >
>> > Before we can openly release it, we need to make it scale easily
>> > (say, on Amazon Web Services).
>> >
>> > Currently it is using
>> > * A PostgreSQL database to store the points behind the static datasets
>> >   such as scenicness and house prices.
>> > * Binary files on NFS to store the generated datasets of travel times.
>> >   PostgreSQL was too slow, and used too much memory, to load in the
>> >   large number of rows that would be required (300,000 for each
>> >   user-entered postcode).
>> > * A rendered tile cache, containing PNG files on the NFS filesystem.
>> > * PostgreSQL for queueing the jobs for the transport route finder.
>> >
>> > We now want to:
>> > * make the site scale easily (on Amazon Web Services),
>> > * make it easy to add more data sets.
>> > We had problems with NFS, so I need something to replace the binary
>> > files in NFS and the tile cache. It might also be prudent to use
>> > something easier to scale than a PostgreSQL database, although I
>> > suspect the load on it would be low so perhaps it isn't a problem.
>> >
>> > So the new version of Mapumental that I'm currently planning has to
>> > store:
>> > a) cache of tiles rendered (some generated fairly rarely
>> >    and frequently accessed e.g. house prices, some not accessed
>> >    often compared to generation times, e.g. public transport route)
>> > b) coordinates and values of arbitrary point datasets (e.g.
>> >    school quality, asthma air quality, wind speed, route by
>> >    car to a particular postcode etc. etc.)
>> >
>> > I'm looking for good, open source alternatives to NFS and PostgreSQL
>> > to do this. Distributed data stores and queueing systems.
>> >
>> > What should I look at? What can I trust?
>> >
>> > I've already surveyed the field, and have my own ideas about what to
>> > do, but would be interested if anyone here has some experience or
>> > views on any of the obvious technologies.
>> >
>> > I'd like it to be stable and mature, and realistically it would
>> > already be in a Debian package.
>> >
>> > Francis
>> >
>> > _______________________________________________
>> > Mailing list [email protected]
>> > Archive, settings, or unsubscribe:
>> > https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
>>
>> --
>> skype: seb.bacon
>> mobile: 07790 939224
