On 05/01/2010 09:47 AM, Scott Crosby wrote:
> I agree wholeheartedly with letting it evolve, and I'm interested in
> hearing your (and others') thoughts on what additional features to
> include or exclude.
Some of these questions may be a bit premature, but I don't know how far along your design is, and perhaps asking them now may influence that design in ways that work for me.

I'm developing an accessible map-browsing, GPS navigation app. You can read my initial blog post on the project here:

http://thewordnerd.info/2010/03/introducing-hermes/

At the moment, this uses LibOSM from travelingsalesman and an as-yet-unreleased dataset that uses MongoDB for the geospatial queries. I don't really understand enough higher-level math to roll my own geospatial code, especially since I can't visually verify the results, so it's easier to use LibOSM and build a dataset that I can run on a production site than it would be to reinvent the wheel.

Unfortunately, this method introduces a variety of complications. First, the database for TX alone is 10 GB. Ballpark estimates are that I might need half a TB or more to store the entire planet. I'd also need substantial RAM to hold the working set for the DB index. All this means that, to launch this project on a global scale, I'd need a lot more funding than I as an individual am likely to find.

I'm really excited to read your numbers for compression, because at first glance this would seem to take the project from something needing substantial EC2 infrastructure to something I can run on a mid-level VPS, slashing costs from $1,000+/month to $50 or so/month.

So, my questions:

Is there a performance or size penalty to ordering the data geographically rather than by ID? I understand that this won't be the default case, but I'm wondering whether there would be any major performance issues in situations where you're likely to want bounding-box access rather than simply pulling out entities by ID.

Also, is there any reason this format wouldn't be suitable for a site with many active users performing geographic, read-only queries of the data?
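To make the ordering question concrete, here is a toy sketch (my own illustration, not Scott's actual binary format, and the tile scheme and function names are invented for the example): if records are sorted by a coarse spatial key rather than by ID, the entities inside a bounding box land in a few contiguous runs, so a query can binary-search to each run instead of seeking all over the file.

```python
# Toy illustration of geographic ordering for bounding-box queries.
# Nodes are (lat, lon, id) tuples. Sorting by a coarse tile key clusters
# nearby entities, so a bbox query scans short contiguous runs.
from bisect import bisect_left, bisect_right

TILE = 0.5  # tile size in degrees (arbitrary for this sketch)

def tile_key(lat, lon):
    """Coarse spatial key. Bit-interleaving (Z-order/Hilbert) would give
    better locality, but a (row, col) pair shows the clustering effect."""
    return (int(lat // TILE), int(lon // TILE))

def build(nodes):
    """Sort nodes geographically, as if writing them to disk in tile order."""
    return sorted(nodes, key=lambda n: tile_key(n[0], n[1]))

def bbox_query(sorted_nodes, lat0, lon0, lat1, lon1):
    """Visit only the tiles intersecting the box; within each tile, the
    matching records form one contiguous slice of the sorted data."""
    keys = [tile_key(lat, lon) for lat, lon, _ in sorted_nodes]
    r0, r1 = int(lat0 // TILE), int(lat1 // TILE)
    c0, c1 = int(lon0 // TILE), int(lon1 // TILE)
    hits = []
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            lo = bisect_left(keys, (r, c))
            hi = bisect_right(keys, (r, c))
            for lat, lon, nid in sorted_nodes[lo:hi]:
                if lat0 <= lat <= lat1 and lon0 <= lon <= lon1:
                    hits.append(nid)
    return hits

nodes = [(30.27, -97.74, 1), (30.28, -97.73, 2), (40.71, -74.00, 3)]
data = build(nodes)
print(bbox_query(data, 30.0, -98.0, 30.5, -97.5))  # the two Austin-area nodes
```

In an ID-ordered file, the same box would hit record positions scattered across the whole dataset, which is where the many-seeks concern below comes from.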
Again, I'd guess not, since the data isn't compressed as such, but maybe seeking several gigabytes into a file to locate nearby entities would be a factor. Or it may work just fine for single-user access, but not so well with multiple distinct seeks for different users in widely separated locations.

Anyhow, I realize these questions may be naive at such an early stage, but the idea that I may be able to pull this off without infrastructure beyond my budget is an appealing one. Are there any reasons your binary format wouldn't be able to accommodate this situation, or couldn't be optimized to do so?

Thanks.

_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev

