-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sorry for the delay in responding; crazy life, and I've been fixing
existing bugs in my project rather than thinking about breaking new
ground.

On 05/02/2010 12:35 AM, Scott Crosby wrote:
> With pruning out metadata, some judicious filtering of uninteresting tags,
> and increasing the granularity to 10 microdegrees (about 1m resolution),
> I've fit the whole planet in 3.7gb.

Sweet. I hope this format works for my use case.

> I have no code for pulling entities out by ID, but that would be
> straightforward to add, if there was a demand for it.

I would definitely need that. I'm coding to the travelingsalesman API's
DataSet interface which does include retrieval by ID.

> have to pay a disk seek whether it is in my format or not. My format being
> very dense, might let RAM hold the working set and avoid the disk seek. 1ms
> to decompress is already far faster than a hard drive, though not a SSD.

Keeping everything in RAM is probably workable. At the very least, to go
global with a format like this would seem to be a matter of starting
with a mid-level VPS that stores everything on disk and eventually
upgrading to a high-RAM, low disk space EC2 or GoGrid instance. Without
it, I'm looking at half a TB of storage and possibly a significant chunk
of RAM, and even so I don't think my current dataset can handle that. In
other words, I like the option of keeping everything in RAM far better
than what I'm doing right now. :)

> Could you tell me more about the kinds of lookups your application will do?

Sure. You can see the interface I've implemented here:

http://travelingsales.svn.sourceforge.net/viewvc/travelingsales/trunk/libosm/src/org/openstreetmap/osm/data/IDataSet.java?view=markup

Basically, the executive summary is that there are four broad kinds of
lookups:

Entity by ID, as mentioned earlier

Entities based on intersection with bounding box, currently done by the
somewhat inaccurate method of finding all contained nodes, then
returning any associated ways/relations.
Would be great if I could locate contained ways even if they don't have
a node in the box, but even if not, it'd be no worse than what's there
now. :)

Entities by presence of certain tags, in some instances also with
bounding box conditions (i.e., all "amenity"->"fuel" nodes, or all of
such nodes within a given bounds)

Nearest entity to a given point, expanding outward. I can, for instance,
roughly find the nearest way by finding the node nearest to a set of
coordinates, checking for its presence in any ways, then finding the
next nearest and recursing outward until the conditions are met. (The
conditions check is done externally, so the search need only return the
nearest entity, next nearest, etc.)

I know you've said elsewhere that you don't want this format to replace
the need for a database, and I respect that. I just don't quite know
where that line is. Even so, I clearly don't need all of my database's
functionality for the OSM-facing aspects of this app and hope that these
limited uses are in scope.

Thanks for thinking about and working on these issues. :)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvi6r4ACgkQIaMjFWMehWJvigCfV6d+2UY/5Mm1HCHquTMOG5Ru
h50An0DeN8y+ADCBsVLw1V4w0xt+nql1
=wJIc
-----END PGP SIGNATURE-----

_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev
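
To make the four kinds of lookups above a bit more concrete, here is a
rough sketch of them against a toy store that keeps everything in RAM.
This is only an illustration under stated assumptions: the names (Node,
Way, InMemoryDataSet, nearest_way) are made up for this example and are
neither the IDataSet interface nor anything from the proposed file
format. Every lookup is a plain linear scan with a flat-earth distance
approximation, where a real implementation would presumably want a
spatial index, and relations are left out for brevity.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: list
    tags: dict = field(default_factory=dict)


class InMemoryDataSet:
    """Hypothetical all-in-RAM store; not the real IDataSet API."""

    def __init__(self, nodes, ways):
        self.nodes = {n.id: n for n in nodes}
        self.ways = {w.id: w for w in ways}
        # Reverse index: node id -> ids of the ways that reference it.
        self.ways_by_node = {}
        for w in ways:
            for nid in w.node_ids:
                self.ways_by_node.setdefault(nid, set()).add(w.id)

    # 1. Entity by ID.
    def node_by_id(self, node_id):
        return self.nodes.get(node_id)

    def way_by_id(self, way_id):
        return self.ways.get(way_id)

    # 2. Entities by bounding box.
    def nodes_in_bbox(self, min_lat, min_lon, max_lat, max_lon):
        return [n for n in self.nodes.values()
                if min_lat <= n.lat <= max_lat and min_lon <= n.lon <= max_lon]

    def ways_in_bbox(self, min_lat, min_lon, max_lat, max_lon):
        # Node-based approximation: a way that crosses the box without
        # having a node inside it is missed.
        way_ids = set()
        for n in self.nodes_in_bbox(min_lat, min_lon, max_lat, max_lon):
            way_ids |= self.ways_by_node.get(n.id, set())
        return [self.ways[i] for i in sorted(way_ids)]

    # 3. Entities by tag, optionally restricted to a bounding box.
    def nodes_with_tag(self, key, value, bbox=None):
        candidates = (self.nodes.values() if bbox is None
                      else self.nodes_in_bbox(*bbox))
        return [n for n in candidates if n.tags.get(key) == value]

    # 4. Nearest entities, expanding outward.
    def nodes_by_distance(self, lat, lon):
        # Yields nodes nearest-first; the caller stops consuming as soon
        # as its own (external) condition is met.
        def dist(n):
            dx = (n.lon - lon) * math.cos(math.radians(lat))
            return math.hypot(n.lat - lat, dx)
        yield from sorted(self.nodes.values(), key=dist)


def nearest_way(ds, lat, lon, accept=lambda way: True):
    """Walk outward node by node until a way passes the external check."""
    for node in ds.nodes_by_distance(lat, lon):
        for way_id in sorted(ds.ways_by_node.get(node.id, ())):
            way = ds.ways[way_id]
            if accept(way):
                return way
    return None
```

The nearest-first generator keeps the condition check outside the
search, as described above: nearest_way() simply pulls the next nearest
node until the caller's accept() function is satisfied.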

