Agreed, I think the issue in this case definitely wasn't related to the number of machines. In general, though, you can often get much better performance on large data sets by running queries on data subsets across multiple systems, whatever software you use. Most NoSQL databases try to make this particularly easy.
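To make the "data subsets across multiple systems" idea concrete, here is a rough Python sketch. It is only an illustration under stated assumptions, not how MongoDB or any other product shards data: the longitude-band partitioning, the function names, and the sample points are all made up, and the threads merely stand in for separate machines (they give no real CPU speedup in Python). It also assumes the bounding box does not cross the antimeridian.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: points are (id, lon, lat) tuples, split into
# equal-width longitude bands, one band per "machine".
def shard_for(lon, n_shards):
    """Map a longitude in [-180, 180] to a shard index in [0, n_shards)."""
    idx = int((lon + 180.0) / 360.0 * n_shards)
    return min(idx, n_shards - 1)  # lon == 180 goes to the last shard

def build_shards(points, n_shards):
    shards = [[] for _ in range(n_shards)]
    for pid, lon, lat in points:
        shards[shard_for(lon, n_shards)].append((pid, lon, lat))
    return shards

def query_shard(shard, bbox):
    """Linear bbox scan of one shard; stands in for one machine's local query."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [p for p in shard
            if min_lon <= p[1] <= max_lon and min_lat <= p[2] <= max_lat]

def scatter_gather(shards, bbox):
    """Query only the shards whose band can overlap the bbox, then merge."""
    first = shard_for(bbox[0], len(shards))
    last = shard_for(bbox[2], len(shards))
    targets = shards[first:last + 1]
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        parts = pool.map(lambda s: query_shard(s, bbox), targets)
        return sorted(p for part in parts for p in part)

# Made-up sample data, four shards.
points = [(1, -93.2, 44.9), (2, -93.1, 45.0), (3, 16.4, 48.2), (4, -0.1, 51.5)]
shards = build_shards(points, 4)
print(scatter_gather(shards, (-94.0, 44.0, -93.0, 46.0)))
```

The point of the sketch is that a small bounding box touches only one shard, so each system scans a fraction of the data, while a large box fans out to several shards that can work at the same time.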
On Wed, 2011-04-13 at 14:44 -0500, Ian Dees wrote:
> On Wed, Apr 13, 2011 at 2:35 PM, Andreas Scheucher
> <[email protected]> wrote:
>     hi,
>
>     some weeks ago, i got interested in NoSQL database products.
>     I had no experience with them up to now, but as it was a
>     requirement for a job, I started to read about apache
>     cassandra and thought, this would be interesting for
>     OpenStreetMap.
>
> Yep, Cassandra would be an interesting option to try. In fact many
> moons ago I spoke with the folks at SimpleGeo about attempting to host
> some OSM data there in their infrastructure. At the time they didn't
> support anything but point features (and had no other way of dealing
> with metadata) so I haven't pursued it.
>
> Additionally, this talk they gave was quite informative and gave quite
> a bit of information about how they store their location data in
> Cassandra: http://www.youtube.com/watch?v=7J61pPG9j90
>
>     up to now my findings are only theoretical, but I would like
>     to dig deeper, when I find time.
>
>     But one thing I wonder about is, you tested it on one machine.
>     Isn't it like that, you need several nodes and loads of data
>     to really benefit from NoSQL databases? At least this was my
>     understanding of the whole thing...
>
> The purpose of multiple machines in this case is to have relatively
> reliable storage and multiple copies of the data on different
> machines, not necessarily an increase in read speed (Greg, maybe you
> could correct me?). Last time I looked at MongoDB seriously for OSM I
> imported an entire planet, so it was "loads of data" :). I have not
> tried a whole planet with the more recent versions, though.
>
>     greets,
>     Andreas
>
>     2011/4/13 Ian Dees <[email protected]>
>         On Tue, Apr 12, 2011 at 3:56 PM, Steve Coast
>         <[email protected]> wrote:
>             Interesting.
>
>             How efficient is the (big)int indexing and/or
>             masking?
>
>         I haven't had a chance to look at the integer
>         indexing/masking. If I remember it from discussions on
>         dev a long while ago I think it's very close to
>         geohashes.
>
>             Was this all on a single machine?
>
>         Yes.
>
>             On 4/12/2011 1:52 PM, Ian Dees wrote:
> > Yep.
> >
> > On Tue, Apr 12, 2011 at 3:51 PM, Steve Coast
> > <[email protected]> wrote:
> >     and using the builtin spatial index?
> >
> >     On 4/12/2011 1:50 PM, Ian Dees wrote:
> > > Yes, one document per node/way/relation.
> > >
> > > On Tue, Apr 12, 2011 at 3:47 PM, Steve Coast
> > > <[email protected]> wrote:
> > >     how was the data put in the db though? 1 document
> > >     per node?
> > >
> > >     On 4/12/2011 1:39 PM, Nolan Darilek wrote:
> > > > Oops, meant for this to go to the whole list.
> > > >
> > > > -------- Original Message --------
> > > > Subject: Re: [OSM-dev] OSM and MongoDB
> > > > Date: Tue, 12 Apr 2011 15:26:41 -0500
> > > > From: Nolan Darilek <[email protected]>
> > > > To: Ian Dees <[email protected]>
> > > >
> > > > I had/am having a somewhat bad experience storing OSM data
> > > > in MongoDB.
> > > >
> > > > Initially I stored all map data in MongoDB, but queries took
> > > > ages. The same queries that happen in 100-200 ms now often
> > > > took nearly a second. Additionally, some took upwards of 5,
> > > > and I even found spots on my map sparsely populated with
> > > > points, but which reliably performed the queries I need in
> > > > 30+ seconds.
> > > >
> > > > I filed a thorough bug in their tracker, including a dataset
> > > > and queries that reliably duplicated the issue. It was
> > > > marked wontfix, I abandoned MongoDB, and it was apparently
> > > > re-opened and fixed several months later. So perhaps it's a
> > > > non-issue now.
> > > >
> > > > I'm still using MongoDB for part of my current project, user
> > > > POI storage. It does indeed use geohashes, and I'm
> > > > experiencing strange accuracy issues. My platform is
> > > > pedestrian navigation with many small distance queries.
> > > > Points in the non-MongoDB dataset are reliably detected in a
> > > > radius roughly 100 meters around the traveler. Points in
> > > > MongoDB queried with the same bounding boxes don't appear
> > > > until they're within 30-40 meters. I recently updated from
> > > > an older version to a new build of 1.8. The older version
> > > > widely varied the detection range. Some points were detected
> > > > 100 or so meters out, while others weren't picked up until
> > > > 30 or so. It was always the same points, too. The point for
> > > > my apartment remains reliably visible for ~100 meters or so,
> > > > while the corner store and restaurant didn't appear until I
> > > > was very close. 1.8 at least appears to be consistent,
> > > > always detecting at 30 meters or so. I can only assume that
> > > > this is a geohash oddity that only appears for very small
> > > > differences, something that works out to rounding error for
> > > > larger values.
> > > >
> > > > I like MongoDB for many things, but not for geospatial data
> > > > more complicated than a series of points. I'm working on
> > > > migrating user/POI storage to a geospatial store.
> > > >
> > > > On 04/12/2011 01:20 PM, Ian Dees wrote:
> > > > > Yep, and I think Mongo uses geohashes as their index
> > > > > behind the scenes. One of the problems with that, though,
> > > > > is they have some arbitrary length that they compute the
> > > > > geohash to and when you have lots of points (as OSM data
> > > > > does) the buckets they're searching are very full.
> > > > >
> > > > > On Tue, Apr 12, 2011 at 1:00 PM, Steve Coast
> > > > > <[email protected]> wrote:
> > > > >     bbox queries using the built in spatial indexing
> > > > >     presumably? OSM has its own magical bitmask for
> > > > >     that, that may also be as fast in mongo, who knows.
> > > > >
> > > > >     On 4/11/2011 5:58 PM, Ian Dees wrote:
> > > > > > On Mon, Apr 11, 2011 at 6:36 PM, Sergey Galuzo
> > > > > > <[email protected]> wrote:
> > > > > >     Hi,
> > > > > >
> > > > > >     I am working on evaluation of MongoDB for several
> > > > > >     storage solutions at hand. Some of them resemble
> > > > > >     current OSM editing database. I have heard that
> > > > > >     OSM dev is/was evaluating MongoDB also. I was
> > > > > >     wondering whether it is possible to share the
> > > > > >     findings?
> > > > > >
> > > > > > In my experimentation with MongoDB (seen here:
> > > > > > https://github.com/iandees/mongosm/) I found it to be
> > > > > > very slow. Inserts were speedy, but bounding-box
> > > > > > queries took a long time.
> > > > > >
> > > > > > The most recent dev version of MongoDB includes
> > > > > > "multi-location documents" support:
> > > > > > http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-MultilocationDocuments
> > > > > >
> > > > > > This would allow a single way document to be indexed
> > > > > > at multiple locations and vastly speed up the map
> > > > > > query.

_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev

