Hi Peter,

Thanks for the response.
> 1) The OSM API is a restful api that allows "live" editing: The editor
> software a) opens a changeset, b) creates a node, c) adds a tag - same
> for ways.
> Between b and c there's an untagged osm element in the database (even if
> it's in most cases a very short time).

I think that is a rather orthogonal issue to validation. It suggests that validation should probably be launched when a changeset closes, for example, but the more important point is that even with the API calls you described it should not be possible to _end up_ with broken data. For now I am trying to discuss this at a more abstract level: the contract would be "we can't have X in the database"; how it is implemented (at changeset close, maybe?) I cannot say yet, as I am no expert in OSM internals. The more important question for now is whether this kind of thinking even makes sense to you.

> 2) Ways without tags may be part of relations nevertheless: an outer way
> of a multipolygon does not necessarily have tags, as the tags applied to
> the multipolygon should go to the relation.

Yes, that was just a quick example, and saying "don't allow unconnected untagged nodes" may be too simplistic. Still, there is a lot of business logic that could be placed on the server that would help increase the quality of OSM data across the board.

> 3) the free tagging scheme would allow similar stuff for nodes, too
> (while I don't know any issue where that's used currently). A
> theoretical example would be a set of nodes, which are defined points
> inside a fuzzy area/region and others which are defined points outside
> (where there's no concrete, hard boundary defined, e.g. for "the alpes".

I understand the benefits of the "free tagging" approach. On the other hand, it is rather strange that even for "core" keys (e.g. "highway" or "surface") there is no validation/schema/whatever one calls it. In this case, which is more efficient:

1. Adding one more possible value for "highway" when it is needed and deploying such a change to production, or
2. Constantly cleaning up the database when there are inconsistent entries (typos etc.)?

In fact, I think there is no such thing as a global cleanup process - there are a couple of bots that do so here and there, but overall the data can remain inconsistent.

Paweł

> Pushing this validation to the server side has several drawbacks:
> - usually server load is the bottleneck in osm, not client load.

I understand the infrastructure constraints, but I think that in the (very) long term pushing everything to the client side will cause much more trouble than dealing with load issues while having a consistent database and business logic (validation) in place.

> - a check on server side would fix the corresponding tagging and makes
> other tagging schemes invalid probably, a contradiction to the free
> tagging scheme we have.
> - the api would have to change to use transaction like semantics, wich
> is again higher server load, but the only way to make sure not to create
> these invalid stuff.

For now it is just a thought exercise and discussion, but if I could propose some changes and perhaps implement a proof of concept, would it be taken seriously? You could say that "open source is about working, not talking" and that I should rather do something instead of discussing, but as you can see these are pretty high-level things that go against the status quo - that's why I want to make sure my time is well spent...

Paweł

> regards
> Peter
>
> On 13.07.2012 19:27, Paweł Paprota wrote:
> > Hi all,
> >
> > Today I have encountered a lot of bad data in my area - duplicated
> > nodes/ways. These probably stem from an inexperienced user or faulty
> > editor software when drawing building.
> > I corrected a lot of this stuff, see changesets:
> >
> > http://www.openstreetmap.org/browse/changeset/12208202
> > http://www.openstreetmap.org/browse/changeset/12208389
> > http://www.openstreetmap.org/browse/changeset/12208467
> > http://www.openstreetmap.org/browse/changeset/12208498
> >
> > As you can see, these changesets remove thousands of nodes/ways. I have
> > done this using JOSM validators and "Fix it" option which automatically
> > merges/deletes nodes that are duplicated.
> >
> > That is all fine of course but this sparked a thought... why is this
> > garbage data like this allowed into the database in the first place? Of
> > course it can always be fixed client-side (JOSM, even some autobots) but
> > why allow an unconnected untagged nodes or duplicated nodes, duplicated
> > ways etc.?
> >
> > I understand (though don't wholly agree...) the concept of having a very
> > generic data model where anyone can push anything into the database but
> > it would be trivial to implement some server-side validations for these
> > cases (so that API throws errors and does not accept such data) and thus
> > reduce client-side work by a very significant margin - i.e. I could have
> > been working on something more useful in that time than removing garbage
> > data.
> >
> > Server-side validation could be of course taken even further - OSM
> > server could reject meaningless tag combinations etc. - basically JOSM
> > validators on the "error" level should be implemented as server-side
> > validators, some "warning" level validators possibly as well.
> >
> > This would ensure data consistency and integrity at least a little
> > bit... (of course first bad data would have to be pruned from existing
> > database so that it is consistent with validation logic but that's for
> > another discussion).
> >
> > What is the current consensus within OSM dev community on this aspect of
> > OSM architecture?
> >
> > Paweł

_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev
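P.S. To make the "validation at changeset close" idea a bit more concrete, here is a rough sketch of what such a server-side check could look like. Everything in it is a hypothetical illustration - the Node/Way classes, the rule set, and the tiny "highway" vocabulary are my own assumptions for this sketch, not the actual OSM API or its data model:

```python
# Hypothetical sketch: validate the final state of a changeset when it is
# closed. Node, Way, the rules, and ALLOWED_HIGHWAY_VALUES are illustrative
# assumptions, not real OSM API code or a complete tag vocabulary.
from dataclasses import dataclass, field

ALLOWED_HIGHWAY_VALUES = {"motorway", "trunk", "primary", "secondary",
                          "tertiary", "residential", "service", "footway"}

@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)

@dataclass
class Way:
    id: int
    node_ids: list
    tags: dict = field(default_factory=dict)

def validate_changeset(nodes, ways):
    """Return a list of error strings; an empty list means the changeset
    may be closed. Because this runs at changeset close, the transient
    state between 'create node' and 'add tag' API calls is never seen."""
    errors = []

    # 1. Duplicate nodes: two nodes at the same coordinates.
    seen = {}
    for n in nodes:
        key = (round(n.lat, 7), round(n.lon, 7))
        if key in seen:
            errors.append(f"node {n.id} duplicates node {seen[key]}")
        else:
            seen[key] = n.id

    # 2. Untagged, unconnected nodes: no tags and not used by any way.
    #    (A real check would also have to consider relation membership.)
    used = {nid for w in ways for nid in w.node_ids}
    for n in nodes:
        if not n.tags and n.id not in used:
            errors.append(f"node {n.id} is untagged and unconnected")

    # 3. Core-key vocabulary: reject typos in "highway" values.
    for w in ways:
        value = w.tags.get("highway")
        if value is not None and value not in ALLOWED_HIGHWAY_VALUES:
            errors.append(f"way {w.id}: unknown highway value {value!r}")

    return errors
```

The point is only that rules like these would live behind the API: a changeset whose final state violates them is rejected with errors, while the short-lived intermediate states between individual API calls are never rejected.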

