On Sat, 22 Mar 2014, Paul Norman wrote:

> I was doing some CLC cleanup tonight, removing landuse=meadow polygons
> that didn't remotely match more recent imagery. Of all the meadow
> polygons, not one was worth keeping. I found small woods, roads, farms,
> residential areas, and basically anything but good data. After going at
> it piece-meal I'm wondering if we need to go after it in a systematic
> manner with a mechanical edit.
>
> There are 19k ways and 1.2k relations with CLC:id, landuse=meadow, and
> version=1. About the same number of both have version>1. Based on the
> sampling I did, if any are accurate, it is purely by chance.
>
> What I'm wondering is
>
> 1. I did the editing in Poitou-Charentes, France. Is the CLC data here
> representative of other data?
>
> 2. Are there other CLC classifications which are just as bad?

I don't know about France, but CLC data in general seems to be just as
bad as you describe. Only where there is a really large, continuous
body of something might CLC have guessed almost right, and even then
the boundary accuracy is just as bad. In areas with lots of
discontinuities or small features, the result is pretty much random
for any small feature.

> If the area I looked at is representative, I am contemplating proposing
> a mechanical edit to remove the bad data. What are peoples thoughts on
> this?
>
> I'm not getting into specific details at this point, as I'm just
> evaluating the concept. Before actually doing a mechanical edit, I'd
> provide technical details for review, and raise the question with a
> wider audience.

When CLC comes up in discussions with people who draw by hand or
survey, I never hear anything positive about it. That is no wonder:
those people are the ones who encounter all the garbage and are unsure
what should be done with it (remove it, or what?).

From those who imported it, I kept hearing that CLC can be fixed once
it is in the DB, but obviously nobody ever stepped in to do that. Even
those discussions have now died down (at least here in Finland). The
opening up of some other datasets might be part of the reason, but I
personally doubt the fixers would have appeared even without those
datasets.

IMHO, CLC is a good example of the wrong import approach, i.e.,
somebody imports garbage data into the DB first so that people can then
fix it. In practice, fixing data that is already in the DB hardly ever
happens for any significant number of the imported geometries. The
correct approach is to fix things prior to import or immediately after
putting something into the DB. Delayed fixing is not going to work in
practice.

I can well understand why dumping to the DB first looks appealing: it
requires a lot less effort up front. In the end, though, it becomes a
big pain, as CLC now is, for everyone doing much more useful mapping
than the CLC data ever was.

Sadly, I expect there to be some resistance from those who are or were
in favour of this "dumping approach" if you start removing the garbage.
At least here I always hear that "removal should not be done before the
actual replacing with the other, better dataset occurs".

-- 
 i.
