On Thu, Aug 18, 2011 at 11:31 AM, Jaak Laineste <[email protected]> wrote:
>> 2. This approach implies that external data sets are correct.
>>
>> Underlying this approach is an assumption that we can rely on other
>> datasets' accuracy. Sadly this is not the case. As I work with more
>> datasets and compare them to on-the-ground surveying, I find that many
>> government datasets are either wrong or out of date.

> I do not really agree with this implication, it does not assume that external
> dataset is correct. The process of linking (resolving conflations) would be
> actually same as with normal import: somebody has to review all data
> overlappings/conflicts/duplicates and solve them.
> It assumes that using external data is better than having nothing - just the
> same assumption you have with any external data usage and import.

For those of us who've been in the project for more than a year or two, the
jury is still out on this. It's easy to say "Well, isn't some data better than
no data?", but then we see that places where imports have taken place also
have low community uptake. These are of course correlations, and we cannot
automatically assume causation, but we can certainly raise them as concerns.

> So after first linking round you would have correction of external data, at
> least as much as it is possible then.

If you have the corrections of the external data in OSM, you might as well
have the data in OSM in the first place.

> But then in later days it could happen that the data what was ok in initial
> linking will be changed to something else (worse). Here you are right -
> external data provider can do harm to our data. I would say we assume that
> the external data provider works in the direction of making data better, not
> worse. In other words: it is ok to have bad data in the beginning, but it is
> not ok if the data modifications are in wrong direction.

The problem isn't that external datasets get worse; the problem is that
external datasets make it hard to see what's missing, or worse, what's wrong.
The premise of OSM is that many people mapping improves data quality. We rely
on the mappers to improve the map. We've shown through studies that where we
have few mappers, our data is of lower quality, and where we have many
mappers, our data is of very high quality. Therefore one of the project's main
focuses is to get more mappers. My concern is that by relying on external
datasets, we reduce mappers' motivation, and therefore end up with fewer
active mappers.

> Also there will be always problem of added new data - maintainer of database
> links has to do occasional reviews and correct this also. So with usual
> import you have to fix the data once. There is no bulk update possible so you
> do not need to worry about later updates. Now when we have later updates,
> maintainer has to start taking care about it also. More gain, more pain.

I'm sorry, while your English is far better than my Estonian, I do not
understand this paragraph. Can you rephrase?

> Actually I'm afraid that most external datasources will be rather static
> (just OSM files). This way there is no risk that external dataset will be
> suddenly damaged. There would be no benefit of later updates, but even then
> there is advantage of MetaMap database - you keep the datasets clean and
> separated.

The key value proposition of external datasets is that they could be updated
by external entities (think distributed version control). If you think this
isn't the case, or it is not the case you're designing around, then I see no
benefit in using this technique versus improving our conflation tools inside
OSM, which is something we need today!

>> 3. Data in the aggregated map won't be collected by on-the-ground mappers.
>>
>> Some data, like the road data, will appear in both OSM and external
>> datasets, but there's other data which may just never get collected by
>> the community, if the map appears to already be complete.

> This is valid point.
> This is very general problem: data what is already
> there (from imports, even just by the other mappers before you) is quite
> likely to be left behind, not reviewed and trusted. This is separate issue
> what I do not solve here.

The point of the map is not just to exist, but to be better. If we do things
which hurt the community, then they had better come with a huge benefit.

> I assume here that often usage of external datasets is good and reasonable,
> and in many cases unavoidable (admin borders, shoreline and other samples).

This sentence has two statements:

1. You assume that imports are often unavoidable.
2. You assume that the imports are often good and reasonable.

1 isn't true. We see lots of imports of data that could have been collected
manually. TIGER could have been done manually, given time. GNIS could likely
have been done manually, and even Corine could have been done manually. OSM
took shortcuts. That doesn't mean they were bad, but they weren't unavoidable.
And if you look at the datasets users plop in most often, without discussing
them with the community, most of that data could have been collected manually
too. Again, that doesn't mean it's bad, but it's certainly avoidable.

2 isn't true at all. In fact, we have tons of problems due to imports. Imports
are hard to get right (I'll address more technical issues later on in this
mail). We have had to revert changesets, and we've had to fix problems. I
spent a lot of time fixing TIGER data, as do many US mappers. That's time we
spend fixing when we could be spending it mapping.

> I propose here that MetaMapping is better way of using other datasets than
> importing. There are several cases (possibly roads) where other datasources
> should be avoided.

I'll go into more technical depth later on in this mail on why the OSM model
doesn't lend itself well to this.

> In fact with OpenMetaMap you would always have two views and maps - one is
> pure OSM - all made by our mappers, and another would be complete map (OMM
> map) with all external sources.
> This is something what you with current
> imports approach cannot get. So if you wish you can ignore complete map and
> work on OSM only.

Sort of a devil's bargain, eh?

> And there is always risk that a mapper finds from Internet site called Google
> Maps and discovers that "the map" is already there and complete :)

Is it? If that were true, Google wouldn't have accidentally used OSM on at
least one (though I think I remember two) occasions. There are places where
OSM is of higher quality than Google. We just aren't as good consistently
across the globe.

>> 4. It assumes OSM object IDs remain constant.
>>
>> OSM object IDs change. They don't change a lot, but they do change,
>> and you can't force users to jump through hoops to preserve them (as
>> we've seen people propose).

> Yes, it assumes that IDs do not change. This is most important. Can you
> explain more why and how OSM object IDs change? I've heard it too, but to
> analyze cases in more details I'd need to know the details.

They change because people delete things, and add things, and move things
around. A simple example: often I'll see a POI node, and I'll go ahead and
draw the building outline and put the data on the building. I draw the
building and delete the node. Another example: I might delete a road segment
and redraw it, if that's easier than moving every single node around.

There have been proposals which ask/require users to map a certain way, but
that's not the OSM way. There's nothing that says object IDs are unique or
permanent. There have been proposals regarding that issue, and none of them
have been feasible.

And by the way, since we're on the topic of object IDs, your proposal only
addresses one end product: rendering. How do you propose to handle routing?
And what about layers? And what about objects which contain other objects?
Even if you ignore ways, you still have relations.

>> 5.
>> It assumes external data sets' IDs remain constant.
>>
>> One of the whole points of this project seems to be to keep up to date
>> with external datasets, such as those put out by local governments
>> every quarter.
>> Since most of these external datasets will be given in Shapefile
>> format, there will need to be a conversion process.
>>
>> You can't be assured that the ID numbers on objects will remain
>> constant from Q1 to Q2. Heck, I bet you'd find that even their own
>> internal IDs won't remain constant, at least not for every single ID
>> on every single object in every single external database, of which
>> there may be dozens or more.
>>
>> So you're constantly in a race to conflate changing object IDs.

> I would put to API specification that object ID must not change by
> definition.

And how do you propose to enforce that for every object in every dataset from
every organization? Our import page mentions at least 30 datasets, but with
the floodgates open, how many more would you have to deal with, and then
enforce these rules on?

>> 6. License nightmare
>>
>> This is a powder-keg ready to explode, but I'll just say this:
>> Incompatible licenses will not allow this.

> Yes, by using OMM, OSM and DBX data then you would create derivate of all of
> them and they must be compatible. But here again - this is general issue what
> I do not solve nor create there. I'm comparing OMM solution with usual
> import, and license issues are there basically the same. Maybe the problem
> happens just later - with imports the importer has to check it over once,
> with OMM-linking the user has to be sure that he merges appropriate databases.
>
> Actually it would reduce nightmare a lot in some cases - if someone has
> imported data what was OK in 2010, but is not OK in 2012 anymore.

That problem is solved with the CT. And we solve the general issue by
/generally/ discouraging imports, especially those where a strict process
hasn't been followed.
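To make points 4 and 5 concrete: since neither OSM IDs nor external IDs can be
trusted to survive edits and re-releases, conflation in practice has to
re-match features by position and attributes each time, not by stored ID
links. Here's a rough sketch of that idea in Python (the data and thresholds
are made up for illustration, not from any real dataset):

```python
# Sketch: why conflation matches by location and tags rather than by ID.
# The external release below has renumbered its IDs, but the feature still
# matches because its position and name are stable.

from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))

def match_features(osm_points, external_points, max_dist_m=30.0):
    """Pair each external point with the nearest same-named OSM point within
    max_dist_m. Returns (matched pairs, unmatched external features)."""
    matches, unmatched = [], []
    for ext in external_points:
        best, best_d = None, max_dist_m
        for osm in osm_points:
            if osm["name"] != ext["name"]:
                continue
            d = haversine_m(osm["lat"], osm["lon"], ext["lat"], ext["lon"])
            if d <= best_d:
                best, best_d = osm, d
        if best is not None:
            matches.append((ext, best))
        else:
            unmatched.append(ext)
    return matches, unmatched

# Hypothetical data: the external provider's ID changed between releases
# (1001 -> 2001), so an ID-based link would dangle; the geometric match holds.
osm = [{"id": 42, "name": "Town Hall", "lat": 59.4370, "lon": 24.7536}]
ext = [{"id": 2001, "name": "Town Hall", "lat": 59.43701, "lon": 24.75362}]
matches, unmatched = match_features(osm, ext)
```

A real conflation tool would also handle ways and relations, fuzzy name
matching, and one-to-many cases, which is exactly why I'd rather see that
effort go into tools inside OSM than into maintaining a separate linking
database.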
I'm under the assumption that in your system, any user will be able to add a
dataset.

> In principle I do not see significantly more work as you need to do with
> imports now. Extra work comes only from extra data updates - instead of data
> bursts you will have continuous stream to take care of - with all the gains
> and pains. You can use very similar tools (scripts and JOSM) as now. I hope
> that if external data providers can quite easily get back also community
> edits, then actually they should be much more motivated to look after their
> OSM/OMM derivate than now.

I think you've touched on an important bit here. What you propose is not
OpenStreetMap, and you couldn't call it OpenStreetMap.

>> These are the reasons I never went forward with this project.

> I really hope you are open to reconsider :)

When some wanted to split the project, I stayed here. Many of them will be
encouraging of your work. Some of the people who are supporting you are folks
who are banned from editing in OpenStreetMap. That, I think, is why they're so
encouraging of your idea: it may be something they feel could give them the
advantages of OSM without being OSM. I think that's sad. But no, despite its
faults, I like OpenStreetMap, and I will stay with the project for the
foreseeable future.

- Serge

_______________________________________________
Imports mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/imports
