Hello,
 
> On Thu, Aug 18, 2011 at 6:23 AM, Jaak Laineste <[email protected]> wrote:
>> Hello,
>> 
>> Based on my own long-time thinking and small talk in WhereCamp Berlin
>> I created request for comments on kind of different approach to
>> imports called meta-mapping.
> 
> Since this proposal is nearly (exactly) identical to a thought I had
> about a year ago, I feel pretty qualified to speak about it.

Great!

> The objective of a tool like this would be to allow someone to run a
> database of geographic data and isolate it from other datasets- that
> is by keeping the databases separate, one may allow for more
> flexibility in changing the data in one of the non-OSM datasets.
> 
> An example would be if a city government's dataset were to add/remove
> listings of libraries, the conflation process in OSM would be harder
> than it would if there were simply a database where the information
> existed in isolation and then were linked to OSM. Simple, right?

Exactly. But I never thought it would be simple.

> 1. By moving objects out of the OSM database, you move the complexity
> out of the OSM database and into the conflation database
> 
> Moving the problem doesn't solve it. It just hides it (and you'll see
> why in the next few points).

Well, I'm not sure it is an OSM problem. In fact, my initial implementation idea 
was to create special relations in the OSM database that would serve as external 
links. Then others suggested it would be easier and cleaner to do it completely 
outside, only loosely dependent on any specific geodatabase. OSM would be the 
best reference database, as sooner or later it will have every object on earth 
(I hope).
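As a rough illustration of the link idea (a sketch only; the dataset name, IDs and field layout below are hypothetical, not a defined OMM format), a MetaMap link could be as small as a record pairing an OSM object with an external object:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetaLink:
    """One OpenMetaMap link: ties an OSM object to an external dataset object."""
    osm_type: str      # "node", "way" or "relation"
    osm_id: int        # OSM object ID (assumed stable - see point 4 below)
    dataset: str       # registered name of the external dataset
    external_id: str   # official ID in the external dataset (assumed stable)

# Hypothetical example: link an OSM node to a city library register entry.
link = MetaLink("node", 123456789, "city-libraries", "LIB-0042")
print(link.dataset, link.external_id)
```

Keeping only such records outside OSM is what makes the link database "only loosely dependent" on any specific geodatabase.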

> 2. This approach implies that external data sets are correct.
> 
> Underlying this approach is an assumption that we can rely on other
> datasets accuracy. Sadly this is not the case. As I work with more
> datasets and compare them to on the ground surveying, I find that many
> government datasets are either wrong, or out of date.
> 
> Take TIGER as an example. I'm going through TIGER 2010 as we speak.
> Most of what i've found indicates that when OSM is active in an area,
> our maps are more accurate than TIGER, even TIGER 2010, which is more
> accurate than TIGER 2005 (what was imported in the US).
> 
> We need therefore to encourage more mappers to map and not to rely on
> these external datasets. This project would do the opposite.

I do not really agree with this implication; it does not assume that the 
external dataset is correct. The linking process (resolving conflations) would 
actually be the same as with a normal import: somebody has to review all data 
overlaps/conflicts/duplicates and resolve them. It only assumes that using 
external data is better than having nothing, which is the same assumption behind 
any external data usage and import. So after the first linking round you would 
have corrected the external data, at least as far as is possible at that point.
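The linking round itself could start from a crude automatic matcher whose suggestions a human then reviews (a sketch under my own assumptions; real conflation would compare proper geometries and tags, not just point distance):

```python
import math

def suggest_matches(ext_objects, osm_objects, max_dist=0.001):
    """Pair each external object with the nearest OSM object within
    max_dist degrees; unmatched objects go to a human review queue."""
    suggestions, review = [], []
    for ext in ext_objects:
        best, best_d = None, max_dist
        for osm in osm_objects:
            d = math.hypot(ext["lat"] - osm["lat"], ext["lon"] - osm["lon"])
            if d < best_d:
                best, best_d = osm, d
        if best is not None:
            suggestions.append((ext["id"], best["id"]))
        else:
            review.append(ext["id"])
    return suggestions, review

# Hypothetical data: one library matches an existing OSM node, one does not.
ext = [{"id": "LIB-0042", "lat": 59.4370, "lon": 24.7450},
       {"id": "LIB-0099", "lat": 58.0000, "lon": 25.0000}]
osm = [{"id": 42, "lat": 59.4371, "lon": 24.7451}]
print(suggest_matches(ext, osm))
```

The output of such a matcher would only ever be a proposal; the review step stays manual, exactly as with a normal import.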

But later it could happen that data which was OK at the initial linking is 
changed to something worse. Here you are right: an external data provider can do 
harm to our data. I would say we assume that the external data provider works in 
the direction of making the data better, not worse. In other words: it is OK to 
have bad data in the beginning, but it is not OK if the data modifications go in 
the wrong direction.

There will also always be the problem of newly added data: the maintainer of the 
link database has to do occasional reviews and correct this as well. With a 
usual import you have to fix the data once; no bulk updates are possible, so you 
do not need to worry about later updates. Once we have later updates, the 
maintainer has to start taking care of those too. More gain, more pain.

Actually, I'm afraid most external data sources will be rather static (just OSM 
files). That way there is no risk that the external dataset will suddenly be 
damaged. There would be no benefit from later updates, but even then the MetaMap 
database has an advantage: you keep the datasets clean and separated.

> 3. Data in the aggregated map won't be collected by on the ground mappers.
> 
> Some data, like the road data, will appear in both OSM and external
> datasets, but there's other data which may just never get collected by
> the community, if the map appears to already be complete.
> 
> And then since there's less on the ground mapping, the problems I
> mentioned earlier regarding flawed external datasets don't get noticed
> and corrected.

This is a valid point. It is a very general problem: data that is already there 
(from imports, or even just from other mappers before you) is quite likely to be 
left alone, trusted and not reviewed. This is a separate issue that I do not 
solve here.

I assume here that using external datasets is often good and reasonable, and in 
many cases unavoidable (admin borders, shorelines and other examples). My 
proposal is that MetaMapping is a better way of using other datasets than 
importing them. There are several cases (possibly roads) where other data 
sources should be avoided altogether.

In fact, with OpenMetaMap you would always have two views and maps: one is pure 
OSM, all made by our mappers, and the other is the complete map (the OMM map) 
with all external sources. This is something you cannot get with the current 
imports approach. So if you wish, you can ignore the complete map and work on 
pure OSM only.
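The two-views idea could be sketched like this (hypothetical in-memory data structures; a real OMM service would of course work against live databases):

```python
def osm_view(osm_objects):
    """Pure OSM view: only community-mapped objects."""
    return list(osm_objects)

def omm_view(osm_objects, external_datasets, links):
    """Complete OMM view: OSM plus all linked external objects."""
    merged = list(osm_objects)
    for dataset, ext_id in links:
        obj = external_datasets.get(dataset, {}).get(ext_id)
        if obj is not None:
            merged.append(obj)
    return merged

# Hypothetical sample data: one surveyed OSM object, one linked external one.
osm = [{"id": 1, "name": "Surveyed road"}]
ext = {"city-libraries": {"LIB-0042": {"name": "Central Library"}}}
links = [("city-libraries", "LIB-0042")]

print(len(osm_view(osm)))              # pure OSM objects only
print(len(omm_view(osm, ext, links)))  # OSM plus linked external objects
```

The key point is that the merge happens at read time, so the pure OSM view is never contaminated by external data.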

And there is always the risk that a mapper finds an Internet site called Google 
Maps and discovers that "the map" is already there and complete :)


> 4. It assumes OSM object IDs remain constant.
> 
> OSM object IDs change. They don't change a lot, but they do change,
> and you can't force users to jump through hoops to preserve them (as
> we've seen people propose).

Yes, it assumes that IDs do not change; this is the most important assumption. 
Can you explain more about why and how OSM object IDs change? I've heard this 
too, but to analyze the cases properly I'd need to know the details.

> 5. It assumes external data sets IDs remain constant
> 
> One of the whole points of this project seems to be to keep up to date
> with external datasets, such as those put out by local governments
> every quarter.
> Since most of these external datasets will be given in Shapefile
> format, there will need to be a conversion process.
> 
> You can't be assured that the ID numbers on objects will remain
> constant from Q1 and Q2. Heck, I bet you'd find that even their own
> internal IDs won't remain constant, at least not for every single ID
> on every single object on every single external database, of which
> there may be dozens or more.
> 
> So you're constantly in a race to conflate changing object IDs.

I would put it in the API specification that object IDs must not change, by 
definition. Of course we could not use ad hoc IDs there (row numbers etc.), but 
official IDs. Here in Estonia we have a statewide registry of topographic 
objects where every object has its own key, and the IDs must not change without 
good reason. I assume every admin (NUTS) area has a unique official ID assigned 
to it, and that is what must be used. In OMM it must go one level deeper: 
perhaps even node IDs must be persistent (detailed analysis is yet to be done), 
so this is a challenge.
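The "IDs must not change" rule could at least be checked mechanically whenever a new release of an external dataset arrives (a sketch; releases are represented here as plain sets of IDs, which is my simplification):

```python
def check_id_persistence(old_ids, new_ids):
    """Compare two releases of an external dataset.

    IDs that disappear break existing OMM links and must be flagged;
    IDs that appear are merely candidates for a new linking review.
    """
    lost = old_ids - new_ids   # links to these are now dangling
    added = new_ids - old_ids  # need a linking round, but break nothing
    return lost, added

# Hypothetical Q1 and Q2 releases of a library register.
q1 = {"LIB-0042", "LIB-0043", "LIB-0044"}
q2 = {"LIB-0042", "LIB-0044", "LIB-0099"}
lost, added = check_id_persistence(q1, q2)
print(sorted(lost), sorted(added))
```

A provider that regularly shows a non-empty `lost` set would simply not qualify for the OMM API contract.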

> 6. License nightmare
> 
> This is a powder-keg ready to explode, but I'll just say this:
> Incompatible licenses will not allow this.

Yes, by combining OMM, OSM and DBX data you would create a derivative of all of 
them, and their licenses must be compatible. But here again, this is a general 
issue that I neither solve nor create here. I'm comparing the OMM solution with 
a usual import, and the license issues are basically the same; maybe the problem 
just surfaces later. With imports the importer has to check it once; with OMM 
linking the user has to be sure that he merges compatible databases.

Actually, it would reduce the nightmare a lot in some cases, for example if 
someone imported data that was OK in 2010 but is no longer OK in 2012. With 
imports you somehow have to pick the data (and all its derivations) out of OSM, 
which is a nightmare. With the OMM approach you just disable the dataset (change 
its license field in the data directory, or simply remove it) and it is done.
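Disabling a dataset could then be a one-line change in the data directory that the merge step checks (a sketch; the directory format and license values here are hypothetical, not a defined OMM schema):

```python
# Hypothetical OMM data directory: per-dataset metadata with a license field.
data_directory = {
    "city-libraries": {"license": "CC0", "enabled": True},
    "old-cadastre":   {"license": "incompatible", "enabled": True},
}

def usable_datasets(directory, compatible=("CC0", "ODbL", "PD")):
    """Return the datasets whose links may feed the merged OMM map."""
    return [name for name, meta in directory.items()
            if meta["enabled"] and meta["license"] in compatible]

print(usable_datasets(data_directory))
```

Because no external data was ever copied into OSM itself, removing a dataset from this list removes it from the merged map without touching the OSM database.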

> 
> 7. Tremendous work.
> 
> The conflation process would be very hard to do, and frankly, not a
> lot of fun. You'll end up writing programs to do most of it I'm sure,
> but no programs will be perfect.
> 
> So people have to do it, and, frankly, it's not fun work.

In principle I do not see significantly more work than you need to do with 
imports now. The extra work comes only from the extra data updates: instead of 
data bursts you will have a continuous stream to take care of, with all the 
gains and pains. You can use very similar tools (scripts and JOSM) as now. And I 
hope that if external data providers can easily get community edits back, they 
will be much more motivated to look after their OSM/OMM derivative than they are 
now.

> These are the reasons I never went forward with this project.

I really hope you are open to reconsidering :)


Jaak
_______________________________________________
Imports mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/imports
