Hi Greg,

Thank you for the speedy response.

  Are you an employee of or acting on behalf of DHS?
Nope. I've just been contributing to OSM in my free time for the past couple of years. I like seeing high-quality data get added to the map.
Really only high-quality data should be imported, so I don't follow a
plan to import data of varying quality.  By high-quality I mean that
substantially all (>= 99%) of objects in the data set exist and the
positions are close (within 20m?) to the correct positions.

I totally agree. There are some datasets, like "Land Mobile Commercial Transmission Towers"
https://hifld-geoplatform.opendata.arcgis.com/datasets/geoplatform::land-mobile-commercial-transmission-towers-1/about
where a large share of the objects either do not exist or are placed far from any real feature. Still, enough of the entries do correspond to existing objects that I felt it could be worth my time, in certain situations, to manually review extracts of the data and pick out the good-quality records.

So data quality assessment
needs to ask "do >= 99% of the objects in the db currently exist".

In the web page, quality is labeled with subjective terms, and for an
effort of this scope I'd like to see quantitative definitions.
Sure. I'd be happy to create a definition like: "out of 100 objects taken randomly from this dataset, X were accurate and within 20m of the actual location of the object" and reassess all of the datasets.
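
For reference, a definition like this could be computed along the following lines. This is a minimal sketch, not a tested tool: the function names, the (lat, lon) tuple layout, and the idea of recording a hand-verified position (or None when the object does not exist) for each sampled point are all assumptions made for illustration.

```python
import math
import random

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def draw_sample(objects, size=100, seed=None):
    """Pick up to `size` objects at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(objects, min(size, len(objects)))


def accuracy(checked, tolerance_m=20.0):
    """Fraction of checked objects that exist and lie within tolerance_m.

    `checked` is a list of (dataset_pos, verified_pos) pairs (hypothetical
    layout). verified_pos is None when the reviewer found no such object.
    """
    if not checked:
        return 0.0
    ok = 0
    for dataset_pos, verified_pos in checked:
        if verified_pos is None:
            continue  # object does not exist on the ground
        if haversine_m(*dataset_pos, *verified_pos) <= tolerance_m:
            ok += 1
    return ok / len(checked)
```

The reviewer would draw the sample, verify each point by hand, and report the resulting fraction next to the dataset name instead of a subjective label.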
In general, I am uncomfortable with advice for people to download data,
transform tags and upload.  I think it's far better to have a published
program (e.g. python script)

This way, people can run the conflation and examine the results to
assess quality.  And, I think actually writing this as code and
expecting it to be run repeatedly sharpens the thinking about the import
transformation process and shines a more careful light on quality.
The reason I suggested conflating manually is that I have next to no programming experience, although I would be happy to try to write such a script, or to seek help in having one written.
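
As a possible starting point for such a script, the core of a point conflation could look something like the sketch below. It is only an illustration under stated assumptions: the dict layout with a "pos" key, the function names, and the 50 m match radius are all hypothetical, and loading the HIFLD extract and the existing OSM objects is left out entirely.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def distance_m(a, b):
    """Haversine distance in metres between two (lat, lon) tuples in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlam = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def conflate(candidates, existing, match_radius_m=50.0):
    """Split candidates into likely duplicates and likely new objects.

    candidates: objects from the external dataset, each a dict with "pos".
    existing:   objects already in OSM, each a dict with "pos".
    match_radius_m is an assumed threshold, not an agreed value.
    """
    matched, unmatched = [], []
    for cand in candidates:
        # Brute-force nearest neighbour; fine for small state-level extracts.
        nearest = min(
            existing,
            key=lambda e: distance_m(cand["pos"], e["pos"]),
            default=None,
        )
        if nearest is not None and distance_m(cand["pos"], nearest["pos"]) <= match_radius_m:
            matched.append((cand, nearest))  # probably already mapped: review tags
        else:
            unmatched.append(cand)  # probably new: review before upload
    return matched, unmatched
```

Both output lists would still go through manual review; the script only sorts the work, it does not decide what gets uploaded.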
I think it's ok to take a dataset and do statistical quality control,
where some fraction of points are checked (against on-the-ground
reality), and then if >99% of them are correct, to assume they are all
correct (enough that "fix later" is ok).
That would be the idea. I'm not comfortable uploading anything with even slight inaccuracy without manually reviewing the objects first.
Note that some states, including MA, have email lists, and a number of
active mappers do not believe the use of Slack is legitimate (because
it's a proprietary system requiring signing a contract with a particular
company).  However others think it's ok.

And obviously talk-us, but it makes sense to get a more baked proposal
here.

Let me be a little clearer about how I plan to reach out to local mappers:

I will message all the active mappers I can find in each state, regardless of whether they are active on lists, Slack, etc., so that I can get as solid a base of local support for this import as possible, given its scale and scope. I'll also reach out through the official channels.

Some of these datasets seem to be compilations of other datasets.
Nursing homes, that I picked because I can sort of armchair assess
quality, seems to be copied from state databases, at least in MA.  The
source data is by address, so it was geocoded somehow.  All of this is
unclear about licensing, so that makes me really want to understand the
"if published by the US, is PD" claim.
I think I may have been mistaken about the licensing situation when I made that claim. However, on the main page for the HIFLD datasets, there is a link to a data catalog spreadsheet with information about each dataset. All of the data that is publicly accessible on the website is open/in the public domain, and all copyrighted data is secured and requires special access. As far as I am aware, I haven't included any of the copyrighted datasets in the list on the wiki.
I am particularly skeptical of trail data,
I absolutely do not plan to import the trail dataset's lines directly from HIFLD, because the accuracy of the lines is lower than what I expect from OpenStreetMap, but I feel that the metadata is extremely useful and OSM could benefit from it.
and this page doesn't clearly
separate import candidates from "recommend against import; useful as
reference layer".

I'll be sure to consider this as well when I redo the quality assessment for the datasets.


-James Crawford (SherbetS)


_______________________________________________
Imports mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/imports
