Hi Greg,
Thank you for the speedy response.
> Are you an employee of or acting on behalf of DHS?
Nope. I've just been contributing to OSM for the past couple of years in my
free time. I just like seeing high-quality data added to the map.
> Really only high-quality data should be imported, so I don't follow a
> plan to import data of varying quality. By high-quality I mean that
> substantially all (>= 99%) of objects in the data set exist and the
> positions are close (within 20m?) to the correct positions.
I totally agree. Some datasets, like "Land Mobile Commercial
Transmission Towers"
https://hifld-geoplatform.opendata.arcgis.com/datasets/geoplatform::land-mobile-commercial-transmission-towers-1/about
contain a large number of objects that are not real or are extremely
inaccurate compared with any actual feature. However, a few items in the
dataset do correspond to existing objects, and enough of these exist
that I felt it could be worth my time, in certain situations, to manually
review extracts of the data to find the good-quality records.
> So data quality assessment needs to ask "do >= 99% of the objects in
> the db currently exist".
> In the web page, quality is labeled with subjective terms, and for an
> effort of this scope I'd like to see quantitative definitions.
Sure. I'd be happy to create a definition like: "Out of 100 objects
taken randomly from this dataset, X were accurate and within 20 m of the
actual location of the object," and to reassess all of the datasets
against it.
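One way such a definition could be made repeatable is sketched below. It is a minimal Python sketch under stated assumptions, not part of the proposal: `verify` is a hypothetical callback standing in for however each sampled object actually gets checked (survey, imagery), distances use the haversine formula on a spherical Earth, and the 100-object sample and 20 m tolerance are the values discussed in this thread.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical model)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def sample_accuracy(records, verify, n=100, tolerance_m=20.0, seed=None):
    """Draw n random records and count how many are accurate.

    records: list of (id, lat, lon) tuples from the dataset.
    verify:  hypothetical callback returning the checked (lat, lon) of the
             real object, or None if the object does not exist.
    Returns (accurate_count, sample_size).
    """
    rng = random.Random(seed)  # seeded so a reviewer can redraw the same sample
    sample = rng.sample(records, min(n, len(records)))
    accurate = 0
    for rec_id, lat, lon in sample:
        truth = verify(rec_id)
        # An object counts only if it exists AND lies within the tolerance.
        if truth is not None and haversine_m(lat, lon, *truth) <= tolerance_m:
            accurate += 1
    return accurate, len(sample)
```

Publishing the seed alongside the result lets anyone redraw the same sample and re-check it.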
> In general, I am uncomfortable with advice for people to download data,
> transform tags and upload. I think it's far better to have a published
> program (e.g. a Python script).
> This way, people can run the conflation and examine the results to
> assess quality. And, I think actually writing this as code and
> expecting it to be run repeatedly sharpens the thinking about the import
> transformation process and shines a more careful light on quality.
The reason I suggested conflating manually was that I have next to zero
experience with programming, although I would be happy to try to write
one, or to seek help in having one written.
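As a starting point for such a program, here is a minimal sketch of only the matching step of a conflation, not a full import tool. It assumes the dataset and the existing OSM objects have already been loaded as lists of dicts with hypothetical `id`, `lat` and `lon` keys; it uses an equirectangular distance approximation (adequate at the ~20 m scale discussed here) and brute-force search, so a spatial index would be needed for a nationwide dataset. Fetching OSM data and transforming tags are deliberately left out.

```python
import math


def approx_distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation of distance in metres (fine for short distances)."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return 6_371_000 * math.hypot(dx, dy)


def conflate(candidates, existing, radius_m=20.0):
    """Split import candidates into ones already mapped in OSM and new ones.

    candidates, existing: lists of dicts with 'id', 'lat', 'lon' (hypothetical shape).
    Returns (matched, new), where matched holds (candidate, osm_object, distance_m).
    Brute force, O(len(candidates) * len(existing)).
    """
    matched, new = [], []
    for cand in candidates:
        best, best_d = None, radius_m
        for osm in existing:
            d = approx_distance_m(cand["lat"], cand["lon"], osm["lat"], osm["lon"])
            if d <= best_d:  # keep the nearest OSM object inside the radius
                best, best_d = osm, d
        if best is None:
            new.append(cand)
        else:
            matched.append((cand, best, round(best_d, 1)))
    return matched, new
```

Because the script is deterministic, anyone can rerun it and review the `matched` and `new` lists before anything is uploaded.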
> I think it's ok to take a dataset and do statistical quality control,
> where some fraction of points are checked (against on-the-ground
> reality), and then if >99% of them are correct, to assume they are all
> correct (enough that "fix later" is ok).
That would be the idea. I'm not comfortable uploading anything with even
slight inaccuracies without manually reviewing the objects first.
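How many points need checking depends on how confident one wants to be that the whole dataset clears 99%. The sketch below uses the standard one-sided Clopper-Pearson (exact binomial) lower bound; the 95% confidence level is an assumption, not something agreed in this thread. Under that assumption, 100 error-free checks support only about 97% accuracy, and roughly 299 error-free checks are needed before the bound reaches 99%.

```python
from math import comb


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def lower_bound(k, n, alpha=0.05, iters=60):
    """One-sided Clopper-Pearson lower confidence bound on the true accuracy
    after observing k correct objects out of n sampled."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):  # bisection: the tail probability increases with p
        mid = (lo + hi) / 2
        if binom_tail_ge(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


def zero_failure_sample_size(target=0.99, alpha=0.05):
    """Smallest n such that n correct out of n gives a lower bound >= target."""
    n = 1
    while target**n > alpha:
        n += 1
    return n
```

A dataset would then pass if `lower_bound(correct, checked)` is at least 0.99, rather than if the raw sample fraction is.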
> Note that some states, including MA, have email lists, and a number of
> active mappers do not believe the use of Slack is legitimate (because
> it's a proprietary system requiring signing a contract with a particular
> company). However others think it's ok.
> And obviously talk-us, but it makes sense to get a more baked proposal
> here.
To be clearer about how I plan to reach out to local mappers:
I will message all the active mappers I can find in each state,
regardless of whether they are active on mailing lists, Slack, etc., to
make sure I get as much local support for this import as I can, given
its scale and scope. I'll also reach out through the official channels.
> Some of these datasets seem to be compilations of other datasets.
> Nursing homes, that I picked because I can sort of armchair assess
> quality, seems to be copied from state databases, at least in MA. The
> source data is by address, so it was geocoded somehow. All of this is
> unclear about licensing, so that makes me really want to understand the
> "if published by the US, is PD" claim.
I think I may have been mistaken about the licensing situation when I
made that claim. However, the main page for the HIFLD datasets links to
a data catalog spreadsheet with information about each dataset. All of
the data that is publicly accessible on the website is open/in the
public domain, while all copyrighted data is secured and requires
special access. As far as I am aware, I haven't included any of the
copyrighted datasets in the list on the wiki.
> I am particularly skeptical of trail data,
I absolutely do not plan to import the trail lines directly from HIFLD,
because the accuracy of the drawn lines is lower than what I expect from
OpenStreetMap, but I feel that the metadata is extremely useful and OSM
could benefit from it.
> and this page doesn't clearly separate import candidates from
> "recommend against import; useful as reference layer".
I'll be sure to consider this as well when I redo the quality assessment
for the datasets.
-James Crawford (SherbetS)
_______________________________________________
Imports mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/imports