Re: [Imports] HIFLD

James Crawford via Imports Sun, 02 Oct 2022 13:52:47 -0700

Hi Greg,

Thank you for the speedy response.

  Are you an employee of or acting on behalf of DHS?

Nope. I've just been contributing to OSM for the past couple of years in my free time. I like seeing high-quality data getting added to the map.

Really only high-quality data should be imported, so I don't follow a
plan to import data of varying quality.  By high-quality I mean that
substantially all (>= 99%) of objects in the data set exist and the
positions are close (within 20m?) to the correct positions.

I totally agree. There are some datasets, like "Land Mobile Commercial Transmission Towers"

https://hifld-geoplatform.opendata.arcgis.com/datasets/geoplatform::land-mobile-commercial-transmission-towers-1/about

where a large number of objects are not real or are extremely inaccurate, but a few items in the dataset do correspond to existing objects, and enough of these exist that I felt it could be worth my time in certain situations to manually review extracts of the data to find the good-quality data.

So data quality assessment
needs to ask "do >= 99% of the objects in the db currently exist".

In the web page, quality is labeled with subjective terms, and for an
effort of this scope I'd like to see quantitative definitions.

Sure. I'd be happy to create a definition like: "out of 100 objects taken randomly from this dataset, X were accurate and within 20m of the actual location of the object" and reassess all of the datasets.
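
A minimal sketch of how such a definition could be scored, not a finished tool. It assumes each record is a dict with hypothetical "lat" and "lon" keys, and that a reviewer has recorded, for each sampled record, either the verified position or None when the object does not exist. The function names and the review format are illustrative assumptions, not anything HIFLD or OSM provides.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def draw_sample(records, size=100, seed=0):
    """Pick up to `size` records at random; a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    return rng.sample(records, min(size, len(records)))


def score_sample(reviewed, max_offset_m=20.0):
    """Count reviewed records that exist and lie within max_offset_m.

    `reviewed` is a list of (record, verified) pairs, where `verified` is a
    (lat, lon) tuple from manual review, or None if the object does not exist.
    Returns (number_accurate, number_reviewed).
    """
    accurate = 0
    for record, verified in reviewed:
        if verified is None:
            continue  # object not found in reality: counts as inaccurate
        offset = haversine_m(record["lat"], record["lon"], *verified)
        if offset <= max_offset_m:
            accurate += 1
    return accurate, len(reviewed)
```

The manual review step stays manual: draw_sample only chooses which objects to look at, and score_sample turns the reviewer's notes into the "X out of 100" figure.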

In general, I am uncomfortable with advice for people to download data,
transform tags and upload.  I think it's far better to have a published
program (e.g. python script)

This way, people can run the conflation and examine the results to
assess quality.  And, I think actually writing this as code and
expecting it to be run repeatedly sharpens the thinking about the import
transformation process and shines a more careful light on quality.

The reason I suggested conflating manually is that I have next to zero experience with programming, although I would be happy to try to create a script, or seek help in having one created.

I think it's ok to take a dataset and do statistical quality control, where 
some fraction
of points are checked (against on-the-ground reality), and then if >99%
of them are correct, to assume they are all correct (enough that "fix
later" is ok).

That would be the idea. I'm not comfortable uploading anything with even slight inaccuracy without manually reviewing the objects first.
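
For a rough sense of how large the checked fraction needs to be, here is a small sketch. It assumes the sampled objects are drawn independently and at random, and it only covers the simplest case where every sampled object turns out to be correct. The function names are illustrative. If the true correct rate is p, the chance that all n samples are correct is p^n, which gives a one-sided lower bound on p.

```python
import math


def lower_bound_all_correct(n, confidence=0.95):
    """Lower bound on the true correct rate when all n samples were correct.

    Solves p**n = 1 - confidence for p.
    """
    return (1 - confidence) ** (1 / n)


def sample_size_needed(target_rate=0.99, confidence=0.95):
    """Smallest all-correct sample that supports target_rate at the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(target_rate))


if __name__ == "__main__":
    print(round(lower_bound_all_correct(100), 4))  # bound from 100 clean samples
    print(sample_size_needed())                    # samples needed for >= 99%
```

Under these assumptions, 100 clean samples only support a rate of roughly 97% at 95% confidence, and about 300 clean samples are needed before a 99% claim holds up.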

Note that some states, including MA, have email lists, and a number of
active mappers do not believe the use of Slack is legitimate (because
it's a proprietary system requiring signing a contract with a particular
company).  However, others think it's ok.

And obviously talk-us, but it makes sense to get a more baked proposal
here.


I will be a little more clear with how I plan to reach out to local mappers:

I will message all the active mappers I can find in each state, regardless of whether they are active on lists, Slack, etc., to make sure that I get as solid local support for this import as I can, given its scale and scope. I'll also reach out through the official channels.

Some of these datasets seem to be compilations of other datasets.
Nursing homes, that I picked because I can sort of armchair assess
quality, seems to be copied from state databases, at least in MA.  The
source data is by address, so it was geocoded somehow.  All of this is
unclear about licensing, so that makes me really want to understand the
"if published by the US, is PD" claim.

I think I may have been mistaken about the licensing situation when I made that claim. However, on the main page for the HIFLD datasets, there is a link to a data catalog spreadsheet with information about each dataset. All of the data that is publicly accessible on the website is open/in the public domain, and all copyrighted data is secured and requires special access. As far as I am aware, I haven't included any of the copyrighted datasets in the list on the wiki.

I am particularly skeptical of trail data,

I absolutely do not plan to import the trail dataset's lines directly from HIFLD, because the accuracy of the lines drawn is lower than what I expect from OpenStreetMap, but I feel that the metadata is extremely useful and OSM could benefit from it.

and this page doesn't clearly
separate import candidates from "recommend against import; useful as
reference layer".

I'll be sure to consider this as well when I redo the quality assessment for the datasets.



-James Crawford (SherbetS)


