Does it necessarily have to be an all or nothing affair?  Do you have to start 
with duplicating the entire massive USGS data infrastructure and making it 
available for download?  Most likely there would only be a few people that 
would even know what to do with it in the first place.  Would it be reasonable 
to start with a manageable subset of data and provide raw data access to it?  
Break it up into manageable chunks that could be downloaded with existing 
resources?  

Really it could be just extending what is already being done to include more 
data.  We run into this problem ourselves frequently.  For performance reasons 
you don't want to be sending more than 12 or 13 MB to the browser for 
rendering, so we put limits on the size of data that can be uploaded to the 
system.  We end up breaking data sets apart into logical geographic units that 
fit the size requirements.
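
The workflow above - cap the payload size and greedily pack features into geographic chunks that fit under it - can be sketched roughly as follows. This is a minimal illustration, not FortiusOne's actual implementation; the names (`chunk_by_size`, the 13 MB cap) and the greedy packing strategy are assumptions for the example.

```python
# Hypothetical sketch: pack (name, size_in_bytes) records into
# downloadable chunks, each kept under a fixed byte budget.
MAX_CHUNK_BYTES = 13 * 1024 * 1024  # the ~13 MB browser cap mentioned above

def chunk_by_size(features, max_bytes=MAX_CHUNK_BYTES):
    """Greedily group features into chunks whose total size
    stays at or under max_bytes."""
    chunks, current, current_size = [], [], 0
    for name, size in features:
        if size > max_bytes:
            raise ValueError(f"{name} alone exceeds the chunk budget")
        if current and current_size + size > max_bytes:
            chunks.append(current)       # flush the full chunk
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

In practice you would split along logical geographic boundaries (country, state, watershed) rather than arbitrary feature order, but the size-budget idea is the same.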

For instance, we did not upload the entire OSM database.  Instead we broke it 
up into manageable chunks based on geography - e.g. all the roads in Bolivia.  
Not sure if that is applicable at USGS, but it may be more palatable than the 
monolithic approach that usually results in monolithic problems.  I think the 
community would be happy with any additional data.

best,
sean     

FortiusOne Inc,
2200 Wilson Blvd. suite 307
Arlington, VA 22201
cell - 202-321-3914

----- Original Message -----
From: "Eric Wolf" <[email protected]>
To: "Landon Blake" <[email protected]>
Cc: [email protected]
Sent: Friday, March 13, 2009 2:45:44 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Geowanking] Tim Berners-Lee on linked data

> I don't think any public agency in the right frame of mind would provide
> online access to their only copy of an important database.

And that's the problem: making a copy. It's kind of like "what kind of
container would you store the perfect solvent in?" The USGS pushes
technology to the fullest extent possible for a Federal bureaucracy in
terms of completeness of data. In a sense, we are trying to create the
1:1 scale map. Making a copy of the 1:1 scale map is about as
ridiculous.

> Make copies of the data available, not the source data. What people do
> with the data after it leaves your stewardship is up to them.

I don't think it would be too hard to get the Survey to let you come
in with, say, a big RAID with a couple petabytes free, and make a
copy. With the current executive orders, you'd probably find some
people bending over backwards to help you. But it would have to be a
pretty big disk array and it'd likely take several months to get it
all transferred. You'd probably want to start in Sioux Falls, then
drive the array down to Denver, then stop in Rolla, Missouri on your
way to Reston. I can recommend hotels and good places to eat in each
location...

I sure wish more of the Survey were wired for Gig-E and fiber... Had
that once at a previous job. Seamless.usgs.gov is a nice data source
in ArcGIS _if_ you have about a 500Mbps connection to it.

> I would also note that organizations like Open Street Map are managing
> some level of "community quality control".

I greatly appreciate the way OSM is pushing the envelope on so many
fronts. When I met with Steve Coast a couple weeks ago, I was
literally gushing. But OSM's model is the exact inverse of that of the
national mapping agency of a sovereign nation. By definition, control
is top-down. See my prior comments about executive orders.

And the USGS has played with community-sourced information. The
National Map Corps predates OSM. The problem was, once we had the data,
figuring out how to turn it into something useful. OSM does a good job
by having the data collectors also digitize the information. But there
is a (necessary) void of ontological structure and very weak topology.

> Local government agencies already make all sorts of geospatial data
> available on the web. Has it been irreversibly corrupted?

Local governments make geospatial data available in the same forms we
currently do - web mapping interfaces, OGC APIs, shapefile downloads.
The one thing local governments can do that we cannot is provide
full-extent shapefiles. The shapefile format cannot handle a
full-extent representation of the US road network at 1:24,000. And
don't even think about hydrography... Stupid 32-bit integers...
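
The "stupid 32-bit integers" remark refers to the shapefile format: the .shp header records file length as a signed 32-bit count of 16-bit words, so the format tops out around 4 GB in theory, and many tools cap it at 2 GB. A quick back-of-the-envelope check (the function name and dataset sizes here are illustrative, not real USGS figures):

```python
# The .shp header's file-length field is a signed 32-bit integer
# counting 16-bit words, giving a hard ceiling of (2^31 - 1) words.
MAX_SHP_BYTES = (2**31 - 1) * 2  # ~4.29 GB theoretical maximum

def fits_in_shapefile(estimated_bytes):
    """Return True if a dataset of this size is even addressable
    by the shapefile format's 32-bit length field."""
    return estimated_bytes <= MAX_SHP_BYTES
```

A full-extent 1:24,000 national road or hydrography layer lands well past that ceiling, which is why local governments (with smaller extents) can ship full-extent shapefiles and a national agency cannot.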

The problem is this idea of getting "raw data" instead of these nicely
defined interfaces. The interfaces provide a modicum of security - but
at the cost of being able to freely range across the data.

-Eric

_______________________________________________
Geowanking mailing list
[email protected]
http://geowanking.org/mailman/listinfo/geowanking_geowanking.org
