On 11/29/11 4:15 AM, Jonathan Gregory wrote:
That could be done if you can represent the data using a new kind of
featureType to be added to the CF chapter on discrete sampling geometries,
which will be included in CF 1.6 (coming soon). The text for the discrete
sampling geometry chapter is at
http://www.unidata.ucar.edu/staff/caron/public/CFch9-feb25_jg.pdf

Sorry that the discussion about this has been so disjointed, but I think our needs cannot be met with something as simple as a new feature type.

We've had a bit of discussion about this, both on and off this list, but I don't think anyone has kept good notes of the main points raised. I'll try to write up a proposal soon, but briefly:

The goal is to store the output of "particle tracking" models. These are used to model the advection and dispersion of various substances in a flow field: oil spills, larval transport, pollutants in the atmosphere, etc.

Some key features:

* In general, what is of interest is a collection (100s to 10,000s, or more...) of particles, rather than any one individual particle. Thus, it is more likely that the user might ask:

"where are all the particles at time T?"

than:

"How did particle X travel over time?"

This has consequences for how one stores the data: either question should be answerable, but the first should be the more efficient one.
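As a minimal sketch of why the layout matters (plain Python lists stand in for netCDF variables here; all names are illustrative): with a time-major layout, "all particles at time T" is one contiguous row, while "particle X over time" has to stride across every row.

```python
# Hypothetical time-major layout: positions[t][p] is particle p at step t.
n_steps, n_particles = 3, 4
positions = [[(t, p) for p in range(n_particles)] for t in range(n_steps)]

def particles_at_time(t):
    # Contiguous read: the cheap, common query.
    return positions[t]

def trajectory(p):
    # Strided read across all time steps: the rarer query.
    return [positions[t][p] for t in range(n_steps)]

assert particles_at_time(1) == [(1, 0), (1, 1), (1, 2), (1, 3)]
assert trajectory(2) == [(0, 2), (1, 2), (2, 2)]
```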

* Particles can have many associated attributes (properties, etc.) that change over time.

* Some models create a set of particles at one time, then track them for the duration of the run -- that is the easy case. But many models create and destroy particles as the model runs -- adding particles when increased resolution is desired, removing them as they move out of the domain or are destroyed by physical processes.

This is a key issue -- it is not so straightforward how to store particles when their numbers change, and when you don't know at the start of the model run how many particles there will be at any given time, or even the maximum number of particles.

With discussion, we had come to something of a consensus that in order to accommodate these needs, a "ragged array" approach would work: a 2-D table of sorts, with one row for each time step, where each row may be any length. There appears to be something of a standard for this in CF already, and we have attempted to use it (more later).
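To make the ragged-array idea concrete, here is a small sketch (names hypothetical, plain Python standing in for netCDF variables): all particle records live in one flat array, and a per-timestep row_size variable records how many particles exist at each step, so rows can grow and shrink as particles are created and destroyed.

```python
# Contiguous ragged array: row_size[t] = particle count at time step t;
# all records are packed back-to-back in one flat "obs" array.
row_size = [2, 4, 3]            # particles appear at step 1, one dies at step 2
data = ['a0', 'a1',             # step 0
        'b0', 'b1', 'b2', 'b3', # step 1
        'c0', 'c1', 'c2']       # step 2

def row(t):
    # The offset of row t is the sum of the preceding row sizes.
    start = sum(row_size[:t])
    return data[start:start + row_size[t]]

assert row(0) == ['a0', 'a1']
assert row(1) == ['b0', 'b1', 'b2', 'b3']
assert row(2) == ['c0', 'c1', 'c2']
```

Reading "all particles at time T" is then a single contiguous slice, which matches the access pattern above.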

We've got a version of this working now in our software, but...

The trick that Ute has brought up is that you may know neither how many particles there will be, nor how many time steps. Thus you would like to have two "unlimited" dimensions, which netcdf3 does not support. We've gotten by so far because we know how many time steps will be run before we start.

My first thought is that we could use exactly the same format as has been discussed already, but make it optional to use netcdf4 and an unlimited time dimension. Presumably these files could be easily converted, after the fact, to a netcdf3 format, as the number of time steps would then be known.
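As a sketch in CDL of what such a file might look like (variable names are purely illustrative, not a proposal), the netcdf3 case fixes the obs dimension once the run is done, while the netcdf4 variant could make both dimensions unlimited:

```
netcdf particles {
dimensions:
    time = UNLIMITED ;     // grows as the model runs
    obs = 100000 ;         // total particle records; must be fixed in netcdf3,
                           // but could also be UNLIMITED under netcdf4
variables:
    double time(time) ;
    int row_size(time) ;   // number of particles at each time step (ragged rows)
    double lon(obs) ;
    double lat(obs) ;
    int particle_id(obs) ; // identity persists as particles come and go
}
```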

About netcdf 3 vs. 4 -- it seems netcdf4 has some nice features; after all, it was developed for a reason. However, it doesn't appear to have been widely adopted yet. Still, maybe we really shouldn't bend over backwards to fit a data model to netcdf3 anymore -- it's a chicken-and-egg problem, and maybe it's time to make some eggs.

For our part, we use the netcdf4 lib with Python anyway, though our C/C++ code is all using netcdf3 -- the burden of compiling the hdf libs is something we choose to avoid, though it's not that big a deal.

Anyway -- more soon, I hope.

-Chris


--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[email protected]
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
