Re: [CF-metadata] point observation data in CF 1.4

Christopher Barker Thu, 14 Oct 2010 16:22:06 -0700

Hi folks,

I just joined the list, so I apologize for breaking the threading, and being out of the loop, but Rich Signell alerted me that you were discussing a format for particle tracking models, and we'd like to be involved.


A couple introductory comments:

We (NOAA Emergency Response Division) have a particle tracking model (GNOME) we use for oil, chemical, and random-other-stuff spills. For the most part, our output is in our own special formats, and doesn't interact well with other tools -- we'd like to change that, and we're going to netcdf for everything else, so we want to use it for this, too.

We use netcdf from C/C++ and Python, so would rather not do anything supported only in the Java libs. We're also on netcdf3 at this point, though we can upgrade to netcdf4 if there is a compelling reason.

Our model, at this point, keeps the number of particles constant (though some may be flagged as "not released", "off map", "evaporated", or what have you). As a result, the "natural" way that we have stored the results (in netcdf and other formats) is in (num_timesteps X num_particles) arrays.


We then have arrays for latitude, longitude, mass, various flags, etc.

As discussed here, other folks' models change the number of particles as time goes on, so such a simple block storage won't work. Our case is a subset of the more general case, so it should be easy to support the simple case anyway (and we may well add variable particle numbers in the future as well).

When looking at the docs for PointObservationConventions, it didn't seem to fit quite right. One key point is that for the most part, we think of the collection of particles as an entity -- we are far more likely to be interested in the whole collection at one point in time than in the path of a particular particle over time. In fact, it's rare that we care at all about the ID of a particular particle -- we simply want to know its properties at a given time -- so it would be nice if the data storage could do that efficiently.

We'd generally want time to be the unlimited dimension, as well, as we tend to run the models and analysis forward in time, and might well want to incrementally output the data.

It seems ragged arrays are called for, though I've never tried to do that in netcdf, so I don't know what the issues are. Are ragged arrays a netcdf4-only feature?
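
For illustration, here is a minimal sketch of one ragged layout that does not need a variable-length type: every particle value is stored end to end in one flat array, alongside a per-timestep count. Plain Python lists stand in for netcdf variables, and all the names (particle_count, longitude, step_slice, particles_at, append_step) are hypothetical, not part of any convention. It is an assumption-laden sketch, not a proposal for the actual file format.

```python
# Sketch only: plain Python lists stand in for netcdf variables.
# Every name here is hypothetical and chosen for illustration.

# Number of live particles at each time step (one entry per time step).
particle_count = [2, 3, 1]

# All particle values stored end to end, time step after time step.
longitude = [-122.1, -122.0,          # step 0
             -122.2, -122.1, -121.9,  # step 1
             -122.3]                  # step 2


def step_slice(counts, step):
    """Return the (start, stop) range in the flat array for one time step."""
    start = sum(counts[:step])
    return start, start + counts[step]


def particles_at(values, counts, step):
    """Extract every particle value recorded at the given time step."""
    start, stop = step_slice(counts, step)
    return values[start:stop]


def append_step(values, counts, new_values):
    """Append one more time step, as a model running forward in time would."""
    values.extend(new_values)
    counts.append(len(new_values))
```

Pulling out the whole collection at one time step is a single contiguous slice, which matches the "collection as an entity" use case. One caveat: the flat array and the count array both grow as the run proceeds, and the classic (netcdf3) format allows only one unlimited dimension, so incremental output with this layout would need some thought there.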

Of course, another option is to allocate the full amount of space required to store the maximum number, and then mask off the invalid ones. With compression, that may not be too bad a way to go.


A few specific comments:

I now write the data with redundant time as
a limited dimension, and records(time, latitude, longitude) and have
mass (record), radius(record) etc.


> Thanks anyway,
> Ute

Do you have an example output file for that you could share?

Clearly there is a need for another Point Convention type to handle
the output from particle tracking models like this.


I think so too -- it really is a different use case.

2. I think trajectory is when you follow a set of "things", boats, a
person. But at each time step they are identical, maybe not the same
number because of missing data. I could assume that I have a trajectory
but actually I can't be sure if my particles are the same as before.
Therefore I chose not to take that convention.

hmm -- it sounds like this is similar to what I was talking about above -- the collection of particles at a given time is what's important -- not the path of any given particle.

As a note, we've been working some with the CDOG (deepwater blowout model), and it doesn't keep track of which particles are which as they are added and removed, either -- so it's a pretty common use case.

There may
be thousands or tens of thousands of particles, so it's not feasible
to write each trajectory into a separate file.

nor does that fit the natural data model -- one file per timestep would make more sense.

We want a featureType that will allow us to write the entire
collection of particles at each time step into a single file, and that
will allow us to extract all the particles at a single time step, as
well as extract individual particle trajectories by their ID.


well said.

whereas as you describe it the time coord is common to all trajectories


yup.

 To arrange this, an indirection could be used on the time dimension:
  data(i,o)     x(i,o) y(i,o) z(i,o) t(tindex(i,o))
where i is the instance (which of the trajectories), o is the point along
that trajectory, t is the coordinate vector of common times, and tindex is
an index to t. For example, we might have these two trajectories (x,t)
(omitting y and z for simplicity)
  (0,10) (1,11) (2,12)
         (3,11) (2,12) (1,13) (0,14)
Then t would be [10,11,12,13,14] (all the times). For the first trajectory
  x=[0,1,2] tindex=[0,1,2]
and for the second
  x=[3,2,1,0] tindex=[1,2,3,4]
Is that right? Perhaps/probably there's a neater or more natural way to
do it.


I'm having trouble following that -- but yes, it does not seem natural.
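
To make the quoted indirection concrete, here is a small sketch that uses exactly the numbers from the example above. Plain Python lists stand in for the netcdf variables, and the helper names trajectory and synoptic are hypothetical. It only illustrates how the lookup would work; it is not a proposed implementation.

```python
# Common time coordinate shared by all trajectories (values from the example).
t = [10, 11, 12, 13, 14]

# Per-trajectory positions and indices into t; y and z are omitted as above.
x = [[0, 1, 2], [3, 2, 1, 0]]
tindex = [[0, 1, 2], [1, 2, 3, 4]]


def trajectory(i):
    """Rebuild the (x, t) pairs of trajectory i by looking times up via tindex."""
    return [(xi, t[ti]) for xi, ti in zip(x[i], tindex[i])]


def synoptic(time_value):
    """Collect (trajectory index, x) for every particle present at one time."""
    found = []
    for i in range(len(x)):
        for xi, ti in zip(x[i], tindex[i]):
            if t[ti] == time_value:
                found.append((i, xi))
    return found
```

Following one trajectory is a direct lookup, but the synoptic view has to scan every trajectory to find the points at a given time, which may be part of why this layout feels unnatural when the whole collection at one time step is the main interest.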

The synchronization of coordinates opens a potential door to great
simplicity of representation.

    metadata(i)    data(i,o)     x(i,o) y(i,o) z(i,o) t(o)

where i is the instance (which of the trajectories), o is simply the
time index.  The possible costs are proliferation in numbers of ways to
represent similar things and file size.   The question that I'd be
inclined to ask of Ute and Rich would be a judgment call on the cost in
file size that would result from filling missing values at the start/end
of each individual trajectory.

If you're going to do that, you could just store it as one big rectangular array, with missing values marked (see above). Which sure would be easy but costly in storage space.

Another problem -- you may not know the maximum number of particles at time zero -- so you can't know how much space to allocate when you start writing the file -- that may kill that approach.

 Optionally the metadata could include

    tstart_index(i)   tend_index(i)

This representation seems _the simplest from the standpoint of
application code (reading)_.  Synoptic views are simply projections at
a fixed "o" index;  the history of an individual trajectory is simply a
projection at a fixed "i" index.
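
As a sketch of the quoted rectangular layout, the same two example trajectories can be padded onto the shared time axis. Plain Python lists stand in for netcdf variables. The FILL marker, the helper names, and the choice to treat tend_index as inclusive are all assumptions made for illustration.

```python
FILL = None  # hypothetical missing-value marker; a real file would use a fill value

t = [10, 11, 12, 13, 14]  # t(o): one time axis shared by every trajectory

# x(i, o): one row per trajectory, padded with FILL outside its lifetime.
x = [
    [0, 1, 2, FILL, FILL],
    [FILL, 3, 2, 1, 0],
]


def index_range(row):
    """Return the first and last time index at which a row holds real data."""
    valid = [o for o, value in enumerate(row) if value is not FILL]
    return valid[0], valid[-1]


# Optional metadata: tstart_index(i) and tend_index(i), taken as inclusive here.
tstart_index = [index_range(row)[0] for row in x]
tend_index = [index_range(row)[1] for row in x]


def synoptic(o):
    """Projection at a fixed time index: every particle present at that step."""
    return [(i, row[o]) for i, row in enumerate(x) if row[o] is not FILL]


def history(i):
    """Projection at a fixed instance: the (x, t) path of one trajectory."""
    return [(x[i][o], t[o]) for o in range(tstart_index[i], tend_index[i] + 1)]
```

Both views are simple index operations, at the cost of the padded cells (3 of the 10 in this toy case) and of having to size the instance dimension up front.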


How do you do this with a ragged array? I'm missing something.

 Does the
saving of space through not padding the trajectories justify the complexity?
I don't know.

Do you have a choice if you don't know what your maximum number of particles is at the start?


Thanks all for working on this.

-Chris


--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[email protected]
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
