You might prefer to try Nujan instead of mixing Python and netCDF,
although variables are limited to 2 GB.

http://www.ral.ucar.edu/~steves/nujan.html


On Fri, Nov 25, 2011 at 11:01 AM, Ute Brönner <[email protected]> wrote:
> Hi folks,
>
> I kind of lost track of our latest discussions and had the feeling that this 
> was partly outside the mailing group; so I will try to sum up what we were 
> discussing.
> My latest attempt was to produce NetCDF for particle trajectories; trying to 
> write out the concentration grid resulted in an 11 GB netCDF3 file :-(
>
> So we have different motivations for discussing particle trajectories and 
> netCDF4.
>
> First question:
> Does anybody know whether, and if so when, writing netCDF4 will be 
> incorporated into the NetCDF Java library? Or will we use Python with the 
> help of Jython etc. (http://www.slideshare.net/onyame/mixing-python-and-java) 
> to write netCDF4?
>
> Second question:
> Is there a de facto standard / proposal for writing Particle Trajectory Data 
> which could be CF:featureType: <whatever we agree on>? The suggestion below 
> is not suitable because:
> 1) we don't track a particle the whole time, it may disappear and show up 
> again later, but if I have 1000 particles in time step 1 and 1000 in time 
> step 2 we cannot be sure these 1000 are the same as before.
> 2) I cannot know the number of time steps in advance.
>
> I would like something like
> dimensions:
>   particle = UNLIMITED; //because it may change each time step
>   time = UNLIMITED; // because I don't know
>
> then every variable is like
> latitude (particle, time)
> longitude (particle, time)
>
> and I might have
> int number_particles_per_timestep(time);
>     :units = "1";
>     :long_name = "number particles per current timestep";
>     :CF:ragged_row_count = "particle";
>
> The need some of you have to know which spill a particle came from may be 
> solved with a third dimension, spill:
> dimensions:
>   spill = 3; // or how many one has
>   particle = UNLIMITED; //because it may change each time step
>   time = UNLIMITED; // because I don't know
>
> particle (spill, time)
>
> then every variable is like
> latitude (particle)
> longitude (particle)
>
> How would one write this? With coordinates or as a hierarchical data structure?
> At least we need the ability to use several unlimited dimensions and the 
> ragged-array feature.
>
> Third question:
> How can we compress big netCDF3 files? Or is it smarter to go for netCDF4 
> directly with hierarchical data? As in my example above, I would need to 
> write out an 11 GB file and then deflate it as described here 
> http://www.unidata.ucar.edu/mailing_lists/archives/netcdf-java/2010/msg00095.html
> or with Rich's script; but is that really necessary?
>
>
> Hoping to get the discussion going again and that we agree on a standard 
> quite soon!
> Have a nice weekend!
>
> Best,
> Ute
>
> -------- Original Message --------
> Subject: [CF-metadata] Particle Track Feature Type (was: Re: point 
> observation data in CF 1.4)
> Date: Fri, 19 Nov 2010 04:15:35 +0100
> From: John Caron <[email protected]>
> To: [email protected] <[email protected]>
>
> I'm thinking that we need a new feature type for this. I'm calling it 
> "particleTrack" but there's probably a better name.
>
> My reasoning is that the nested table representation of trajectories is:
>
> Table {
>   traj_id;
>   Table {
>      time;
>      lat, lon, z;
>      data;
>   }
> }
>
> but this case has the inner and outer table inverted:
>
> Table {
>   time;
>   Table {
>      particle_id;
>      lat, lon, z;
>      data;
>      data2;
>   }
> }
>
> So, following that line of thought, the possibilities in CDL are:
>
> 1) If the average number of particles ~ max number of particles at any time 
> step, then one could use multidimensional arrays:
>
> dimensions:
>   maxParticles = 1000 ;
>   time = 7777 ; // may be UNLIMITED
>
> variables:
>
>   double time(time) ;
>
>   int particle_id(time, maxParticles) ;
>   float lon(time, maxParticles) ;
>   float lat(time, maxParticles) ;
>   float z(time, maxParticles) ;
>   float data(time, maxParticles) ;
>
> attributes:
>   :featureType = "particleTrack";
>
> Note that maxParticles is the max number of particles at any one time step, 
> not the total number of particle tracks. The particle trajectories have to be 
> found by examining the values of particle_id(time, maxParticles).
>
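
A rough sketch of what that lookup could look like in Python, in case it helps the discussion. It assumes the variables have already been read into plain nested lists and that unused slots in particle_id hold a fill value of -1. The fill value, the helper name track_for and the sample numbers are my assumptions, not part of John's proposal.

```python
FILL = -1  # assumed fill value marking unused slots; not part of the proposal


def track_for(pid, time, particle_id, lon, lat):
    """Collect (time, lon, lat) for one particle from padded 2-D arrays.

    particle_id, lon and lat are indexed [time step][slot], mirroring
    particle_id(time, maxParticles) in the CDL above.
    """
    track = []
    for t, ids in enumerate(particle_id):
        for slot, current in enumerate(ids):
            if current == pid:
                track.append((time[t], lon[t][slot], lat[t][slot]))
                break  # assumes a particle appears at most once per time step
    return track


# Tiny made-up example: 3 time steps, maxParticles = 3.
time = [0.0, 1.0, 2.0]
particle_id = [[7, 8, FILL], [8, 9, 7], [9, FILL, FILL]]
lon = [[1.0, 2.0, 0.0], [2.1, 3.0, 1.1], [3.1, 0.0, 0.0]]
lat = [[5.0, 6.0, 0.0], [6.1, 7.0, 5.1], [7.1, 0.0, 0.0]]

print(track_for(7, time, particle_id, lon, lat))
```

Every trajectory request scans the whole particle_id array, which is why this layout suits per-time-step processing better than per-particle processing.
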
> 2) The CDL of the ragged case would look like:
>
> dimensions:
>   obs = 500000; // UNLIMITED
>   time = 7777 ;
>
> variables:
>   int time(time) ;
>   int rowSize(time) ;
>
>   int particle_id(obs) ;
>   float lon(obs) ;
>   float lat(obs) ;
>   float z(obs) ;
>   float data(obs) ;
>
> attributes:
>   :featureType = "particleTrack";
>
> In this case, you don't have to know the max number of particles at any one 
> time step, but you do need to know the number of time steps beforehand. The 
> particle trajectories have to be found by examining the values of 
> particle_id(obs). The particles at time step i are contained in the obs 
> variables from start(i) up to, but not including, start(i) + rowSize(i), 
> where start(i) is the sum of rowSize over the preceding time steps.
>
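
To make the start(i) / rowSize(i) bookkeeping concrete, here is a minimal Python sketch. It assumes rowSize and the obs variables are already in memory as plain lists; the helper names row_starts and particles_at and the sample values are made up for illustration.

```python
from itertools import accumulate


def row_starts(row_size):
    """start(i) is the sum of rowSize over all earlier time steps."""
    return [0] + list(accumulate(row_size))[:-1]


def particles_at(i, row_size, *obs_vars):
    """Slice each obs variable down to the particles present at time step i."""
    start = row_starts(row_size)[i]
    stop = start + row_size[i]  # exclusive upper bound
    return [var[start:stop] for var in obs_vars]


# Made-up data: 3 time steps holding 2, 3 and 1 particles.
row_size = [2, 3, 1]
particle_id = [7, 8, 8, 9, 7, 9]
lon = [1.0, 2.0, 2.1, 3.0, 1.1, 3.1]

ids, lons = particles_at(1, row_size, particle_id, lon)
print(ids, lons)
```

In a real reader one would compute row_starts once and keep it, rather than recomputing it per call as this sketch does.
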
> These layouts are optimized for processing all particles at a given time, and 
> for sequentially processing time steps. If one wanted to process particle 
> trajectories, that would be much slower. If you needed to do it a lot, you 
> might want to rewrite the file. A more sophisticated application, possibly a 
> server, could write an index to speed it up.
>
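
One possible shape for such an index, sketched in Python for the ragged layout. This is only an illustration under my own assumptions (in-memory lists, a hypothetical helper build_track_index), not something the proposal specifies: a single pass over particle_id(obs) records, for each particle, the time steps and obs positions where it occurs.

```python
from collections import defaultdict


def build_track_index(row_size, particle_id):
    """Map each particle id to [(time step, obs index), ...] in time order."""
    index = defaultdict(list)
    obs = 0
    for step, count in enumerate(row_size):
        for _ in range(count):
            index[particle_id[obs]].append((step, obs))
            obs += 1
    return dict(index)


# Made-up data: 3 time steps holding 2, 3 and 1 particles.
row_size = [2, 3, 1]
particle_id = [7, 8, 8, 9, 7, 9]
lon = [1.0, 2.0, 2.1, 3.0, 1.1, 3.1]

index = build_track_index(row_size, particle_id)

# Trajectory of particle 7 as (time step, lon) pairs, without rescanning the file.
print([(step, lon[obs]) for step, obs in index[7]])
```

The index costs one full scan to build, after which each trajectory lookup touches only that particle's own observations.
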
>
> _______________________________________________
> CF-metadata mailing list
> [email protected]
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Rich Signell
> Sent: Thursday, 18 August 2011 19:04
> To: Christopher Barker
> Cc: Ute Brönner; Ben Hetland; Mark Reed; Nils Rune Bodsberg; CJ 
> Beegle-Krause; Caitlin O'Connor; Alex Hadjilambris; Rob Hetland
> Subject: Re: netcdf for particle trajectories
>
> Chris,
>
>
>>>> so I'll make part of my homework to deliver you a Python script
>>>> using Whitaker's NetCDF4 that writes a sample file.
>>
>> How did this go, Rich?
>
> Yes, I took Rob Hetland's Python short course, and yes, I wrote a small 
> example showing how to take NetCDF3 particle tracking output and create a 
> compressed NetCDF4 file with chunking.  I just forgot to send it.  ;-)
>
> Note: You can get an OPeNDAP-enabled NetCDF4 Python module for both 32- and 
> 64-bit Windows from:
> http://www.lfd.uci.edu/~gohlke/pythonlibs/
>
> -Rich
>>
>> We're getting closer to a prototype file (i.e. we've got GNOME writing
>> something, but it still needs some tweaking). I'll send out an example
>> when I think we're close.
>>
>> One new issue:
>>
>> In GNOME, we have the concept of any number of "spills" -- each spill
>> is a set of particles that usually share some properties.
>>
>> So we're trying to figure out how to capture that. Two ideas:
>>
>> 1) each spill is a unique set of data -- but I think that it would only
>> be possible to do this by using a convention on the variable names:
>>
>> data_1
>> particle_count_1
>> longitude_1
>> latitude_1
>> ...
>>
>> data_2
>> particle_count_2
>> longitude_2
>> latitude_2
>> ...
>>
>> That seems pretty ugly. Could netcdf4's "hierarchical data" help us here?
>> Maybe this provides the motivation to use it.
>>
>> Option two:
>>
>> put all the particles in one big array, but identify the different "spills"
>> by particle ID:
>>
>> ID_range_1 = 0-1000
>> ID_range_2 = 1000-2000
>> ...
>>
>> then they could get split up by the client software, if desired, or
>> the separate spills could be ignored, and it could all be treated as one.
>>
>> -- thoughts?
>>
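
On option two, a small Python sketch of how client software might split particles by ID range. It assumes the ranges are half-open (so id 1000 belongs to the second spill), which the ranges as written leave ambiguous. SPILL_STARTS, spill_of and split_by_spill are hypothetical names, not anything GNOME provides.

```python
from bisect import bisect_right

# Assumed reading of the ranges: half-open, so id 1000 belongs to spill 2.
# Holds the first id of each spill, followed by the end of the last spill.
SPILL_STARTS = [0, 1000, 2000]


def spill_of(pid):
    """Return the 1-based spill number for a particle id, or None if out of range."""
    if not SPILL_STARTS[0] <= pid < SPILL_STARTS[-1]:
        return None
    return bisect_right(SPILL_STARTS, pid)


def split_by_spill(particle_ids):
    """Group positions in the particle array by the spill they belong to."""
    groups = {}
    for pos, pid in enumerate(particle_ids):
        groups.setdefault(spill_of(pid), []).append(pos)
    return groups


print(split_by_spill([5, 1500, 999, 1000]))
```

A client that does not care about spills can simply skip this step and treat the array as one set of particles.
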
>>
>> --
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> [email protected]
>>
>
>
>
> --
> Dr. Richard P. Signell   (508) 457-2229
> USGS, 384 Woods Hole Rd.
> Woods Hole, MA 02543-1598
>