Re: [CF-metadata] 2. Re: [cf-satellite] Sharing quality flags among multiple variables (Jonathan Gregory)

Schultz, Martin Mon, 21 Nov 2011 02:56:19 -0800

Dear Jonathan,

      What you say makes sense, but the lines are somewhat blurred, and it is 
this philosophical fabric that sometimes makes it hard to communicate the 
usefulness of CF to others. It may be about time to start thinking about 
CF-2.0 and to initiate a discussion with simplicity as one major goal. To my 
knowledge there is no rush in this (and there should not be), but it may be 
worthwhile to begin thinking about the future. Well, I am sure you have done 
this already! But what is the user involvement in this process? Should we 
think about a "CF conference", or perhaps a "metadata conference" with a 
somewhat larger scope?


    But to get specific again: to me there is not much difference between 
counting (valid) observations over a given time interval and calculating a 
percentile or mean value. All of these operations aggregate information from 
the variable over time, which also means you always lose some detail. But 
where do you begin and end with this? A typical air quality measurement may be 
made every minute. What is archived is often the hourly data, which have 
already been processed and averaged, and you will (hopefully) find at least 
some information about this in the metadata, if only a simple data quality 
flag that says whether a given hourly value is ok or not. From these you can 
compute, for example, monthly mean values for one given year, or monthly mean 
values over a "climatology" period (i.e. all "January" values from 1980 to 
present). Alongside these mean values you may want to know the number of 
observations entering each mean, and the percentiles or the standard 
deviation. Then you create a regional mean value in which you combine data 
from different stations in a certain geographical domain. Again, all of these 
operations aggregate data and eliminate or reduce at least one dimension.

    In my view the "modifier" case, where an observation count applies to 
several variables, is just a special case in which the various observations 
come from the same instrument. In general, even if instruments are operated in 
parallel, one measurement will fail at different times than another, so the 
general case is that observations are independent of each other. Therefore, I 
would argue that the "synchronizing" of observations is a special thing and 
should be treated separately from the statistical treatment of variables. 
Hence, the default in the sea water temperature and salinity case would be 
that each variable has its own "count" (via cell_methods?). One could then 
define a way to create the link between them via some sort of 
cross-referencing.
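
To make the point concrete, here is a minimal sketch (not taken from any CF tool) of how the count, the mean, the standard deviation and a percentile all come out of the same collapse of the time axis. The function name, the flag convention (0 means "ok") and the sample numbers are assumptions made up for illustration. Because temperature and salinity carry separate quality flags, each ends up with its own count.

```python
import math
from statistics import mean, pstdev


def aggregate(values, flags, percentile=25):
    """Collapse a time series, using only values whose flag is 0 ("ok").

    Hypothetical helper: the count, mean, standard deviation and
    percentile are all derived from the same set of valid values.
    """
    valid = sorted(v for v, f in zip(values, flags) if f == 0)
    n = len(valid)
    if n == 0:
        return {"count": 0, "mean": None, "std": None, "percentile": None}
    # Nearest-rank percentile: smallest value with at least p% of the
    # valid data at or below it.
    rank = max(1, math.ceil(percentile / 100 * n))
    return {
        "count": n,
        "mean": mean(valid),
        "std": pstdev(valid),
        "percentile": valid[rank - 1],
    }


# Made-up values; the instruments fail at different times.
temperature = [10.0, 12.0, 11.0, 13.0]
temp_flags = [0, 0, 1, 0]
salinity = [35.0, 35.2, 35.1, 35.3]
sal_flags = [0, 1, 1, 0]

temp_stats = aggregate(temperature, temp_flags)
sal_stats = aggregate(salinity, sal_flags)
print(temp_stats["count"], sal_stats["count"])
```

The two counts differ (3 and 2 here), which is the general case argued for above; a shared count would only be correct if the flags were identical for both variables.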

Cheers,

Martin


> -----Original Message-----
> From: Jonathan Gregory [mailto:[email protected]]
> Sent: Friday, November 18, 2011 4:43 PM
> To: Schultz, Martin
> Cc: [email protected]
> Subject: Re: [CF-metadata] 2. Re: [cf-satellite] Sharing quality flags among
> multiple variables (Jonathan Gregory)
>
> Dear Martin
>
> >        what is the difference between a mean value and an observation
> > count? You may add the 25th percentile to this list as well. As far as
> > I can tell, the cell_methods attribute should be best suited for all
> > of these and I don't see a need to work with standard_name modifiers
>
> Though this has not been thoroughly debated, I think the reasons why there
> are these two different mechanisms are that the two functions are
> distinguished like this:
>
> * cell_methods represents subgrid variation. They always imply that the data
> variable formerly had a higher dimensionality or a higher resolution, and they
> refer to one or more dimensions of the data on which the reduction or
> collapse was done. The relationships indicated by standard_name modifiers
> do not refer to particular dimensions of the data.
>
> * The operations cell_methods records are done on the data in the variable
> itself. Ancillary variables, described by standard_name modifiers, are extra
> information about the data in the variable. This cannot be inferred from the
> data; they are metadata, really, not a statistical reduction of data.
>
> However, I agree there's a similarity. In particular, both of them were
> motivated by a desire to avoid proliferation of standard_names because of
> the need to describe very common operations that could be applied to
> anything, and both of them could modify the units.
>
> Best wishes
>
> Jonathan
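
The distinction drawn in the quoted message can be sketched with plain attribute dictionaries. This is only an illustration: the variable names are hypothetical, and the two helper functions are simplistic parsers written for this example (they ignore intervals, comments and other parts of the cell_methods syntax). The first variable records a reduction over a named dimension; the last one is an ancillary variable identified by a standard_name modifier and names no dimension.

```python
# Hypothetical variables, represented only by their attributes.
monthly_mean_temperature = {
    "standard_name": "sea_water_temperature",
    "units": "K",
    "cell_methods": "time: mean",  # reduction over the time dimension
}
temperature = {
    "standard_name": "sea_water_temperature",
    "units": "K",
    "ancillary_variables": "temperature_nobs",
}
temperature_nobs = {
    # standard_name followed by a modifier; no dimension is referenced
    "standard_name": "sea_water_temperature number_of_observations",
    "units": "1",
}


def collapsed_dimensions(attrs):
    """Dimension names in a simple 'dim: method' cell_methods string."""
    tokens = attrs.get("cell_methods", "").split()
    return [tok[:-1] for tok in tokens if tok.endswith(":")]


def modifier(attrs):
    """Return the standard_name modifier, or None if there is none."""
    parts = attrs["standard_name"].split()
    return parts[1] if len(parts) > 1 else None


print(collapsed_dimensions(monthly_mean_temperature))
print(modifier(temperature_nobs))
```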

_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
