Hi Stephen et al.,

I have rediscovered this conversation in the depths of my inbox and thought it might be worth resurrecting. To summarize the discussion so far:

1) I proposed two new standard optional CF attributes (called something like minimum_data_value and maximum_data_value) that would be attached to a variable in a NetCDF file, redundantly holding its actual min and max values in that file. As well as helping with data mining applications, this would act as a hint to visualization packages, giving them a first attempt at automatically defining a sensible colour scale range. Note that this attribute pair is distinct in purpose from valid_min and valid_max (which contain theoretical extrema, beyond which data is considered invalid).

2) Steve Hankin proposed a more sophisticated approach whereby there would be a min/max pair for each vertical level in the data.

3) Stephen Pascoe pointed out that spatial subsets will have different min and max values, so the value of the simplistic approach is limited.

Clearly both (2) and (3) are valid points. But I still feel that, in "bang for buck" terms, the simplest approach (1) has benefits. Even the simplest approach gives a sensible (if imperfect) range of values. Without it, a visualization application has no idea how to generate a colour scale unless it first extracts potentially large quantities of data. Furthermore, it is very easy for an NcML aggregation to look up these min/max attributes in each file and generate the min/max for the aggregated dataset.

I mentioned visualization as the primary use case for this proposal, but data mining apps can benefit too: it would be much quicker and easier to answer questions like "which files contain temperatures above 30 degC?"

Stephen mentioned cell_methods, which I'm not expert in. If this approach is to be adopted, do people think it would be best to express the min/max as two attributes or in cell_methods? Should this discussion be moved to the trac site?

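To make the aggregation and data mining points concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: the attribute pair is taken to be an (actual_min, actual_max) tuple per file, using the names Jonathan suggested earlier in the thread; the file names and values are invented; and a plain dict stands in for attributes that would really be read from each file's header.

```python
from typing import Dict, List, Tuple

# Hypothetical (actual_min, actual_max) pairs in degC, one per file.
# File names and numbers are invented for illustration only.
FILE_RANGES: Dict[str, Tuple[float, float]] = {
    "temp_2007_01.nc": (-12.5, 18.0),
    "temp_2007_04.nc": (-3.0, 27.5),
    "temp_2007_07.nc": (4.0, 36.2),
}


def aggregate_range(ranges: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Combine per-file min/max pairs into one pair for the whole aggregation."""
    if not ranges:
        raise ValueError("no files to aggregate")
    lows, highs = zip(*ranges.values())
    return min(lows), max(highs)


def files_exceeding(ranges: Dict[str, Tuple[float, float]], threshold: float) -> List[str]:
    """List files whose stored maximum is above the threshold, reading no data."""
    return sorted(name for name, (_, high) in ranges.items() if high > threshold)


print(aggregate_range(FILE_RANGES))        # (-12.5, 36.2)
print(files_exceeding(FILE_RANGES, 30.0))  # ['temp_2007_07.nc']
```

Both answers come from the header attributes alone, which is the whole appeal of the simple approach. It does nothing for Stephen's subsetting point, of course: the aggregated range is still the range of the full domain.
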
Jon

On Sat, Mar 31, 2007 at 7:10 PM, Stephen Pascoe <[EMAIL PROTECTED]> wrote:
> I would also like to express some reservations about the usefulness of
> simple min/max attributes for the purpose John suggests (calculating
> appropriate colourbar ranges in visualisations). My experience is that a
> single pair of values is only relevant at a particular scale. Once you
> start subsetting a domain there's a good chance the actual min/max will be
> substantially different.
>
> For instance, taking an example from the IPCC data distribution centre, we
> have a diurnal temperature range field with a min--max of ca. 1--40 deg_c.
> However, half of this range is due to the variation over Greenland during
> the winter. Subset anywhere else and the max is more like 20 deg_c.
> Similarly, the maximum temperature field varies between ca. -50 and +45
> deg_c but most subselections in time or space only cover a fraction of this
> range.
>
> There is no harm in having optional CF attributes for min and max but I'm
> not convinced it will solve the problem. I like Steve's approach of
> providing the extrema in auxiliary variables. In CF min/max can be
> specified using the cell_methods attribute. What would be needed something
> like Steve's "parent" attribute to specify two variables represent the same
> field (with different cell_methods).
>
> Cheers,
> Stephen.
>
> ---
> Stephen Pascoe 01235 445980
> British Atmospheric Data Centre
> Rutherford Appleton Laboratory, CCLRC
>
>
> Steve Hankin wrote:
>>
>> Hi Jon,
>>
>> Can you really get away with simple attributes to contain the guidance on
>> extrema? For example, if this is 3D data (has a Z axis) and you are
>> interested in visualizations at different depths (heights), then the
>> "recommended" contour ranges might well need to be different for each depth
>> (illustrating why we have tended to back away from this problem for such a
>> long time).
>>
>> Might it make sense to think more in terms of min/max values stored in new
>> variables and identified by standard names. Here is a conceptual example
>> for discussion (not a formal proposal, so please cut me slack):
>>
>> variables:
>>   float temperature(time,pres,lat,lon) ;
>>   float temp_min(pres) ;
>>     temp_min:parent = "temperature" ;
>>     temp_min:standard_name = "minimum_over_domain" ;
>>   float temp_max(pres) ;
>>     temp_max:parent = "temperature" ;
>>     temp_max:standard_name = "maximum_over_domain" ;
>>
>> This approach offers a lot more flexibility. Does the scope of the
>> problem that needs to be solved require this flexibility?
>>
>> - Steve
>>
>> ==================================================
>>
>> Jon Blower wrote:
>>>
>>> Dear Jonathan,
>>>
>>> OK, that sounds fine too. How do we move forward to incorporate this
>>> into the CF standard?
>>>
>>> Thanks, Jon
>>>
>>> On 3/28/07, Jonathan Gregory <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>> Dear Jon and Phil
>>>>
>>>> I'd suggest actual_min and actual_max, because they would complement the
>>>> already defined (Unidata standard) valid_min and valid_max.
>>>>
>>>> Cheers
>>>>
>>>> Jonathan
>>>>
>>>>
>>>
>>>
>>>
>>
>> --
>> Steve Hankin, NOAA/PMEL -- [EMAIL PROTECTED]
>> 7600 Sand Point Way NE, Seattle, WA 98115-0070
>> ph. (206) 526-6080, FAX (206) 526-6744
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> CF-metadata mailing list
>> [email protected]
>> http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>
>
>

--
--------------------------------------------------------------
Dr Jon Blower              Tel: +44 118 378 5213 (direct line)
Technical Director         Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre   Fax: +44 118 378 6413
ESSC                       Email: [EMAIL PROTECTED]
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
