... I should have read this first. Many of the same points!

Bryan
On 19 September 2013 22:58, Martin Schultz wrote:
> Hi Charlie,
>
> Very good and extensive explanation of the potential use for groups and group-aware metadata. Yet I have a few remarks (which may in part reveal that I should probably read the preamble of the CF convention again ;-):
>
>
> Point 1: How does the user know she has all the realizations?
>
> Is this question best addressed with metadata in a (series of) file(s)? In a modern, interoperable architecture, I would think that this belongs in the realm of data discovery, which would be done via web catalogues using metadata facets. File-based metadata IMHO may be more prone to failure. Just imagine ECMWF had first generated two ensemble members, and their metadata said so (your "page 2/2" analogy). Now they run another two: do you really expect the metadata in the old files to be updated? A web catalogue would provide a more robust solution to this question, I believe.
>
> This doesn't mean that it may not be useful to have such information in a file! However, to come back to the suitcases: this can only be a packing list for the current trip, not an inventory of all the socks you may possibly own. Of course, your young aspiring researcher may wish to record her knowledge of other ensemble members she found on the web but didn't include in the file (the suitcase). But will her supervisor or a colleague on the other side of the world understand what she is talking about? I think that if you intend to go beyond the packing list, you open too many cans with too many worms.
>
>
> Point 2: Multiple and/or Non-numeric Ensemble axes
>
> Here, you have a valid point, although (again) I would not connect this to knowing "that she has all the models". Yet within the packed file (the suitcase) you want to know which hierarchy model (packing order) was applied, in order to be able to aggregate things (for example by computing ensemble averages).
> See also my use case on aircraft data, introduced below. Question: what happens to this kind of information when the files are flattened and re-packaged? It might well become meaningless, which would indicate that these are "temporary" metadata, and thus probably out of scope for CF. This reminds me a bit of my experience with the history attribute when I use ncks -A: the command preserves the history of one file but discards the history of the other, which is certainly not the behavior you would want from ungrouping/re-grouping software.
>
>
> Point 3: Weights and intentional reproducibility of MME statistics
>
> In my view this is actually just another angle on your point #2.
>
> --
>
> Your use case does, however, highlight the "convenience" of grouping data that somehow belong together into one file. In a world of flat files, you must check the coordinates every time you want to perform some sort of (ensemble) averaging operation. A hierarchical file will tell you that it is OK to average by placing the common coordinates at the upper level. IMPORTANT: again, this doesn't mean that this is the only or best way to do the grouping; yet it seems a compelling advantage to have this coordinate-consistency problem eliminated somewhere along your processing steps. As others have already said, there are reasons why people use suitcases.
>
> --
>
> Now, here is another use case, which we haven't implemented yet, partly because we didn't see how it could be done in a CF-consistent way:
> While a standard file layout has been defined for data from multiple stations (a contribution from Ben Domenico and Stefano Nativi, if I am not mistaken), this concept cannot be applied to data from multiple aircraft flights. Station data can be packaged together with the help of a non-geophysical "station" coordinate, because all stations share the same time axis.
> With aircraft flights, the time axes often don't overlap, and forcing all data onto a common superset of all the time axes would be a tremendous waste of space. Groups would seem to be the natural solution to this problem! Why not flat files? Because you might wish to retrieve all the aircraft data sampled in a given region during a specific period (a natural use case for a catalogue query, it seems) as one entity, and not as N entities, where you cannot even predict N.
>
> I would think the same applies to "granules" of satellite data which share a common calibration, for example.
>
> --
>
> As Nan said, we should try to get back to defining what is really at stake for CF and what exactly should be proposed. Now this is where my failure to re-read the convention preamble may show ;-). The main question is: is CF about files or about interoperability? Unfortunately, my view on this is not entirely clear, because it seems to be a bit of both. The standard_names clearly have a bearing in the interoperable world, as shown by various links to the CF standard_names in web catalogues and controlled vocabulary collections (e.g. SeaDataNet). The conventions themselves seem to be more file-oriented, even though the discussions about the data model always make a strong point of going beyond representation in a (single) file. [If someone disagrees and wishes to see the CF convention play a more important role in interoperability, then I would ask why it is not cast into an XML schema extending ISO 19115.] If CF is indeed "file-oriented", then I do think it makes a lot of sense to support "modern" file structures, which include groups and hierarchies, whether we like them or not. Therefore, I would advocate that we focus the discussion on two major points, each with a couple of sub-issues:
>
> 1. Which parts of CF might fail when we have a hierarchical file? (And let's stick to the simple inheritance model of netCDF-4 for now!)
> 1a. What would the current CF checker say if it were fed a hierarchical file?
> 1b. What happens to global attributes when flat files are grouped together?
> 1c. Do we need to rephrase some aspects of the convention to make them "group-aware"? (This does not include defining new rules; that is covered in point 2.)
> 1d. Anything else?
>
> 2. Where do we need to extend the current CF concept?
> 2a. Introduction of a new attribute "level" (equate "global" with "root"?). What happens when hierarchical files are flattened? [Please see the 3 varieties of flattening operations mentioned in an earlier post.]
> 2b. Specification of "ensemble_..." attributes? "ensemble_axis" may not be needed if these axes are defined at the group level (?). Something like "ensemble_history" or "ensemble_structure" to inform the user about the grouping principle?
> 2c. What other "relations" need to be expressed within a hierarchical file? The guiding principle here should be that additional rules are only needed if they avoid ambiguity and misinterpretation of the data. And here we get into interoperability territory again (see my use case about aircraft data above).
>
>
> Sorry for this long post; this just somehow seems to be quite relevant!
>
> Best regards,
>
> Martin
>
> --------------------------------------------------------------------------------
> PD Dr. Martin G. Schultz
> IEK-8, Forschungszentrum Jülich
> D-52425 Jülich
> Ph: +49 2461 61 2831

--
Bryan Lawrence
University of Reading: Professor of Weather and Climate Computing.
National Centre for Atmospheric Science: Director of Models and Data.
STFC: Director of the Centre for Environmental Data Archival.
Ph: +44 118 3786507 or 1235 445012; Web: home.badc.rl.ac.uk/lawrence
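The storage argument in the aircraft use case quoted above can be made concrete with a rough cell count. This is a minimal sketch under stated assumptions: the `Flight` class and the `padded_cells` / `grouped_cells` functions are hypothetical names, each flight is reduced to a list of sample times, and every variable is assumed to be sampled at every time step of its own flight. It compares a flat layout that pads every flight onto the union of all time axes (analogous to the shared "station" time axis) with a grouped layout in which each flight keeps its own time dimension.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Flight:
    """Hypothetical container for one aircraft flight (illustration only)."""
    name: str
    times: list[float]  # sample times, e.g. seconds since a shared reference epoch


def padded_cells(flights: list[Flight], n_vars: int) -> int:
    """Logical array cells if every flight is forced onto the union of all
    time axes, i.e. a flat (flight, time) layout like the station case."""
    union_times: set[float] = set()
    for flight in flights:
        union_times.update(flight.times)
    return len(flights) * len(union_times) * n_vars


def grouped_cells(flights: list[Flight], n_vars: int) -> int:
    """Logical array cells if each flight sits in its own group with its own
    time dimension, so only the samples actually taken are stored."""
    return sum(len(flight.times) for flight in flights) * n_vars


if __name__ == "__main__":
    # Three hypothetical flights of 1000 samples each, with no overlap in time.
    flights = [
        Flight("flight_a", [float(t) for t in range(0, 1000)]),
        Flight("flight_b", [float(t) for t in range(5000, 6000)]),
        Flight("flight_c", [float(t) for t in range(9000, 10000)]),
    ]
    print(padded_cells(flights, n_vars=5))   # 45000
    print(grouped_cells(flights, n_vars=5))  # 15000
```

When the flights do not overlap at all, the padded layout is larger by a factor equal to the number of flights. The count covers logical cells only; compression and chunked storage can reduce the on-disk cost of fill values, so actual file sizes will differ.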
_______________________________________________
CF-metadata mailing list
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
