Ken,

I think I'm confused by the text of the proposed change to the definition of status_flag.
In your proposed change the "quality" wording of the status_flag definition was dropped. Here is the first sentence of each:

Current: A variable with the standard name of status_flag contains an indication of quality or other status of another data variable.

Proposed: A variable with the standard name of status_flag contains an indication of status of another data variable.

Perhaps the following for "status_flag":

A variable with the standard name of status_flag contains an indication of quality or other status of another data variable. The linkage between the data variable and the variable with the standard_name of status_flag is achieved using the ancillary_variables attribute. *A variable which contains purely subjective quality information may use the standard_name of quality_flag.*

That is, keep the current definition, but also point to a more restrictive option. I don't see any way around reading the flag_meanings with any of these options.

-Barna

On Tue, Jul 23, 2019 at 1:03 PM Kehoe, Kenneth E. <kke...@ou.edu> wrote:
> Barna,
>
> I see this as an optional addition to narrow the standard. It does not
> prohibit someone from using status_flag (as a standard_name or a
> standard_name modifier) from a previous convention version
> implementation nor invalidate that use from a previous convention
> version. In your example the use of status_flag is a mixture of state
> and quality. I see this new name as a way to improve things going
> forward. Since the historical WOCE example uses state and quality with
> some additional rules not listed in the CF standard it would be up to
> the user to understand how to use the variable. Without seeing the WOCE
> data I can't make a specific suggestion.
>
> I don't know about any rules regarding a restriction. I think the
> general concept of CF is to set the minimum rules. Additional rules
> applied by another group on top of CF are allowed. For example my
> organization uses additional attributes not defined in CF.
> I see quality_flag as a narrowing of the rules of status_flag, not a
> replacement for it. status_flag can still have a mixture of state and
> quality if the data provider prefers to do it that way. quality_flag
> can only have quality information. The determination of what is quality
> information is actually up to the data provider to decide.
>
> Ken
>
> On 2019-7-23 13:33, Andrew Barna wrote:
> > Ken,
> >
> > Ok I see how this can be useful. Two more questions:
> > * How would you deal with "legacy" flag schemes which mix "status" and
> > "quality" already? I'm thinking of WOCE CTD as an example where "7"
> > means Despiked (a status) and "3" means Questionable measurement (a
> > quality). The way my seagoing group has dealt with both is by having
> > the "quality" override "status" if the quality is anything other than
> > "good", e.g. a questionable measurement which has been despiked gets
> > flag 3.
> >
> > * Are there rules in CF regarding restricting an existing definition?
> > I imagine there are many datasets already using the "status_flag" name
> > as either a stand-alone standard name or a standard name modifier.
> > This change seems to be "breaking" in that previously compliant
> > datasets would now have quality information in a purely status field.
> >
> > Thanks
> > -Barna
> >
> > On Tue, Jul 23, 2019 at 10:08 AM Kehoe, Kenneth E. <kke...@ou.edu> wrote:
> >> Martin,
> >>
> >> Thanks for your reply. I would prefer to keep the proposal simple. My example of a weighted mean was just one I created off the top of my head. I don't see it as something to actually look into implementing.
> >>
> >> I need a way to indicate a variable is a quality status field. The important distinction is that the status field contains only quality information. The variable indicated with quality_flag will also need to use flag_meanings, same as status_flag. Hence my reason for choosing quality_flag to follow a similar naming pattern.
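The override rule Barna describes for WOCE CTD flags ("quality" wins over "status" unless the quality is "good") can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name combined_flag is hypothetical, and the value 2 for "good" is an assumption about the WOCE scheme that is not given in the thread; only 3 (Questionable measurement) and 7 (Despiked) come from Barna's message.

```python
# Values 3 and 7 are taken from Barna's message; GOOD = 2 is an assumption
# about the WOCE CTD scheme and is not stated anywhere in this thread.
GOOD = 2
QUESTIONABLE = 3
DESPIKED = 7


def combined_flag(quality, status=None):
    """Merge a quality code and an optional status code into one flag.

    Quality overrides status whenever the quality is anything other
    than GOOD; otherwise the status (if any) is reported.
    """
    if quality != GOOD:
        return quality
    return status if status is not None else GOOD


# A questionable measurement which has been despiked keeps flag 3.
print(combined_flag(QUESTIONABLE, DESPIKED))
# A good measurement which has been despiked reports the status, flag 7.
print(combined_flag(GOOD, DESPIKED))
```

A reader of such a combined flag cannot tell from flag 3 alone whether the value was also despiked, which is the kind of mixing the quality_flag proposal is trying to make optional.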
> >>
> >> Barna,
> >>
> >> Without a distinction that the entire variable is a quality variable the user is forced to parse the flag_meanings to see if the variable applies. This would also encourage a data provider to mix quality with source or instrument state or something else in the same variable. That would be very difficult to understand.
> >>
> >> As Martin points out quality is more subjective than other status information. A user may need to choose what parts of the quality variable to apply. I would prefer we not conflate absolute information with subjective information. But we need a way to distinguish a variable that contains absolute information from a variable that contains more subjective information.
> >>
> >> To expand on Martin's example imagine a profiling instrument that has a shutter to protect the laser from rain. The laser will always send out pulses and the receiver will always be on, receiving the return from each laser pulse. To know when the shutter is in the open state, where the instrument is profiling, we would use a state variable with a simple flag_values method.
> >>
> >> short shutter(time) ;
> >>     shutter:long_name = "Shutter state" ;
> >>     shutter:units = "1" ;
> >>     shutter:flag_values = 0s, 1s ;
> >>     shutter:flag_meanings = "closed open" ;
> >>     shutter:standard_name = "status_flag" ;
> >>
> >> This variable is just indicating the position of the shutter. There is no ambiguity with its use. If a user wants to use the data for atmospheric reasons they should filter to only use data where the instrument is profiling. In fact we can implement this variable into our code by only using data where shutter is set to open.
> >>
> >> Here is an example of a more subjective quality variable.
> >>
> >> short quality_variable(time) ;
> >>     quality_variable:long_name = "Quality variable for linked data variable" ;
> >>     quality_variable:units = "1" ;
> >>     quality_variable:flag_masks = 1s, 2s, 4s, 8s, 16s, 32s ;
> >>     quality_variable:flag_meanings = "Shutter_not_open Laser_below_80_percent_power Laser_below_60_percent_power Laser_below_40_percent_power Bird_poop_may_be_on_sensor Bird_poop_is_on_sensor" ;
> >>     quality_variable:flag_assessment = "Bad Suspect Suspect Bad Suspect Bad" ;
> >>     quality_variable:standard_name = "quality_flag" ;
> >>
> >> In this example there are three indications when the laser is less than 100%. It would be up to the user to decide what percentage is the limit where they do not want to use the data. This is more subjective and dependent on the research techniques to determine if the issue is a problem or not. It is also up to the user to determine if the chance of bird poop on the sensor is an issue or if they are OK with the risk of using the data. And to be nice to the user we have also pulled in information from the shutter variable so the user can decide to only use the quality_variable instead of using both shutter and quality_variable. This is up to the data provider to decide. Some providers see the state of the shutter as quality information, some would not. There are no requirements put on the quality variable as to how it is used. It is just a quality information variable following the same rules as a CF state variable.
> >>
> >> I have also included an attribute that I am not currently proposing called flag_assessment. This is a subjective statement from the data provider on their opinion of the quality of the data. A user can search for the word "Bad" and then exclude only that data from analysis where the mask is set. This would take all the guesswork of quality away from the user if they decided to take the opinion of the data provider.
> >> I'm not currently proposing the addition of flag_assessment; this is just an example of how quality can be expanded to be simpler for a user without taking away the user's ability to make their own decision. Everyone has strong opinions on quality of data.
> >>
> >> Thanks,
> >>
> >> Ken
> >>
> >> On 2019-7-23 06:50, Martin Juckes - UKRI STFC wrote:
> >>
> >> Dear Ken,
> >>
> >> thanks for your response to me below.
> >>
> >> Would it be fair to suggest that "status" should, as far as possible, reflect a generic objective classification, with terms such as "sensor_nonfunctional" which have a comparable meaning for all datasets, while "quality" is a subjective *measure* with a meaning that may vary from dataset to dataset? E.g. if dataset A has a maximum "quality" of 11 and dataset B only goes up to 10, it doesn't necessarily imply that dataset A is in any sense better than B.
> >>
> >> If you want to use it in weighted means, perhaps it should be "quality_measure" rather than "quality_flag"? With "status_flag" the order of integer values does not have any meaning, but with quality perhaps it would make more sense to have some concept of a sequence of quality settings (so that, for example "1" always indicates a quality between "0" and "2" within a dataset, but could have different meanings in different datasets). Could the quality also be expressed as a floating point number without any flag meanings?
> >>
> >> Responding to a point Barna raised: it is certainly possible to have more than one "status_flag" variable, but I don't think it is ideal: if information needs to be split across multiple variables we generally like to describe the difference between the variables in the standard name or in other metadata. In this case, I think there is a good case for using a new standard name.
> >>
> >> regards,
> >>
> >> Martin
> >>
> >> ________________________________
> >> From: CF-metadata <cf-metadata-boun...@cgd.ucar.edu> on behalf of Andrew Barna <aba...@ucsd.edu>
> >> Sent: 23 July 2019 00:23
> >> To: Kehoe, Kenneth E.
> >> Cc: cf-metadata@cgd.ucar.edu
> >> Subject: Re: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables
> >>
> >> Ken,
> >>
> >> I guess I don't see this proposed change as necessary since the
> >> distinction between the terms "quality" and "status" is really done in
> >> the "flag_meanings" attribute and is basically free form/uncontrolled.
> >> These attributes need to be used by this new name as well.
> >>
> >> Let me rephrase my suggestion/question:
> >> If this proposal is not adopted, but an example of how to use a
> >> variable, with the standard name of "status_flag", to only indicate
> >> data quality is included in the document, would that help?
> >>
> >> -Barna
> >>
> >> On Mon, Jul 22, 2019 at 1:22 PM Kehoe, Kenneth E. <kke...@ou.edu> wrote:
> >>
> >> Barna,
> >>
> >> Yes, an update to the CF document should follow after the new
> >> standard_name is implemented. I think multiple examples are needed since
> >> status_flag covers many different types of state variables.
> >>
> >> Ken
> >>
> >> On 2019-7-22 10:35, Andrew Barna wrote:
> >>
> >> Hi Martin, Ken,
> >>
> >> Is there anything wrong with including multiple "status_flag"
> >> variables to capture all the separate states you wish? The CF document
> >> unfortunately only includes an example of how to encode the status of
> >> a sensor, but the actual meanings of the flag values are entirely up
> >> to you, and this will not change with this proposal. Perhaps the CF
> >> document would benefit from additional examples (e.g. one that only
> >> shows data quality flags).
> >>
> >> -Barna
> >>
> >> On Mon, Jul 22, 2019 at 9:04 AM Kehoe, Kenneth E. <kke...@ou.edu> wrote:
> >>
> >> Hi Martin,
> >>
> >> I see status encompassing multiple metadata pieces of information. For
> >> example it could be a state of the instrument as it cycles through a
> >> pre-programmed routine (Look at calibration target, look at sky, look at
> >> ground, look at second calibration target, repeat...). Or the sources of
> >> the inputs for a model where the availability or some other reason could
> >> require making a decision on what source(s) to use. For provenance this
> >> source information is important to report on a time step basis. Or the
> >> status could be a data provider's method to provide uncertainty
> >> information (I see this as incorrect but some people do see it this
> >> way). Each of these is important metadata but the method of use is
> >> different than a strictly quality variable. A quality variable provides
> >> information indicating if the data should be used or possibly could be
> >> used in a weighted mean method to favor high quality data over low
> >> quality data. The way the metadata is used is different depending on the
> >> metadata type. A state of the instrument would be used for sub-setting
> >> calibration vs. data. There is no ambiguity in this as data from a
> >> calibration target is not used in a weather research analysis. But
> >> quality is more subjective and is decided by the data user. If the
> >> quality variable has 20 different quality tests the user would need to
> >> decide if all 20 test results should be used or only a subset. Also,
> >> the code for applying the quality is different than the state of the
> >> instrument view (in my example above).
> >>
> >> It is possible to have a quality test result from the state of the
> >> instrument, but not the other way around (typically). So I need a way to
> >> distinguish the two for automated or semi-automated tools.
> >> Hence my point of quality_flag essentially being a subset of status_flag.
> >>
> >> Ken
> >>
> >> On 2019-7-22 02:57, Martin Juckes - UKRI STFC wrote:
> >>
> >> Dear Ken,
> >>
> >> Can you expand on the distinction between "quality" and "status"? I understand that they are different in principle, but, in order to support this new standard name I think we need a clear objective statement of how we would want to distinguish between them in CF.
> >>
> >> The conventions section on flags (3.5) mixes the two up ( http://cfconventions.org/cf-conventions/cf-conventions.html#flags ), so some re-wording of the document would also be needed.
> >>
> >> regards,
> >> Martin
> >>
> >> ________________________________
> >> From: CF-metadata <cf-metadata-boun...@cgd.ucar.edu> on behalf of Kehoe, Kenneth E. <kke...@ou.edu>
> >> Sent: 19 July 2019 06:42
> >> To: cf-metadata@cgd.ucar.edu
> >> Subject: [CF-metadata] New standard_name of quality_flag for corresponding quality control variables
> >>
> >> Dear CF,
> >>
> >> I am proposing a new standard name of "quality_flag" to indicate a variable is purely a quality control variable. A quality control variable would use flag_values or flag_masks along with flag_meanings to allow declaring levels of quality or results from quality indicating tests of the data variable. This variable would be a subset of the more general "status_flag" standard name. Currently the definition of "status_flag" is:
> >>
> >> - A variable with the standard name of status_flag contains an indication of quality or other status of another data variable. The linkage between the data variable and the variable with the standard_name of status_flag is achieved using the ancillary_variables attribute.
> >>
> >> This definition includes a variable used to define the state or other status information of a variable and cannot be distinguished by standard name alone from a state of the instrument, processing decision, source information, needed metadata about the data variable or other ancillary variable type. Since there is no other way to define a purely quality control variable, the use of "status_flag" is too general for strictly quality control variables. By having a method to define a variable as strictly quality control the results of quality control tests can be applied to the data with a software tool based on requests by the user. This would not affect current datasets that do use "status_flag" nor require a change to the definition outside of the indication that the "quality_flag" standard name is available and a better choice for pure quality control variables.
> >>
> >> Proposed addition:
> >>
> >> quality_flag = A variable with the standard name of quality_flag contains an indication of quality information of another data variable. The linkage between the data variable and the variable or variables with the standard_name of quality_flag is achieved using the ancillary_variables attribute.
> >>
> >> Proposed change:
> >>
> >> status_flag = A variable with the standard name of status_flag contains an indication of status of another data variable. The linkage between the data variable and the variable with the standard_name of status_flag is achieved using the ancillary_variables attribute. For data quality information use quality_flag.
> >>
> >> Thanks,
> >>
> >> Ken
> >>
> >> --
> >> Kenneth E. Kehoe
> >> Research Associate - University of Oklahoma
> >> Cooperative Institute for Mesoscale Meteorological Studies
> >> ARM Climate Research Facility - Data Quality Office
> >> e-mail: kke...@ou.edu | Office: 303-497-4754
> >>
> >> _______________________________________________
> >> CF-metadata mailing list
> >> CF-metadata@cgd.ucar.edu
> >> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
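To make the flag_masks discussion in the quoted messages concrete, here is a rough Python sketch of how a tool might decode Ken's quality_variable example and drop samples the provider has assessed as "Bad". The masks, meanings and assessments are copied from his example; flag_assessment is the attribute he explicitly says he is not proposing, and the helper names decode, is_bad and keep_not_bad are hypothetical, chosen only for this illustration.

```python
# Attribute values copied from the quality_variable example in the thread.
FLAG_MASKS = [1, 2, 4, 8, 16, 32]
FLAG_MEANINGS = (
    "Shutter_not_open Laser_below_80_percent_power "
    "Laser_below_60_percent_power Laser_below_40_percent_power "
    "Bird_poop_may_be_on_sensor Bird_poop_is_on_sensor"
).split()
# flag_assessment is NOT part of CF; it is the illustrative attribute from
# Ken's message, giving the provider's opinion for each mask.
FLAG_ASSESSMENT = "Bad Suspect Suspect Bad Suspect Bad".split()


def decode(value):
    """Return (meaning, assessment) for every mask bit set in value."""
    return [
        (meaning, assessment)
        for mask, meaning, assessment in zip(
            FLAG_MASKS, FLAG_MEANINGS, FLAG_ASSESSMENT
        )
        if value & mask
    ]


def is_bad(value):
    """True if any set bit carries the provider assessment 'Bad'."""
    return any(assessment == "Bad" for _, assessment in decode(value))


def keep_not_bad(data, quality):
    """Keep only samples whose quality value has no 'Bad' bit set."""
    return [d for d, q in zip(data, quality) if not is_bad(q)]


# Bits 2 and 8 set: laser below 80% (Suspect) and below 40% (Bad).
print(decode(10))
print(keep_not_bad([1.0, 2.0, 3.0], [0, 2, 8]))
```

A user who disagrees with the provider's assessment can skip is_bad entirely and filter on the decoded meanings instead, which is the flexibility Ken argues for.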
_______________________________________________ CF-metadata mailing list CF-metadata@cgd.ucar.edu http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata