Dear Barna,

I don't think legacy schemes that notoriously mix quality statements with other 
information are a problem. They would simply be labelled 'status_flag', while 
'quality_flag' would be reserved for schemes with cleaner semantics. As I 
understand it, the proposal does not change the meaning of 'status_flag' to 
exclude flag schemes that contain some quality flags.

Cheers, Roy.


I have now retired but will continue to be active through an Emeritus 
Fellowship using this e-mail address.

________________________________
From: CF-metadata <cf-metadata-boun...@cgd.ucar.edu> on behalf of Andrew Barna 
<aba...@ucsd.edu>
Sent: 23 July 2019 20:33
To: Kehoe, Kenneth E. <kke...@ou.edu>
Cc: cf-metadata@cgd.ucar.edu <cf-metadata@cgd.ucar.edu>
Subject: Re: [CF-metadata] New standard_name of quality_flag for corresponding 
quality control variables

Ken,

Ok I see how this can be useful. Two more questions:
* How would you deal with "legacy" flag schemes which mix "status" and
"quality" already? I'm thinking of WOCE CTD as an example where "7"
means Despiked (a status) and "3" means Questionable measurement (a
quality). The way my seagoing group have dealt with both is by having
the "quality" override "status" if the quality is anything other than
"good", e.g. a questionable measurement which has been despiked gets
flag 3.

* Are there rules in CF regarding restricting an existing definition?
I imagine there are many datasets already using the "status_flag" name
as either a stand alone standard name or a standard name modifier.
This change seems to be "breaking" in that previously compliant
datasets would now have quality information in a purely status field.
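
The override rule described in the first question above (quality wins over status unless the quality is "good") could be sketched roughly as follows. This is only an illustration: the values 3 and 7 come from the message, while treating 2 as the "good" value and the function name merge_flags are assumptions made for the sketch.

```python
# Flag values 3 and 7 are quoted in the message above.
# Using 2 for "good" is an assumption made for this sketch.
GOOD = 2
QUESTIONABLE = 3
DESPIKED = 7


def merge_flags(quality, despiked):
    """Combine a quality flag and a despiked status into one legacy-style flag.

    quality  -- list of quality flags, one per sample
    despiked -- list of booleans, True where the sample was despiked
    """
    merged = []
    for q, was_despiked in zip(quality, despiked):
        if q != GOOD:
            merged.append(q)         # any non-good quality overrides the status
        elif was_despiked:
            merged.append(DESPIKED)  # good data that was despiked keeps the status
        else:
            merged.append(GOOD)
    return merged


quality = [GOOD, QUESTIONABLE, GOOD, QUESTIONABLE]
despiked = [False, False, True, True]
print(merge_flags(quality, despiked))  # the last sample is questionable AND despiked
```

The last sample shows the information loss in a mixed scheme: once the quality flag overrides it, the fact that the value was despiked is no longer recorded.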

Thanks
-Barna

On Tue, Jul 23, 2019 at 10:08 AM Kehoe, Kenneth E. <kke...@ou.edu> wrote:
>
> Martin,
>
> Thanks for your reply. I would prefer to keep the proposal simple. My example 
> of a weighted mean was just one I created off the top of my head. I don't see 
> it as something to actually look into implementing.
>
> I need a way to indicate that a variable is a quality status field. The 
> important distinction is that the field contains only quality information. 
> A variable labelled quality_flag will also need to use flag_meanings, the 
> same as status_flag; hence my reason for choosing quality_flag, which 
> follows a similar naming pattern.
>
> Barna,
>
> Without a distinction that the entire variable is a quality variable the user 
> is forced to parse the flag_meanings to see if the variable applies. This 
> would also encourage a data provider to mix quality with source or instrument 
> state or something else in the same variable. That would be very difficult to 
> understand.
>
> As Martin points out quality is more subjective than other status 
> information. A user may need to choose what parts of the quality variable to 
> apply. I would prefer we not conflate absolute information with subjective 
> information. But we need a way to distinguish a variable that contains absolute 
> information from a variable that contains more subjective information.
>
> To expand on Martin's example imagine a profiling instrument that has a 
> shutter to protect the laser from rain. The laser will always send out pulses 
> and the receiver will always be on, receiving the return from each laser pulse. 
> To know when the shutter is open and the instrument is profiling, we would use 
> a state variable with a simple flag_values method.
>
> short shutter (time)
>   shutter:long_name = "Shutter state"
>   shutter:units = '1'
>   shutter:flag_values = 0, 1
>   shutter:flag_meanings = "closed open"
>   shutter:standard_name = "status_flag"
>
> This variable just indicates the position of the shutter. There is no 
> ambiguity in its use. If a user wants to use the data for atmospheric 
> research they should filter to use only data taken while profiling. In fact we 
> can implement this in our code by using only data where shutter is set 
> to open.
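
As a rough sketch of the filtering step described above, the following keeps only samples taken while the shutter is open. The flag_values and flag_meanings mirror the shutter variable in the example; the sample arrays and the name backscatter are made up for illustration.

```python
# flag_values and flag_meanings mirror the shutter variable shown above.
flag_values = [0, 1]
flag_meanings = "closed open".split()

# Hypothetical sample data, one entry per time step.
shutter = [0, 1, 1, 0, 1]
backscatter = [0.02, 1.31, 1.27, 0.03, 1.40]

# Look up the numeric value that means "open" instead of hard-coding 1.
open_value = flag_values[flag_meanings.index("open")]

# Keep only the samples recorded while the instrument was profiling.
profiling = [x for x, s in zip(backscatter, shutter) if s == open_value]
print(profiling)
```

Because the status is unambiguous, no user choice is involved: the same filter applies to every analysis.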
>
> Here is an example of a more subjective quality variable.
>
> short quality_variable (time)
>   quality_variable:long_name = "Quality variable for linked data variable"
>   quality_variable:units = '1'
>   quality_variable:flag_masks = 1, 2, 4, 8, 16, 32
>   quality_variable:flag_meanings = "Shutter_not_open
>     Laser_below_80_percent_power
>     Laser_below_60_percent_power
>     Laser_below_40_percent_power
>     Bird_poop_may_be_on_sensor
>     Bird_poop_is_on_sensor"
>   quality_variable:flag_assessment = "Bad Suspect Suspect Bad Suspect Bad"
>   quality_variable:standard_name = "quality_flag"
>
> In this example there are three indications of when the laser is below 100% 
> power. It would be up to the user to decide what percentage is the limit below 
> which they do not want to use the data. This is more subjective and depends on 
> the research technique to determine whether the issue is a problem or not. It 
> is also up to the user to determine if the chance of bird poop on the sensor is 
> an issue or if they are OK with the risk of using the data. And to be nice to 
> the user we have also pulled in information from the shutter variable so the 
> user can decide to use only the quality_variable instead of using both shutter 
> and quality_variable. This is up to the data provider to decide. Some providers 
> see the state of the shutter as quality information, some would not. There are 
> no requirements put on the quality variable as to how it is used. It is just 
> a quality information variable following the same rules as a CF state 
> variable.
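
To illustrate the user-side choice described above, here is a minimal sketch that decodes the bit-packed quality variable and rejects only the tests a particular user cares about. The masks and meanings are copied from the example; the helper names, the user's selection, and the sample values are hypothetical.

```python
# Masks and meanings copied from the quality_variable example above.
flag_masks = [1, 2, 4, 8, 16, 32]
flag_meanings = (
    "Shutter_not_open Laser_below_80_percent_power "
    "Laser_below_60_percent_power Laser_below_40_percent_power "
    "Bird_poop_may_be_on_sensor Bird_poop_is_on_sensor"
).split()


def tripped_tests(value):
    """Return the meanings of every test whose bit is set in value."""
    return [m for mask, m in zip(flag_masks, flag_meanings) if value & mask]


def reject_mask(selected):
    """OR together the masks of the tests the user wants to act on."""
    total = 0
    for mask, meaning in zip(flag_masks, flag_meanings):
        if meaning in selected:
            total |= mask
    return total


# Hypothetical user choice: tolerate 80% power and *possible* bird droppings.
selected = {
    "Shutter_not_open",
    "Laser_below_60_percent_power",
    "Bird_poop_is_on_sensor",
}
reject = reject_mask(selected)

quality_variable = [0, 2, 6, 14, 16, 1]  # hypothetical values, one per time step
keep = [(q & reject) == 0 for q in quality_variable]

print(tripped_tests(14))
print(keep)
```

A different user could pass a different selection without any change to the file, which is the sense in which the quality information is subjective.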
>
> I have also included an attribute that I am not currently proposing, called 
> flag_assessment. This is a subjective statement from the data provider giving 
> their opinion of the quality of the data. A user can search for the word 
> "Bad" and then exclude from analysis only the data where that mask is set. 
> This would take all the guesswork about quality away from the user if they 
> decided to accept the opinion of the data provider. I'm not currently proposing 
> the addition of flag_assessment; this is just an example of how quality can be 
> expanded to be simpler for a user without taking away the user's ability to 
> make their own decision. Everyone has strong opinions on quality of data.
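
A sketch of how a tool might apply the provider's assessment is given below. Note that flag_assessment is not a CF attribute and is explicitly not being proposed in this thread; the six assessment words are taken from the example above, and the data values are invented.

```python
# Masks from the example above; the assessment words are the provider's
# per-test opinion (flag_assessment is NOT an existing CF attribute).
flag_masks = [1, 2, 4, 8, 16, 32]
flag_assessment = "Bad Suspect Suspect Bad Suspect Bad".split()

# Build one combined mask covering every test the provider rated "Bad".
bad_mask = 0
for mask, assessment in zip(flag_masks, flag_assessment):
    if assessment == "Bad":
        bad_mask |= mask

# Hypothetical data and matching quality values.
data = [10.0, 11.0, 12.0, 13.0]
quality_variable = [0, 2, 8, 18]

# Drop samples where any "Bad" test was tripped; "Suspect" samples are kept.
accepted = [x for x, q in zip(data, quality_variable) if not q & bad_mask]
print(accepted)
```

A user who disagrees with the provider can still ignore the assessment and select individual tests from flag_meanings instead.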
>
> Thanks,
>
> Ken
>
> On 2019-7-23 06:50, Martin Juckes - UKRI STFC wrote:
>
> Dear Ken,
>
>
> thanks for your response to me below.
>
>
> Would it be fair to suggest that "status" should, as far as possible, reflect 
> a generic objective classification, with terms such as "sensor_nonfunctional" 
> which have a comparable meaning for all datasets, while "quality" is a 
> subjective *measure* with a meaning that may vary from dataset to dataset? E.g. if 
> dataset A has a maximum "quality" of 11 and dataset B only goes up to 10, it 
> doesn't necessarily imply that dataset A is in any sense better than B.
>
>
> If you want to use it in weighted means, perhaps it should be 
> "quality_measure" rather than "quality_flag"? With "status_flag" the order of 
> integer values does not have any meaning, but with quality perhaps it would 
> make more sense to have some concept of a sequence of quality settings (so that, 
> for example "1" always indicates a quality between "0" and "2" within a 
> dataset, but could have different meanings in different datasets). Could the 
> quality also be expressed as a floating point number without any flag 
> meanings?
>
>
> Responding to a point Barna raised: it is certainly possible to have more 
> than one "status_flag" variable, but I don't think it is ideal: if 
> information needs to be split across multiple variables we generally like to 
> describe the difference between the variables in the standard name or in 
> other metadata. In this case, I think there is a good case for using a new 
> standard name.
>
>
> regards,
>
> Martin
>
>
>
>
> ________________________________
> From: CF-metadata <cf-metadata-boun...@cgd.ucar.edu> on behalf of Andrew 
> Barna <aba...@ucsd.edu>
> Sent: 23 July 2019 00:23
> To: Kehoe, Kenneth E.
> Cc: cf-metadata@cgd.ucar.edu
> Subject: Re: [CF-metadata] New standard_name of quality_flag for 
> corresponding quality control variables
>
> Ken,
>
> I guess I don't see this proposed change as necessary since the
> distinction between the terms "quality" and "status" is really done in
> the "flag_meanings" attribute and is basically free form/uncontrolled.
> These attributes need to be used by this new name as well.
>
> Let me rephrase my suggestion/question:
> If this proposal is not adopted, but an example of how to use a
> variable, with the standard name of "status_flag", to only indicate
> data quality is included in the document, would that help?
>
> -Barna
>
> On Mon, Jul 22, 2019 at 1:22 PM Kehoe, Kenneth E. <kke...@ou.edu> wrote:
>
> Barna,
>
> Yes an update to the CF document should follow after the new
> standard_name is implemented. I think multiple examples are needed since
> status_flag covers many different types of state variables.
>
> Ken
>
>
>
> On 2019-7-22 10:35, Andrew Barna wrote:
>
> Hi Martin, Ken,
>
> Is there anything wrong with including multiple "status_flag"
> variables to capture all the separate states you wish? The CF document
> unfortunately only includes an example of how to encode the status of
> a sensor, but the actual meanings of the flag values are entirely up
> to you, and this will not change with this proposal. Perhaps the CF
> document would benefit from additional examples (e.g. one that only
> shows data quality flags).
>
> -Barna
>
>
> On Mon, Jul 22, 2019 at 9:04 AM Kehoe, Kenneth E. <kke...@ou.edu> wrote:
>
> Hi Martin,
>
> I see status encompassing multiple metadata pieces of information. For
> example it could be a state of the instrument as it cycles through a
> pre-programmed routine (look at calibration target, look at sky, look at
> ground, look at second calibration target, repeat...). Or the sources of
> the inputs for a model where the availability or some other reason could
> require making a decision on what source(s) to use. For provenance this
> source information is important to report on a time step basis. Or the
> status could be a data provider's method to provide uncertainty
> information (I see this as incorrect but some people do see it this
> way). Each of these is important metadata but the method of use is
> different than a strictly quality variable. A quality variable provides
> information indicating if the data should be used or possibly could be
> used in a weighted mean method to favor high quality data over low
> quality data. The way the metadata is used is different depending on the
> metadata type. A state of the instrument would be used for sub-setting
> calibration vs. data. There is no ambiguity in this as data from a
> calibration target is not used in a weather research analysis. But
> quality is more subjective and is decided by the data user. If the
> quality variable has 20 different quality tests the user would need to
> decide whether all 20 test results should be used or only a subset. Also,
> the code for applying the quality is different than the state of the
> instrument view (in my example above).
>
> It is possible to have a quality test result from the state of the
> instrument, but not the other way around (typically). So I need a way to
> distinguish the two for automated or semi-automated tools. Hence my
> point of quality_flag essentially being a subset of status_flag.
>
> Ken
>
>
>
> On 2019-7-22 02:57, Martin Juckes - UKRI STFC wrote:
>
> Dear Ken,
>
>
> Can you expand on the distinction between "quality" and "status"? I 
> understand that they are different in principle, but, in order to support 
> this new standard name I think we need a clear objective statement of how we 
> would want to distinguish between them in CF.
>
> The conventions section on flags (3.5) mixes the two up 
> (http://cfconventions.org/cf-conventions/cf-conventions.html#flags), so some 
> re-wording of the document would also be needed.
>
> regards,
> Martin
>
> ________________________________
> From: CF-metadata <cf-metadata-boun...@cgd.ucar.edu> on behalf of Kehoe, 
> Kenneth E. <kke...@ou.edu>
> Sent: 19 July 2019 06:42
> To: cf-metadata@cgd.ucar.edu
> Subject: [CF-metadata] New standard_name of quality_flag for corresponding 
> quality control variables
>
> Dear CF,
>
> I am proposing a new standard name of "quality_flag" to indicate a variable 
> is purely a quality control variable. A quality control variable would use 
> flag_values or flag_masks along with flag_meanings to allow declaring levels 
> of quality or results from quality indicating tests of the data variable. 
> This variable would be a subset of the more general "status_flag" standard name. 
> Currently the definition of "status_flag" is:
>
> - A variable with the standard name of status_flag contains an indication of 
> quality or other status of another data variable. The linkage between the 
> data variable and the variable with the standard_name of status_flag is 
> achieved using the ancillary_variables attribute.
>
> This definition includes a variable used to define the state or other status 
> information of a variable and cannot be distinguished by standard name alone 
> from a state of the instrument, processing decision, source information, 
> needed metadata about the data variable or other ancillary variable type. 
> Since there is no other way to define a purely quality control variable, the 
> use of "status_flag" is too general for strictly quality control variables. 
> By having a method to define a variable as strictly quality control the 
> results of quality control tests can be applied to the data with a software 
> tool based on requests by the user. This would not affect current datasets 
> that do use "status_flag" nor require a change to the definition outside of 
> the indication that "quality_flag" standard name is available and a better 
> use for pure quality control variables.
>
> Proposed addition:
>
> quality_flag = A variable with the standard name of quality_flag contains an 
> indication of quality information of another data variable. The linkage 
> between the data variable and the variable or variables with the 
> standard_name of quality_flag is achieved using the ancillary_variables 
> attribute.
>
> Proposed change:
>
> status_flag = A variable with the standard name of status_flag contains an 
> indication of status of another data variable. The linkage between the data 
> variable and the variable with the standard_name of status_flag is achieved 
> using the ancillary_variables attribute. For data quality information use 
> quality_flag.
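
The practical benefit of the two definitions above is that a tool could pick out quality variables by standard name alone. A minimal sketch follows, using a plain dictionary as a stand-in for a dataset's variable attributes; the variable names and the helper function are hypothetical, and quality_flag is the name proposed in this message rather than an accepted standard name.

```python
# Hypothetical in-memory stand-in for the attributes of a dataset's variables.
variables = {
    "backscatter": {"ancillary_variables": "shutter quality_variable"},
    "shutter": {"standard_name": "status_flag"},
    "quality_variable": {"standard_name": "quality_flag"},
}


def ancillary_by_standard_name(variables, data_name, standard_name):
    """List the ancillary variables of data_name that carry standard_name.

    ancillary_variables is a blank-separated list of variable names.
    """
    names = variables[data_name].get("ancillary_variables", "").split()
    return [
        name
        for name in names
        if variables.get(name, {}).get("standard_name") == standard_name
    ]


print(ancillary_by_standard_name(variables, "backscatter", "quality_flag"))
print(ancillary_by_standard_name(variables, "backscatter", "status_flag"))
```

With only status_flag available, the same lookup would return both variables and the tool would have to parse flag_meanings to tell them apart.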
>
> Thanks,
>
> Ken
>
>
>
> --
> Kenneth E. Kehoe
>     Research Associate - University of Oklahoma
>     Cooperative Institute for Mesoscale Meteorological Studies
>     ARM Climate Research Facility - Data Quality Office
>     e-mail: kke...@ou.edu | Office: 303-497-4754
>
> _______________________________________________
> CF-metadata mailing list
> CF-metadata@cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


