Martin,
Thanks for your reply. I would prefer to keep the proposal simple. My example
of a weighted mean was just one I created off the top of my head. I don't see
it as something to actually look into implementing.
I need a way to indicate that a variable is a quality status field. The
important distinction is that the field contains only quality information. A
variable identified with quality_flag will also need to use flag_meanings, the
same as status_flag. Hence my reason for choosing quality_flag to follow a
similar naming pattern.
Barna,
Without a way to indicate that the entire variable is a quality variable, the
user is forced to parse the flag_meanings to see if the variable applies. This
would also encourage a data provider to mix quality with source, instrument
state, or something else in the same variable, which would be very difficult to
understand.
As Martin points out, quality is more subjective than other status information.
A user may need to choose which parts of the quality variable to apply. I would
prefer we not conflate absolute information with subjective information, but we
need a way to distinguish a variable that contains absolute information from a
variable that contains more subjective information.
To expand on Martin's example, imagine a profiling instrument that has a shutter
to protect the laser from rain. The laser always sends out pulses, and the
receiver is always on, receiving the return from each laser pulse. To know when
the shutter is open and the instrument is profiling, we would use a state
variable with a simple flag_values method.
short shutter(time) ;
shutter:long_name = "Shutter state" ;
shutter:units = "1" ;
shutter:flag_values = 0s, 1s ;
shutter:flag_meanings = "closed open" ;
shutter:standard_name = "status_flag" ;
This variable just indicates the position of the shutter. There is no ambiguity
in its use. If a user wants to use the data for atmospheric analysis, they
should filter to use only data collected while profiling. In fact, we can build
this variable into our code by using only data where shutter is set to open.
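A minimal Python sketch of that filtering, assuming the shutter attributes have
already been read into plain Python lists (reading them from a netCDF file is
left out). The names backscatter, profiling_only and value_for_meaning are
hypothetical. Looking the value up by its flag_meanings entry, rather than
hard-coding 1, keeps the filter correct if a provider numbers the states
differently.

```python
# Hypothetical stand-ins for the shutter attributes in the CDL above.
SHUTTER_FLAG_VALUES = [0, 1]
SHUTTER_FLAG_MEANINGS = "closed open".split()


def value_for_meaning(flag_values, flag_meanings, meaning):
    """Return the flag value paired with `meaning` in flag_meanings."""
    return dict(zip(flag_meanings, flag_values))[meaning]


def profiling_only(data, shutter):
    """Keep only samples recorded while the shutter was open."""
    open_value = value_for_meaning(
        SHUTTER_FLAG_VALUES, SHUTTER_FLAG_MEANINGS, "open")
    return [d for d, s in zip(data, shutter) if s == open_value]


# Made-up sample values for a hypothetical backscatter variable.
backscatter = [1.2, 3.4, 0.0, 5.6]
shutter = [1, 1, 0, 1]
print(profiling_only(backscatter, shutter))  # [1.2, 3.4, 5.6]
```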
Here is an example of a more subjective quality variable.
short quality_variable(time) ;
quality_variable:long_name = "Quality variable for linked data variable" ;
quality_variable:units = "1" ;
quality_variable:flag_masks = 1s, 2s, 4s, 8s, 16s, 32s ;
quality_variable:flag_meanings = "Shutter_not_open Laser_below_80_percent_power Laser_below_60_percent_power Laser_below_40_percent_power Bird_poop_may_be_on_sensor Bird_poop_is_on_sensor" ;
quality_variable:flag_assessment = "Bad Suspect Suspect Bad Suspect Bad" ;
quality_variable:standard_name = "quality_flag" ;
In this example there are three indications that the laser is below 100% power.
It would be up to the user to decide at what percentage they no longer want to
use the data. This is more subjective and depends on the research techniques
used to determine whether the issue is a problem. It is also up to the user to
determine whether the chance of bird poop on the sensor is an issue or whether
they are OK with the risk of using the data. And to be nice to the user, we have
also pulled in information from the shutter variable, so the user can decide to
use only quality_variable instead of both shutter and quality_variable. This is
up to the data provider to decide. Some providers see the state of the shutter
as quality information; some would not. There are no requirements placed on how
the quality variable is used. It is just a quality information variable
following the same rules as a CF state variable.
I have also included an attribute that I am not currently proposing, called
flag_assessment. This is a subjective statement from the data provider giving
their opinion of the quality of the data. A user can search for the word "Bad"
and then exclude from analysis only the data where the corresponding mask is
set. This would take all the guesswork of quality away from the user if they
decided to accept the opinion of the data provider. I'm not currently proposing
the addition of flag_assessment; this is just an example of how quality
information can be expanded to be simpler for a user without taking away the
user's ability to make their own decision. Everyone has strong opinions on the
quality of data.
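The "search for Bad" step could look like this in Python. It assumes the
flag_assessment attribute from the example (hypothetical, not part of CF and
not being proposed) has been read into a list alongside flag_masks; the helper
names and sample data are made up.

```python
FLAG_MASKS = [1, 2, 4, 8, 16, 32]
# flag_assessment is the hypothetical attribute from the example; it is
# not part of CF and is not being proposed here.
FLAG_ASSESSMENT = "Bad Suspect Suspect Bad Suspect Bad".split()


def masks_assessed_as(label):
    """OR together every mask whose flag_assessment entry equals `label`."""
    combined = 0
    for mask, assessment in zip(FLAG_MASKS, FLAG_ASSESSMENT):
        if assessment == label:
            combined |= mask
    return combined


def drop_assessed(data, qc, label="Bad"):
    """Drop samples where any test assessed as `label` is set."""
    rejected = masks_assessed_as(label)
    return [d for d, q in zip(data, qc) if not (q & rejected)]


print(masks_assessed_as("Bad"))  # 41 (1 + 8 + 32)
print(drop_assessed([10.0, 11.0, 12.0, 13.0], [0, 2, 14, 16]))
# [10.0, 11.0, 13.0]
```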
Thanks,
Ken
On 2019-7-23 06:50, Martin Juckes - UKRI STFC wrote:
Dear Ken,
thanks for your response to me below.
Would it be fair to suggest that "status" should, as far as possible, reflect a
generic objective classification, with terms such as "sensor_nonfunctional"
which have a comparable meaning for all datasets, while "quality" is a
subjective *measure* with a meaning that may vary from dataset to dataset? E.g.
if dataset A has a maximum "quality" of 11 and dataset B only goes up to 10, it
doesn't necessarily imply that dataset A is in any sense better than B.
If you want to use it in weighted means, perhaps it should be "quality_measure"
rather than "quality_flag"? With "status_flag" the order of integer values does
not have any meaning, but with quality perhaps it would make more sense to have
some concept of a sequence of quality settings (so that, for example, "1" always
indicates a quality between "0" and "2" within a dataset, but could have
different meanings in different datasets). Could the quality also be expressed
as a floating-point number without any flag meanings?
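To make the ordering point concrete, here is a hypothetical Python sketch of a
quality-weighted mean. It assumes a "quality_measure"-style variable holding
non-negative numbers where larger means better within a single dataset; neither
quality_measure nor this weighting scheme is defined in CF or part of the
proposal. With unordered status_flag codes, using the integers as weights like
this would be meaningless.

```python
def quality_weighted_mean(values, quality):
    """Weighted mean that favours higher-quality samples.

    Assumes (hypothetically) that `quality` is non-negative and ordered,
    with larger meaning better *within this dataset only*; the weights
    say nothing about how this dataset compares with another one.
    """
    if len(values) != len(quality):
        raise ValueError("values and quality must have the same length")
    if any(q < 0 for q in quality):
        raise ValueError("quality weights must be non-negative")
    total = sum(quality)
    if total == 0:
        raise ValueError("all samples have zero quality weight")
    return sum(v * q for v, q in zip(values, quality)) / total


print(quality_weighted_mean([10.0, 20.0], [3, 1]))  # 12.5
```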
Responding to a point Barna raised: it is certainly possible to have more than
one "status_flag" variable, but I don't think it is ideal: if information needs
to be split across multiple variables, we generally like to describe the
difference between the variables in the standard name or in other metadata. In
this case, I think there is a good case for using a new standard name.
regards,
Martin
________________________________
From: CF-metadata <[email protected]> on behalf of Andrew Barna <[email protected]>
Sent: 23 July 2019 00:23
To: Kehoe, Kenneth E.
Cc: [email protected]
Subject: Re: [CF-metadata] New standard_name of quality_flag for corresponding
quality control variables
Ken,
I guess I don't see this proposed change as necessary, since the
distinction between the terms "quality" and "status" is really made in
the "flag_meanings" attribute, which is basically free-form and uncontrolled.
These attributes would need to be used with this new name as well.
Let me rephrase my suggestion/question:
If this proposal is not adopted, but an example of how to use a
variable with the standard name "status_flag" to indicate only data
quality is included in the document, would that help?
-Barna
On Mon, Jul 22, 2019 at 1:22 PM Kehoe, Kenneth E. <[email protected]> wrote:
Barna,
Yes, an update to the CF document should follow after the new
standard_name is implemented. I think multiple examples are needed, since
status_flag covers many different types of state variables.
Ken
On 2019-7-22 10:35, Andrew Barna wrote:
Hi Martin, Ken,
Is there anything wrong with including multiple "status_flag"
variables to capture each separate state you wish? The CF document
unfortunately only includes an example of how to encode the status of
a sensor, but the actual meanings of the flag values are entirely up
to you, and this will not change with this proposal. Perhaps the CF
document would benefit from additional examples (e.g. one that only
shows data quality flags).
-Barna
On Mon, Jul 22, 2019 at 9:04 AM Kehoe, Kenneth E. <[email protected]> wrote:
Hi Martin,
I see status as encompassing several kinds of metadata. For example, it
could be the state of the instrument as it cycles through a pre-programmed
routine (look at calibration target, look at sky, look at ground, look at
second calibration target, repeat...). Or it could be the sources of the
inputs for a model, where availability or some other reason could require
deciding which source(s) to use. For provenance, this source information is
important to report on a time-step basis. Or the status could be a data
provider's method of providing uncertainty information (I see this as
incorrect, but some people do see it this way). Each of these is important
metadata, but the way it is used differs from the way a strictly quality
variable is used. A quality variable provides information indicating whether
the data should be used, or it could possibly be used in a weighted mean to
favor high-quality data over low-quality data. How the metadata is used
depends on the metadata type. A state of the instrument would be used for
subsetting calibration vs. data. There is no ambiguity in this, as data from
a calibration target is not used in a weather research analysis. But quality
is more subjective and is decided by the data user. If the quality variable
has 20 different quality tests, the user would need to decide whether all 20
test results should be used or only a subset. Also, the code for applying
quality is different from the code for the instrument view state (in my
example above).
It is possible to derive a quality test result from the state of the
instrument, but not (typically) the other way around. So I need a way to
distinguish the two for automated or semi-automated tools. Hence my
point that quality_flag is essentially a subset of status_flag.
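A semi-automated tool could make this distinction from the standard name alone.
The Python sketch below assumes the file's metadata has already been read into
a dictionary of variable attributes; the variable name backscatter and the
helper name are hypothetical, and "quality_flag" is the name proposed in this
thread.

```python
# Hypothetical metadata: variable name -> attribute dictionary.
DATASET = {
    "backscatter": {"ancillary_variables": "shutter quality_variable"},
    "shutter": {"standard_name": "status_flag"},
    "quality_variable": {"standard_name": "quality_flag"},
}


def ancillaries_with_standard_name(dataset, data_var, standard_name):
    """Return data_var's ancillary variables that carry `standard_name`.

    ancillary_variables is a blank-separated list of variable names.
    """
    names = dataset[data_var].get("ancillary_variables", "").split()
    return [name for name in names
            if dataset.get(name, {}).get("standard_name") == standard_name]


print(ancillaries_with_standard_name(DATASET, "backscatter", "quality_flag"))
# ['quality_variable']
print(ancillaries_with_standard_name(DATASET, "backscatter", "status_flag"))
# ['shutter']
```

A tool could then hand only the quality_flag variables to a routine that
applies user-selected tests, and leave state variables such as shutter to
ordinary subsetting.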
Ken
On 2019-7-22 02:57, Martin Juckes - UKRI STFC wrote:
Dear Ken,
Can you expand on the distinction between "quality" and "status"? I understand
that they are different in principle, but, in order to support this new
standard name I think we need a clear objective statement of how we would want
to distinguish between them in CF.
The conventions section on flags (3.5) mixes the two up
(http://cfconventions.org/cf-conventions/cf-conventions.html#flags ), so some
re-wording of the document would also be needed,
regards,
Martin
________________________________
From: CF-metadata <[email protected]> on behalf of Kehoe, Kenneth E. <[email protected]>
Sent: 19 July 2019 06:42
To: [email protected]
Subject: [CF-metadata] New standard_name of quality_flag for corresponding
quality control variables
Dear CF,
I am proposing a new standard name of "quality_flag" to indicate that a
variable is purely a quality control variable. A quality control variable would
use flag_values or flag_masks along with flag_meanings to declare levels of
quality or the results of quality-indicating tests of the data variable. This
variable would be a subset of the more general "status_flag" standard name.
Currently the definition of "status_flag" is:
- A variable with the standard name of status_flag contains an indication of
quality or other status of another data variable. The linkage between the data
variable and the variable with the standard_name of status_flag is achieved
using the ancillary_variables attribute.
This definition covers any variable used to define the state or other status
information of a variable, so a quality variable cannot be distinguished by
standard name alone from a state of the instrument, a processing decision,
source information, needed metadata about the data variable, or another
ancillary variable type. Since there is no other way to define a purely quality
control variable, "status_flag" is too general for strictly quality control
variables. With a method to define a variable as strictly quality control, the
results of quality control tests can be applied to the data with a software
tool based on requests by the user. This would not affect current datasets that
use "status_flag", nor require a change to its definition beyond noting that
the "quality_flag" standard name is available and a better choice for pure
quality control variables.
Proposed addition:
quality_flag = A variable with the standard name of quality_flag contains an
indication of quality information of another data variable. The linkage between
the data variable and the variable or variables with the standard_name of
quality_flag is achieved using the ancillary_variables attribute.
Proposed change:
status_flag = A variable with the standard name of status_flag contains an
indication of status of another data variable. The linkage between the data
variable and the variable with the standard_name of status_flag is achieved
using the ancillary_variables attribute. For data quality information use
quality_flag.
Thanks,
Ken
--
Kenneth E. Kehoe
Research Associate - University of Oklahoma
Cooperative Institute for Mesoscale Meteorological Studies
ARM Climate Research Facility - Data Quality Office
e-mail: [email protected] | Office: 303-497-4754
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata