Ola Hodne Titlestad wrote:
Hi,
I've added a new blueprint here:
https://blueprints.launchpad.net/dhis2/+spec/improve-minmax-value-functionality

It is about improving the min/max validation functionality. The current solution is very basic and insufficient in many ways. Here are my thoughts on how to improve it. We can use this list for discussion and then update the blueprint once we settle on something concrete.

This is what I wrote in the blueprint:

A few improvements are needed to the min/max value functionality:

1) Generation of min/max values should be available from the data administration module
Currently you need to generate min/max ranges for each orgunit/dataset combination one by one in the data entry module. Sometimes you want to generate ranges for all orgunits and datasets at once, and data entry is not the place for that. In Data Administration we can add a new menu heading called "Min/Max validation", where we allow min/max generation for any combination of orgunit/dataset and make it easy to select all combinations. It may also be a good idea to include "from" and "to" fields to indicate which periods to use as the basis for the generation, e.g. from 2008-01-01 to 2008-12-31 would mean that all 12 months of 2008 are used if the dataset has a monthly period type, or the 4 quarters of 2008 if the dataset is quarterly, etc.
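To make the "from"/"to" idea concrete, here is a minimal sketch in Python of how a date interval could be expanded into the periods used as the basis for generation. The function name `periods_between` and the period type labels "monthly" and "quarterly" are assumptions made for illustration, not existing DHIS 2 names, and the sketch only counts periods whose first day falls inside the interval.

```python
from datetime import date

# Hypothetical mapping from period type to its length in months.
MONTHS_PER_PERIOD = {"monthly": 1, "quarterly": 3}

def periods_between(start, end, period_type):
    """Return the first day of every period that starts within [start, end]."""
    step = MONTHS_PER_PERIOD[period_type]
    year = start.year
    # Align to the first month of the period that contains the start date.
    month = ((start.month - 1) // step) * step + 1
    periods = []
    while date(year, month, 1) <= end:
        first_day = date(year, month, 1)
        if first_day >= start:
            periods.append(first_day)
        month += step
        if month > 12:
            month -= 12
            year += 1
    return periods

monthly = periods_between(date(2008, 1, 1), date(2008, 12, 31), "monthly")
quarterly = periods_between(date(2008, 1, 1), date(2008, 12, 31), "quarterly")
print(len(monthly), len(quarterly))
```

For the 2008 example above this gives the 12 months for a monthly dataset and the 4 quarters for a quarterly one.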

I'm not sure what the best way to do this is, but one option could be to have "Data quality" as an item under maintenance, where you can set ranges and also define and keep track of validation rules. The "data quality" page currently under services could then be split, so that the definition side of it moves to maintenance and the report side of it becomes a subitem in the reports menu.

2) User-defined parameters that control how the generation is done. Currently the range values are set to 10% lower than the lowest value and 10% higher than the highest value, which is a very crude method. It does not handle outliers that might already be in the system.
Any suggestions for a better statistical method for this? And on how to make it user-defined?
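As a rough illustration of why the current rule is crude, the sketch below reimplements it in Python from the description above (10% below the lowest value, 10% above the highest). It is not the actual DHIS 2 code, and the function name and sample values are made up.

```python
def ten_percent_range(values):
    """Min/max range as described: 10% below the lowest, 10% above the highest."""
    lowest, highest = min(values), max(values)
    return lowest * 0.9, highest * 1.1

# Made-up monthly values for one orgunit/data element.
normal = [100, 110, 120]
# The same series with one typo (an extra zero) already stored in the system.
with_typo = normal + [1200]

print(ten_percent_range(normal))
print(ten_percent_range(with_typo))
```

With the typo included, the generated max is stretched to roughly ten times the typical value, so the range no longer catches similar mistakes.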

Use some factor of the standard deviation. That will take care of spread. +/- 10% will not work for malaria, for instance, as it fluctuates naturally over the year due to the rainy season. I don't have my copy of the infamous "Statistical concepts and methods" by Bhattacharyya and Johnson at hand, arguably the most boring book in the world, but this will do as an explanation: http://en.wikipedia.org/wiki/Standard_deviation.

Then, as I think is done in DHIS 1.4, you can set the factor to calculate from, for instance 1.5, making the min and max the mean - 1.5 x st.dev and the mean + 1.5 x st.dev, respectively.
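The standard deviation approach could be sketched like this in Python. The function name `stddev_range` is hypothetical, and using the sample standard deviation (`statistics.stdev`) rather than the population one is an assumption; the thread does not say which DHIS 1.4 uses.

```python
from statistics import mean, stdev

def stddev_range(values, factor=1.5):
    """Return (min, max) as mean -/+ factor x standard deviation.

    Assumes the sample standard deviation; needs at least two values.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to compute a standard deviation")
    centre = mean(values)
    spread = stdev(values)
    return centre - factor * spread, centre + factor * spread

# Made-up values: mean 12, sample standard deviation 2.
print(stddev_range([10, 12, 14], factor=1.5))
```

A larger factor gives a wider, more tolerant range, so exposing it as a user setting lets a data element with strong seasonal variation get a looser range than a stable one.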

3) I assume we would like to keep the generate min/max option in data entry, which can be useful for users who do not deal with all orgunits, only a limited number, and who know that a new round of generation would correct the min/max ranges. But this generation should then be configured in a setting, especially how many periods to use. So we could add another property in Data Administration->min/max validation that defines how many periods to use as the basis for the generation, for monthly, weekly, yearly etc. period types. Do we need one property per period type? Currently this property is hard-coded to 6 in the source code.

4) Default min/max range per data element
Normally a min/max range is linked to an orgunit/data element combination, but sometimes, e.g. when there is very little data or very poor data quality in the system, it is useful to have a default range that can be used for all orgunits as a first level of validation to avoid typos and crazy outliers. These default values need to be set somewhere, and maybe data set management is the best place for this; at least that is where it is located in DHIS 1.4. Here we need some functionality to set these ranges quickly, even as quickly as setting the same range for all data elements in a dataset, and then also the possibility to adjust individual data elements in the data (set) element list.
In Data entry the procedure will be to first check whether a min/max range exists for the orgunit/data element (the best option) and if not then load the default range for the data element (the next best option), and if nothing is set then leave it blank (the worst option).
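The lookup order described above could be sketched as follows in Python. The function name `resolve_range` and the two dictionaries are hypothetical stand-ins for wherever the ranges end up being stored.

```python
def resolve_range(orgunit, data_element, specific_ranges, default_ranges):
    """Pick the min/max range to show in data entry.

    1. Range for the orgunit/data element combination (best option).
    2. Default range for the data element (next best option).
    3. None, meaning the range is left blank (worst option).
    """
    key = (orgunit, data_element)
    if key in specific_ranges:
        return specific_ranges[key]
    return default_ranges.get(data_element)

# Made-up example data.
specific = {("Clinic A", "Malaria cases"): (20, 80)}
defaults = {"Malaria cases": (0, 500), "ANC visits": (0, 300)}

print(resolve_range("Clinic A", "Malaria cases", specific, defaults))
print(resolve_range("Clinic B", "Malaria cases", specific, defaults))
print(resolve_range("Clinic B", "Deliveries", specific, defaults))
```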

I concur. In both Sierra Leone and Botswana, setting ranges for individual facilities, for all data elements, has just created a lot of extra work for the districts, which are not really aware of how the process works. So this has so far been skipped in Sierra Leone. As we want some kind of warning (colour coding and/or pop-ups), this can create a great deal of frustration until the ranges are correctly set, and also there are some wild typos where it looks like people have fallen asleep on the keyboard, which we want to avoid. It would then make sense to be able to set some global range default.

Johan

best regards,
Ola Hodne Titlestad
HISP
University of Oslo

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~dhis2-devs
More help   : https://help.launchpad.net/ListHelp



