On Sat, May 22, 2010 at 8:51 PM, Ola Hodne Titlestad <[email protected]> wrote:
> On 20 May 2010 18:39, Bob Jolliffe <[email protected]> wrote:
>>
>> On 20 May 2010 15:56, Bob Jolliffe <[email protected]> wrote:
>> > 2010/5/20 Ola Hodne Titlestad <[email protected]>:
>> >>
>> >> 2010/5/20 Lars Helge Øverland <[email protected]>
>> >>>
>> >>> Data elements derive their period type from the data sets they are members of.
>> >
>> > Restated (what I just sent Lars only by mistake): a datavalue derives its period type from the data set of which its data element is a member :-)
>> >
>> >> And when they are members of two datasets with different period types, they have multiple period types, right?
>> >
>> > It's important to remain aware that it is ultimately values which have periods (and hence period types).
>> >
>> > And when you look at a value you can derive its period type in one of two ways: via the dataset or via the period. Potentially these could disagree. The one which derives from the period should be considered authoritative, i.e. if the period is 2009-Jan then, regardless of what the dataset might say, this really must be monthly. Of course we hope these always agree. Incidentally, the lookup from dataelement-to-dataset-to-period looks more complex than the lookup from period->periodType.
>> >
>> >> The key thing to look out for in data entry and data import is to avoid overlaps in data values that will cause duplication when aggregating data periods.
>> >> For example, if the SAME ORGUNIT registers values for the same data element for two different period types that have overlapping periods, e.g. Jan-10 and Q1-10, then the aggregate values for Q1-10, Jan-June 2010, and 2010 will all show an incorrect value, since the value for Jan-10 is counted twice.
>> >
>> > OK. That's a good concrete constraint to have.
>> >
>> >> One way to enforce this constraint is to monitor which datasets an orgunit is assigned to, and not allow orgunits to be assigned to two datasets that have the same data element AND different period types.
>> >
>> > Agreed, though this constraint should probably be imposed on forms rather than datasets.
>> >
>> >> As far as I am aware, we are not checking for this today. During data import it could be checked at data element level by looking up the period type the way Bob has shown, but that sounds like a lot of lookups and time-consuming validation, right?
>> >
>> > On data import we don't really validate at all, beyond whatever constraints the db imposes. For efficiency we simply pop the values in with multiple insert statements. So this validation would have to happen as a stage before the actual import or would have to be constrained within the db. In fact it can't easily be validated before the import, as it depends on existing values within the db.
>> >
>> >> A relatively normal use case that we probably have to find a way to support, and which I think they are struggling with in Vietnam, is that different provinces can use different period types for the same data elements (even for complete data sets). For example, the national data flow policy may say to report immunisation data every quarter, so that becomes the minimum requirement for all provinces. Some of the provinces then decide that all their facilities have to collect this data monthly anyway, and at the province level they simply send the quarterly aggregates to national level (in the paper-based or Excel world). At the same time, other provinces just collect quarterly data at the facility level, as in the minimum national requirement.
>> >> At the national level there is a need to consolidate all this data, even data at the facility level, so ideally a national DHIS database should be able to store both monthly and quarterly raw data values for the same data elements, but for different orgunits. The national information users can then easily generate quarterly reports on immunisation for all provinces, while some provinces can do monthly data analysis if they choose to collect data at that frequency.
>> >>
>> >> We support the above scenario by allowing the same data elements to be assigned to different data sets with different period types, but we don't control for misuse of this flexibility, which can lead to duplication and inconsistent aggregated data values as pointed out above.
>> >
>> > Thinking further... I really think the problem arises because we have a dataset concept which represents a form and is also used to constrain periodtypes on dataelements. Thinking of the use case you have just described, it should be possible to have a paper form which the national level expects to collect quarterly, and the same form used at a lower level to collect data monthly. If we wanted to mirror that use case electronically we would have to divorce the form from the periodtype, i.e. a form would collect datavalues of a certain period, but the same form could be used in different orgunits to collect data at a different frequency.
>> >
>> > So (leaving dataset aside for the moment), if we can't assign a periodtype to a form, we can't assign it to a dataelement, and it's too inefficient to validate on a one-by-one datavalue basis, what is a girl to do?
>> >
>> > I suspect the correct answer is to refactor datavalue and create a datavalueset type (note: a set of datavalues rather than a set of dataelements).
>> > Designing out loud, a datavalueset would have the following fields/attributes:
>> >
>> > 1. a formid - the collection instrument used - roughly corresponds to the current dataset
>> > 2. an orgunitid - where the datavalues come from
>> > 3. a periodid - the period of all the datavalues
>> > plus a couple of other useful attributes I can think of.
>> >
>> > Datavalue now becomes slightly simpler (which is always a good thing). It only has:
>> > value, dataelementid, categorycombooption, datasetid
>>
>> Afterthought:
>> At the risk of adding complexity to what is otherwise a simplification, my life could become even simpler if datavalueset also had a categorycombo attribute, which would imply that a dataset was linked to a formsectionid rather than a formid.
>>
>> So a form has sections. Sections have dataelements. And sections have a datavalueset as a model, which implies a uniform categorycombo within the section.
>>
>> There isn't really a need for dataelements to have a categorycombo. And in lots of ways it's good that they don't. Then I am reducing complexity rather than adding to it :-)
>>
>> Suppose one orgunit has collected malaria deaths disaggregated by age. Another has collected values for the same dataelement, but not disaggregated by age. The datavalues will come from a datavalueset, so they will have a categorycombo. It is possible to aggregate or compare these datavalues from different datavaluesets, but only by using the lowest common denominator of categorycombo, i.e. in both cases you have access to malaria deaths. In the one case you have to "roll up" the categorycombo, which does of course assume that the sum of the category options makes a sensible whole, but Ola has mentioned this one many times.
>
> Some really interesting ideas you are bringing up here, Bob. I like the kind of flexibility, and yet structure, this would bring to the data model.
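To make the datavalueset proposal and the Jan-10/Q1-10 double-counting constraint concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not DHIS code: the names Period, DataValueSet, DataValue, find_overlaps and aggregate are hypothetical, a period is assumed to be an inclusive date range with a type label, and each data value inherits its orgunit and period from the set it belongs to. find_overlaps flags the overlap case Ola describes; aggregate shows why it matters and also performs the categorycombo roll-up Bob mentions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Period:
    """Hypothetical period: an inclusive date range plus its period type label."""
    period_type: str  # e.g. "Monthly" or "Quarterly"
    start: date
    end: date

    def overlaps(self, other: "Period") -> bool:
        # Two inclusive ranges overlap when each one starts before the other ends.
        return self.start <= other.end and other.start <= self.end

    def contains(self, other: "Period") -> bool:
        return self.start <= other.start and other.end <= self.end


@dataclass(frozen=True)
class DataValueSet:
    """Hypothetical set: form, orgunit and period, plus the categorycombo afterthought."""
    set_id: int
    form_id: int
    orgunit_id: int
    period: Period
    categorycombo_id: int


@dataclass(frozen=True)
class DataValue:
    """A value stores no orgunit or period; it inherits both from its set."""
    value: float
    dataelement_id: int
    categoryoption_id: int
    set_id: int


def find_overlaps(sets, values):
    """Return (orgunit_id, dataelement_id, period_a, period_b) for every pair of
    distinct, overlapping periods reported by the same orgunit for the same data
    element, e.g. Jan-10 and Q1-10."""
    by_id = {s.set_id: s for s in sets}
    reported = defaultdict(set)  # (orgunit_id, dataelement_id) -> periods reported
    for v in values:
        s = by_id[v.set_id]
        reported[(s.orgunit_id, v.dataelement_id)].add(s.period)
    conflicts = []
    for (orgunit_id, dataelement_id), periods in reported.items():
        ordered = sorted(periods, key=lambda p: (p.start, p.end))
        for a, b in combinations(ordered, 2):
            if a.overlaps(b):
                conflicts.append((orgunit_id, dataelement_id, a, b))
    return conflicts


def aggregate(sets, values, dataelement_id, target):
    """Naive total for one data element over every value whose period lies inside
    `target`, summed across orgunits and category options. Summing the options
    rolls up each set's categorycombo, assuming they form a sensible whole."""
    by_id = {s.set_id: s for s in sets}
    return sum(
        v.value
        for v in values
        if v.dataelement_id == dataelement_id and target.contains(by_id[v.set_id].period)
    )


if __name__ == "__main__":
    jan = Period("Monthly", date(2010, 1, 1), date(2010, 1, 31))
    q1 = Period("Quarterly", date(2010, 1, 1), date(2010, 3, 31))
    sets = [
        DataValueSet(1, form_id=10, orgunit_id=100, period=jan, categorycombo_id=1),
        DataValueSet(2, form_id=10, orgunit_id=100, period=q1, categorycombo_id=1),
    ]
    values = [DataValue(5.0, 7, 1, 1), DataValue(20.0, 7, 1, 2)]
    print(len(find_overlaps(sets, values)))  # 1: Jan-10 overlaps Q1-10
    print(aggregate(sets, values, 7, q1))  # 25.0: January is counted twice
```

Because the period lives on the set rather than on each value, the check only needs the orgunit and period of each set. As Bob points out, a real import would also have to compare against values already stored in the database, which this in-memory sketch ignores.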

Agree that this is really interesting and important - and I don't want to complicate things further, but from the perspective of my department there is also a need (most pronounced at higher levels such as national, but not only there) to accommodate estimates and adjustments in values and indicators. This is linked to completeness: when you know data is missing, you still want to have a reasonable figure for reports. As an example, DHIS may not be used in hospitals, where all cesarean deliveries are performed. Thus a province or ministry relying only on data from DHIS will report 0 for this particular dataelement, which is obviously wrong.

I guess adjusted figures are technically a bit like targets, in terms of how they relate to dataelements and datavalues? Or does this topic rather belong in its own thread/blueprint?

Knut
_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~dhis2-devs
More help   : https://help.launchpad.net/ListHelp

