Re: [OPENMRS-DEV] sql cohort query

Bob Jolliffe Sun, 21 Aug 2011 12:17:38 -0700

Mike,

On 20 August 2011 03:04, Michael Seaton <[email protected]> wrote:
> I see no problem at all with a SqlIndicator, and I doubt that there is any
> reason why this wouldn't "just work" with the SDMX-HD module.  We always
> envisioned there would be all sorts of Indicator implementations (or
> Aggregated Data Element implementations for you Bob) - the CohortIndicator
> was just meant to be the beginning.
>
> As an alternative (or in addition), we could consider implementing something
> like a AggregatedDataSetColumnIndicator, which takes in:
>
> 1. A Mapped<DataSetDefinition>
> 2. A column name
> 3. An Aggregation
>
> Since we already have SqlDataSetDefinitions implemented, which simply return
> a table of Data that can contain absolutely anything, this indicator would
> wrap on of these, and then perform a particular aggregation on one of the
> columns in order to produce an Indicator value.  This would be pretty easy
> to implement...
>
> Bob - are either of these something you would be interested in taking on
> development-wise, or would you need one of us to take this on soon?


After spending some time looking through, I think this looks like too
much work for me :-)  Maybe not for those of you much more familiar
with the interfaces.   I suspect some refactoring might be required as
well which could be disruptive.

In particular, I see there is a CohortIndicatorDataSetDefinition, but
if we foresee different Indicator types then we probably need an
IndicatorDataSetDefinition instead, which could be composed of a
polymorphous collection of Indicators.  From the perspective of an
indicator report renderer it should not matter what underlies the
indicator (cohort, reportingobjectgroup, sqldatasetdefinition etc).
I see IndicatorResult interface (with getValue()) is already there.
Ryan's sdmx-hd integration module, for example, provides such a
renderer and so it looks like he necessarily makes use of the
CohortIndicatorDataSetDefinition because there isn't anything more
general for him to work with.  I suppose if there was a reasonable
IndicatorDataSetDefinition then it could have made use of that
instead.  So introducing different indicator types requires some work
in both modules.

So, being cautious, it looks doable but doesn't look that easy to me.
Though I do think your AggregatedDataSetColumnIndicator as you imagine
it above sounds about right.  Mind you I also start to wonder is a
CohortIndicator just really a special case of the above rather than a
sibling relation.  Also wondering how to reuse some of the gui stuff
around defining dimensions which is currently geared towards
cohortindicators.

>From the perspective of the immediate reporting requirements, we might
have to hack a workaround and leave this as a longer term prospect.

Regards
Bob

PS.  I found myself thinking round in circles when trying to grok
Datasets.  It seems that the dataset notion is used both for the
result of reporting as well as for the underlying elements.  Am I
right in figuring that a report contains one or more datasets, and
each element in turn of a cohortindicator dataset is composed of
aggregations (eg counts) of deeper datasets?

PPS.  These non-cohort indicators really are common.  Looking through
the Hospital performance indicator report they are using in India, the
majority of them would fall into this category.  Things like total OPD
cases, number of XRays given, number of minor procedures, number of
patients attended in Emergency etc.


>
> Mike
>
>
>
> On 08/19/2011 05:08 PM, Bob Jolliffe wrote:
>>
>> On 19 August 2011 19:18, Dave Thomas<[email protected]>  wrote:
>>>
>>> This is exactly what the reportingobjectgroup module that i wrote was
>>> for -- the idea that you might want to use SQL to select groups of any
>>> OpenmrsObject, and still be able to intersect it with a base cohort.
>>
>> Hi Dave.
>>
>> I haven't looked at the reportingobjectgroup module yet.  That was
>> going to be step two after we figured out the "easy" reported
>> dataelements which involved counting heads rather than counting other
>> openmrs objects.  And I'm not sure that I understand fully the nuances
>> of what you say above but I am worried that you still want to
>> intersect with a base cohort ... as long as we are talking of a cohort
>> then I'm guessing we don't count the same person twice.  So if I want
>> to know how many opd encounters there were last month, "intersecting"
>> this with a base cohort sounds like it will filter out the duplicates
>> which will again produce an incorrect result.  Or maybe I am wrong -
>> sorry to be speaking from ignorance having not yet looked at the
>> module.
>>
>>> I'd really like to talk strategy about how to roll this module into
>>> reporting core (or substitute a core solution for the things we
>>> already have built on it).
>>
>>  From our perspective (at least me and Viet :-) ) we are looking for a
>> sweet spot.  The implementors of the highly customized version of
>> openmrs running in Shimla, India, have already to a large extent
>> "solved" their reporting problems by creating Birt reports containing
>> the various odds and sods of aggregate dataelements they need to
>> produce.  The flexibility of using birt meant they could execute
>> whatever queries they want to populate the various reports.
>>
>> The downside being that the query, the data and the presentation all
>> become hopelessly entangled in the birt report.  And this doesn't
>> really help when you want to produce data to be consumed by another
>> system (eg dhis).  Its possible of course to extract the data from the
>> birt report but that is a bit of a hack, particularly when you need to
>> map the anonymous birt dataelements.  Having aggregated dataelement
>> (or indicator) objects defined within the system makes much more
>> sense.
>>
>> The strength of the reporting module, and why I have been its loudest
>> advocate, is that it separates the notion of reported dataelements (or
>> Indicators as they are known as in this context) from the rendering or
>> presentation of reports.  I am going to continue to use the term
>> aggregate dataelement rather than indicator, but otherwise the
>> semantics are not that important for the current discussion.  The
>> ability to define aggregate dataelements, datasets and composite
>> reports independently of how they are rendered is really a powerful
>> and even essential notion if we are to have a reporting capability
>> which meets a wide variety of use cases.  So the reporting module
>> really does move in the right direction ...
>>
>> But at the highest level of abstraction, an aggregate dataelement
>> object need only have a name, a description, and a mechanism (query)
>> for deriving a value.  It should not be a cast iron requirement that
>> there is an underlying cohort derived directly, or via an
>> intersection.  I can see for many cases this is very useful ... ie to
>> have an underlying cohort to drill down into.  But equally often it is
>> not and all you want is a count or some other aggregation.  So what we
>> find currently is that people argue that using the reporting module is
>> too inflexible and don't understand why we can produce certain reports
>> relatively simply in birt, but not in the reporting module.  And
>> naturally conclude that we should stick with Birt.
>>
>> In order to find the sweet spot of retaining the flexibility of birt
>> together with the organising structure of the reporting module, it
>> seems we need to have a class of aggregate dataelement whose only
>> constraint is that it must result in a number, but which can be
>> derived from any sql query.  I think this is what Darius is also
>> foreseeing in his SqlIndicator.  If we had such SqlIndicators, we
>> could (i) reuse the queries which have already been developed for birt
>> and (ii) render the resulting reports in various ways.  I think I even
>> suggested in a previous mail that an ideal outcome of this might be
>> that one of the possible renderings could in fact be a Birt report, in
>> which case we will have closed the circle and have available all the
>> presentation capabilities of Birt, but separated the definition of
>> aggregated dataelements from the presentation of them.
>>
>> Apologies if I have misinterpreted many things re cohorts,
>> intersections etc.  And if the reportingobject module meets the above
>> requirement then I am already delighted.  I'm going to look at it
>> tonight ...
>>
>> Regards
>> Bob
>>
>>
>>> d
>>>
>>> On Fri, Aug 19, 2011 at 5:30 PM, Darius Jazayeri
>>> <[email protected]>  wrote:
>>>>
>>>> We talked off-list, and it turns out that:
>>>>
>>>> Many/most of the indicators Bob wants to build are not really cohort
>>>> indicators, but rather counts of encounters, obs, log entries, etc.
>>>> They'd mostly be calculated via SQL.
>>>> They need to be able to export these via the sdmx-hd module, into DHIS.
>>>>
>>>> @Mike, @Ryan,
>>>> If we were to do a SqlIndicator implementation (which wouldn't be too
>>>> much
>>>> work), would that easily fit into the current SDMX-HD export module? Or
>>>> is
>>>> that hardcoded to cohort indicators? And how much work would it be to
>>>> change
>>>> that?
>>>> -Darius
>>>>
>>>> On Fri, Aug 19, 2011 at 7:33 AM, Bob Jolliffe<[email protected]>
>>>>  wrote:
>>>>>
>>>>> On 19 August 2011 15:07, Darius Jazayeri<[email protected]>
>>>>>  wrote:
>>>>>>
>>>>>> You're not doing a count distinct, so if your opd_patient_queue_log
>>>>>> can
>>>>>> have
>>>>>> the same patient_id more than once, that'd be why you get a
>>>>>> difference.
>>>>>> -Darius
>>>>>
>>>>> Thanks Darius.  You are absolutely right.  I also just figured that
>>>>> out a few minutes ago.
>>>>>
>>>>> Though it has left me with a sinking feeling about how to use the
>>>>> reporting module.  It makes sense now that the penny has slowly
>>>>> dropped, that a cohort query is in fact a query to select a distinct
>>>>> group, or cohort, of patients.  Which you could then drill down into
>>>>> etc.
>>>>>
>>>>> But at the level of a typical service indicator, I am really not
>>>>> interested in who the individual patients are.  I need to know how
>>>>> many patients had OPD encounters this month, for example.  Using a
>>>>> cohort query for this seemed to make sense, but of course it doesn't
>>>>> as it filters the duplicate patients.  So I should in fact be counting
>>>>> the encounters rather than the patients, but then its not a cohort
>>>>> query :-(
>>>>>
>>>>>> On Fri, Aug 19, 2011 at 5:37 AM, Bob Jolliffe<[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> I am trying to compose an indicator which makes use of a join with a
>>>>>>> custom table.
>>>>>>>
>>>>>>> Does anyone have an idea why executing the query directly as:
>>>>>>> mysql -u ... -e 'Select count(patient.patient_id) from patient inner
>>>>>>> join opd_patient_queue_log on
>>>>>>> patient.patient_id=opd_patient_queue_log.patient_id'
>>>>>>>
>>>>>>> results in 16593,
>>>>>>>
>>>>>>> but when I create a sql cohort query as above (without the count), I
>>>>>>> get a result of 13592.
>>>>>>>
>>>>>>> How does openmrs count the size of the resultset?  It seems its not a
>>>>>>> simple count ...
>>>>>>>
>>>>>>> Regards
>>>>>>> Bob
>>>>>>>
>>>>>>> _________________________________________
>>>>>>>
>>>>>>> To unsubscribe from OpenMRS Developers' mailing list, send an e-mail
>>>>>>> to
>>>>>>> [email protected] with "SIGNOFF openmrs-devel-l" in the
>>>>>>>  body
>>>>>>> (not
>>>>>>> the subject) of your e-mail.
>>>>>>>
>>>>>>> [mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
>>>>>>
>>>>>> ________________________________
>>>>>> Click here to unsubscribe from OpenMRS Developers' mailing list
>>>>>
>>>>> _________________________________________
>>>>>
>>>>> To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to
>>>>> [email protected] with "SIGNOFF openmrs-devel-l" in the  body
>>>>> (not
>>>>> the subject) of your e-mail.
>>>>>
>>>>> [mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
>>>>
>>>> ________________________________
>>>> Click here to unsubscribe from OpenMRS Developers' mailing list
>>>
>>> _________________________________________
>>>
>>> To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to
>>> [email protected] with "SIGNOFF openmrs-devel-l" in the  body (not
>>> the subject) of your e-mail.
>>>
>>> [mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
>>>
>> _________________________________________
>>
>> To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to
>> [email protected] with "SIGNOFF openmrs-devel-l" in the  body (not
>> the subject) of your e-mail.
>>
>> [mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
>

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to 
[email protected] with "SIGNOFF openmrs-devel-l" in the  body (not 
the subject) of your e-mail.

[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

Re: [OPENMRS-DEV] sql cohort query

Reply via email to