Re: [OPENMRS-DEV] sql cohort query

Bob Jolliffe Mon, 22 Aug 2011 09:20:36 -0700

Thanks Dave (and Lara) for the explanations.  Some comments below ...

On 21 August 2011 21:56, Dave Thomas <[email protected]> wrote:
> Sure.  Here's an example (see attached;  sorry for the formatting -- i
> recommend setting your texteditor to nowrap).
>
> There's a bit of an explanation necessary here.  First, note that the
> SQL returns exactly 2 items -- object_id (of any openmrs object), and
> patient_id.  An ObjectGroup is really a Set<Integer[]>  where the
> Integer[] is expected to contain 2 values -- object_id (any object),
> and THEN the patient_id (the owner of the object).  A bit of a hack, I
> admit, but it allows for intersection with a base Cohort (as discussed
> previously), and the whole operation happens without instantiating any
> of the objects (Encounter or Patient, in this case), meaning that it
> scales really well.  Again, this is all modeled against the core
> notion of a Cohort, and then SqlCohortDefinition, CohortIndicator,
> CohortIndicatorResult.
>
> Also, looking at the attached code, i can't remember why the location
> parameter was scoped differently than startDate and endDate, which
> needed to be explicitly added to the definition for some reason.
> Maybe this has changed?
>
> Finally, reportingobjectgroup has its own persistence mechanisms for
> sql definitions, indicator definitions, and report definitions.  I had
> to create my own services for these, because at the time, i remember
> not being able to register persistence handlers with the regular
> reporting xstream stuff from another module.  I also needed to create
> my own service for the objectgroup evaluator to use.  These created a
> ton of redundancy between reporting and reportingobjectgroup.  These
> could be merged by having the reporting module allow external
> registration of persistence handlers, and adding some very generic
> executeSQL methods to the core reporting service that return values
> according to generic interface types.
>
> Reading through the above, you can see why i'd like to get this on
> firmer footing.  Based on reading my own explanation, i may try the
> purely SqlDataSetDefinition approach this week, time allowing, just to
> see how it goes.
>
> That being said, we have this working just fine in production.  Our
> primary care report (from which the attached code excerpt is from)
> has about 30 total indicators, and runs in about 10 seconds against an
> Obs table with 5.6 million observations.


I have a better idea on what you have done here and I suspect it
contains most of the clues of how to get on a "firmer footing".  First
it seems to me the Indicator interface and BaseIndicator class both
need to be enrichened.  There's quite a bit of stuff in
CohortIndicator which can quite easily even now be moved up to
BaseIndicator to avoid some of the replication (as well as constrain)
what you have had to do.  Things like aggregateor, logic expression,
indicator type etc.

Getting into more meat, how about methods like setCohortDefinition(..)
and setObjectGroupDefinition(..)?  Do you foresee there could be a
more general setDefinition(..) method in the generic Indicator
interface?

Regards
Bob

>
> d
>
>
>
>
> On Sun, Aug 21, 2011 at 9:08 PM, Bob Jolliffe <[email protected]> wrote:
>> Hi Dave
>>
>> Is there some way we can take a look at the support module code to get
>> a better idea how you use it?  Sorry, I'm sure its obvious to you but
>> some of us are slow :-)
>>
>> Regards
>> Bob
>>
>> On 20 August 2011 12:00, Dave Thomas <[email protected]> wrote:
>>> Hi.  No, the indicator types are probably not exposed through the UI.
>>> We have another support module that registers our reports using
>>> code...
>>>
>>> d
>>>
>>> On Sat, Aug 20, 2011 at 12:49 PM, Bob Jolliffe <[email protected]> 
>>> wrote:
>>>> On 20 August 2011 09:40, Dave Thomas <[email protected]> wrote:
>>>>> Hi.  For reportingobjectgroup, say you write a query that pulls up a
>>>>> group of encounters.  If you intersect this with the Cohort 'all
>>>>> patients in a particular program', it will filter out the encounters
>>>>> owned by patients not in the program.  Meaning that for any given
>>>>> patient in the program, there can still be N encounters in the result.
>>>>>  This, i think, avoids your problem of having duplicate patients in
>>>>> your Cohort query condensing to 1.
>>>>
>>>> OK.  That sounds good.  I just installed the module and I don't quite
>>>> see how to find these new indicator types.  Are they not exposed
>>>> through the UI?
>>>>
>>>>>I originally got hung up on the
>>>>> ability to intersect with a Cohort based on the default reporting
>>>>> framework UI, which optionally allows you to select a base Cohort, but
>>>>> in thinking about it, it allows us to ask interesting questions pretty
>>>>> easily.  Things like 'to what degree do patients in the HIV program
>>>>> also seek out primary care services outside of their monthly HIV
>>>>> visit' -- this is extremely straightforward using this intersection.
>>>>>
>>>>> Its funny that the perception is that the reporting framework is
>>>>> inflexible, because in truth I've found it to be the opposite.  I
>>>>> wrote the reportingobjectgroup module over a week,
>>>>
>>>> ... I think that defines what I mean by inflexible from the user's
>>>> perspective :-)  As a programming framework it seems mighty fine.  But
>>>> I guess, as Mike has alluded to, the intention was that a range of
>>>> different indicator types can be implemented.  I guess that is where
>>>> we are now.
>>>>
>>>> Cheers
>>>> Bob
>>>>
>>>>> because i was
>>>>> facing the simple problem of counting primary care encounters at a
>>>>> health center over an arbitrary time period (usually a month).  So my
>>>>> thought was, 'what if I replicate SqlCohortDefinition functionality,
>>>>> almost exactly, but allow the IDs to correspond to the ID of any
>>>>> OpenmrsObject'.  This is the basic definition of an ObjectGroup.
>>>>>
>>>>> I took this approach having cracked open the code of the reporting
>>>>> framework, found CohortSQLDefinition, and then decided to copy it,
>>>>> figuring that it wouldn't be very hard...  It of course turned into a
>>>>> bit more of a project than i had expected, but we're using the results
>>>>> in production now, and we've been able to build some nice stuff on it.
>>>>>
>>>>> That being said, I don't really want to maintain the module forever,
>>>>> and Mike has pointed out to me that there were other approaches -- one
>>>>> was building a SQLDataSetDefinition with a row-per-encounter structure
>>>>> and then building indicators off of this (not sure how much coding
>>>>> this would take?).  Another was writing a LogicDataSource and creating
>>>>> a Rule for 'given a patient, how many times did they come into the
>>>>> health center for a primary care visit over a give timeperiod?'  (is
>>>>> there a problem passing Reporting run-time parameters to logic
>>>>> Rules?).
>>>>>
>>>>> I don't have the bandwidth to try all three approaches, unfortunately,
>>>>> and I sort of wish that i had tried the SQLDataSetDefinition first
>>>>> (i'm not sure if it was already implemented at the time?).
>>>>> Ultimately, the goal should be to be able to write 5 lines of code
>>>>> defining the actual SQL you want, for any Object, and then define any
>>>>> number of indicators.  I'm not sure which approach is going to
>>>>> ultimately win, but i'm glad we're having this conversation, and i am
>>>>> willing to put in some programming time once there are firm
>>>>> specifications for how exactly a SQLIndicator works.  Transferring the
>>>>> objectgroup indicators we have in production to a more core-supported
>>>>> strategy would make me feel like we're on slightly firmer ground.
>>>>>
>>>>> d
>>>>>
>>>>> On Sat, Aug 20, 2011 at 4:04 AM, Michael Seaton <[email protected]> wrote:
>>>>>> I see no problem at all with a SqlIndicator, and I doubt that there is 
>>>>>> any
>>>>>> reason why this wouldn't "just work" with the SDMX-HD module.  We always
>>>>>> envisioned there would be all sorts of Indicator implementations (or
>>>>>> Aggregated Data Element implementations for you Bob) - the 
>>>>>> CohortIndicator
>>>>>> was just meant to be the beginning.
>>>>>>
>>>>>> As an alternative (or in addition), we could consider implementing 
>>>>>> something
>>>>>> like a AggregatedDataSetColumnIndicator, which takes in:
>>>>>>
>>>>>> 1. A Mapped<DataSetDefinition>
>>>>>> 2. A column name
>>>>>> 3. An Aggregation
>>>>>>
>>>>>> Since we already have SqlDataSetDefinitions implemented, which simply 
>>>>>> return
>>>>>> a table of Data that can contain absolutely anything, this indicator 
>>>>>> would
>>>>>> wrap on of these, and then perform a particular aggregation on one of the
>>>>>> columns in order to produce an Indicator value.  This would be pretty 
>>>>>> easy
>>>>>> to implement...
>>>>>>
>>>>>> Bob - are either of these something you would be interested in taking on
>>>>>> development-wise, or would you need one of us to take this on soon?
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 08/19/2011 05:08 PM, Bob Jolliffe wrote:
>>>>>>>
>>>>>>> On 19 August 2011 19:18, Dave Thomas<[email protected]>  wrote:
>>>>>>>>
>>>>>>>> This is exactly what the reportingobjectgroup module that i wrote was
>>>>>>>> for -- the idea that you might want to use SQL to select groups of any
>>>>>>>> OpenmrsObject, and still be able to intersect it with a base cohort.
>>>>>>>
>>>>>>> Hi Dave.
>>>>>>>
>>>>>>> I haven't looked at the reportingobjectgroup module yet.  That was
>>>>>>> going to be step two after we figured out the "easy" reported
>>>>>>> dataelements which involved counting heads rather than counting other
>>>>>>> openmrs objects.  And I'm not sure that I understand fully the nuances
>>>>>>> of what you say above but I am worried that you still want to
>>>>>>> intersect with a base cohort ... as long as we are talking of a cohort
>>>>>>> then I'm guessing we don't count the same person twice.  So if I want
>>>>>>> to know how many opd encounters there were last month, "intersecting"
>>>>>>> this with a base cohort sounds like it will filter out the duplicates
>>>>>>> which will again produce an incorrect result.  Or maybe I am wrong -
>>>>>>> sorry to be speaking from ignorance having not yet looked at the
>>>>>>> module.
>>>>>>>
>>>>>>>> I'd really like to talk strategy about how to roll this module into
>>>>>>>> reporting core (or substitute a core solution for the things we
>>>>>>>> already have built on it).
>>>>>>>
>>>>>>>  From our perspective (at least me and Viet :-) ) we are looking for a
>>>>>>> sweet spot.  The implementors of the highly customized version of
>>>>>>> openmrs running in Shimla, India, have already to a large extent
>>>>>>> "solved" their reporting problems by creating Birt reports containing
>>>>>>> the various odds and sods of aggregate dataelements they need to
>>>>>>> produce.  The flexibility of using birt meant they could execute
>>>>>>> whatever queries they want to populate the various reports.
>>>>>>>
>>>>>>> The downside being that the query, the data and the presentation all
>>>>>>> become hopelessly entangled in the birt report.  And this doesn't
>>>>>>> really help when you want to produce data to be consumed by another
>>>>>>> system (eg dhis).  Its possible of course to extract the data from the
>>>>>>> birt report but that is a bit of a hack, particularly when you need to
>>>>>>> map the anonymous birt dataelements.  Having aggregated dataelement
>>>>>>> (or indicator) objects defined within the system makes much more
>>>>>>> sense.
>>>>>>>
>>>>>>> The strength of the reporting module, and why I have been its loudest
>>>>>>> advocate, is that it separates the notion of reported dataelements (or
>>>>>>> Indicators as they are known as in this context) from the rendering or
>>>>>>> presentation of reports.  I am going to continue to use the term
>>>>>>> aggregate dataelement rather than indicator, but otherwise the
>>>>>>> semantics are not that important for the current discussion.  The
>>>>>>> ability to define aggregate dataelements, datasets and composite
>>>>>>> reports independently of how they are rendered is really a powerful
>>>>>>> and even essential notion if we are to have a reporting capability
>>>>>>> which meets a wide variety of use cases.  So the reporting module
>>>>>>> really does move in the right direction ...
>>>>>>>
>>>>>>> But at the highest level of abstraction, an aggregate dataelement
>>>>>>> object need only have a name, a description, and a mechanism (query)
>>>>>>> for deriving a value.  It should not be a cast iron requirement that
>>>>>>> there is an underlying cohort derived directly, or via an
>>>>>>> intersection.  I can see for many cases this is very useful ... ie to
>>>>>>> have an underlying cohort to drill down into.  But equally often it is
>>>>>>> not and all you want is a count or some other aggregation.  So what we
>>>>>>> find currently is that people argue that using the reporting module is
>>>>>>> too inflexible and don't understand why we can produce certain reports
>>>>>>> relatively simply in birt, but not in the reporting module.  And
>>>>>>> naturally conclude that we should stick with Birt.
>>>>>>>
>>>>>>> In order to find the sweet spot of retaining the flexibility of birt
>>>>>>> together with the organising structure of the reporting module, it
>>>>>>> seems we need to have a class of aggregate dataelement whose only
>>>>>>> constraint is that it must result in a number, but which can be
>>>>>>> derived from any sql query.  I think this is what Darius is also
>>>>>>> foreseeing in his SqlIndicator.  If we had such SqlIndicators, we
>>>>>>> could (i) reuse the queries which have already been developed for birt
>>>>>>> and (ii) render the resulting reports in various ways.  I think I even
>>>>>>> suggested in a previous mail that an ideal outcome of this might be
>>>>>>> that one of the possible renderings could in fact be a Birt report, in
>>>>>>> which case we will have closed the circle and have available all the
>>>>>>> presentation capabilities of Birt, but separated the definition of
>>>>>>> aggregated dataelements from the presentation of them.
>>>>>>>
>>>>>>> Apologies if I have misinterpreted many things re cohorts,
>>>>>>> intersections etc.  And if the reportingobject module meets the above
>>>>>>> requirement then I am already delighted.  I'm going to look at it
>>>>>>> tonight ...
>>>>>>>
>>>>>>> Regards
>>>>>>> Bob
>>>>>>>
>>>>>>>
>>>>>>>> d
>>>>>>>>
>>>>>>>> On Fri, Aug 19, 2011 at 5:30 PM, Darius Jazayeri
>>>>>>>> <[email protected]>  wrote:
>>>>>>>>>
>>>>>>>>> We talked off-list, and it turns out that:
>>>>>>>>>
>>>>>>>>> Many/most of the indicators Bob wants to build are not really cohort
>>>>>>>>> indicators, but rather counts of encounters, obs, log entries, etc.
>>>>>>>>> They'd mostly be calculated via SQL.
>>>>>>>>> They need to be able to export these via the sdmx-hd module, into 
>>>>>>>>> DHIS.
>>>>>>>>>
>>>>>>>>> @Mike, @Ryan,
>>>>>>>>> If we were to do a SqlIndicator implementation (which wouldn't be too
>>>>>>>>> much
>>>>>>>>> work), would that easily fit into the current SDMX-HD export module? 
>>>>>>>>> Or
>>>>>>>>> is
>>>>>>>>> that hardcoded to cohort indicators? And how much work would it be to
>>>>>>>>> change
>>>>>>>>> that?
>>>>>>>>> -Darius
>>>>>>>>>
>>>>>>>>> On Fri, Aug 19, 2011 at 7:33 AM, Bob Jolliffe<[email protected]>
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>> On 19 August 2011 15:07, Darius Jazayeri<[email protected]>
>>>>>>>>>>  wrote:
>>>>>>>>>>>
>>>>>>>>>>> You're not doing a count distinct, so if your opd_patient_queue_log
>>>>>>>>>>> can
>>>>>>>>>>> have
>>>>>>>>>>> the same patient_id more than once, that'd be why you get a
>>>>>>>>>>> difference.
>>>>>>>>>>> -Darius
>>>>>>>>>>
>>>>>>>>>> Thanks Darius.  You are absolutely right.  I also just figured that
>>>>>>>>>> out a few minutes ago.
>>>>>>>>>>
>>>>>>>>>> Though it has left me with a sinking feeling about how to use the
>>>>>>>>>> reporting module.  It makes sense now that the penny has slowly
>>>>>>>>>> dropped, that a cohort query is in fact a query to select a distinct
>>>>>>>>>> group, or cohort, of patients.  Which you could then drill down into
>>>>>>>>>> etc.
>>>>>>>>>>
>>>>>>>>>> But at the level of a typical service indicator, I am really not
>>>>>>>>>> interested in who the individual patients are.  I need to know how
>>>>>>>>>> many patients had OPD encounters this month, for example.  Using a
>>>>>>>>>> cohort query for this seemed to make sense, but of course it doesn't
>>>>>>>>>> as it filters the duplicate patients.  So I should in fact be 
>>>>>>>>>> counting
>>>>>>>>>> the encounters rather than the patients, but then its not a cohort
>>>>>>>>>> query :-(
>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 19, 2011 at 5:37 AM, Bob Jolliffe<[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I am trying to compose an indicator which makes use of a join with 
>>>>>>>>>>>> a
>>>>>>>>>>>> custom table.
>>>>>>>>>>>>
>>>>>>>>>>>> Does anyone have an idea why executing the query directly as:
>>>>>>>>>>>> mysql -u ... -e 'Select count(patient.patient_id) from patient 
>>>>>>>>>>>> inner
>>>>>>>>>>>> join opd_patient_queue_log on
>>>>>>>>>>>> patient.patient_id=opd_patient_queue_log.patient_id'
>>>>>>>>>>>>
>>>>>>>>>>>> results in 16593,
>>>>>>>>>>>>
>>>>>>>>>>>> but when I create a sql cohort query as above (without the count), 
>>>>>>>>>>>> I
>>>>>>>>>>>> get a result of 13592.
>>>>>>>>>>>>
>>>>>>>>>>>> How does openmrs count the size of the resultset?  It seems its 
>>>>>>>>>>>> not a
>>>>>>>>>>>> simple count ...
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> Bob
>>>>>>>>>>>>
>>>>>>>>>>>> _________________________________________
>>>>>>>>>>>>
>>>>>>>>>>>> To unsubscribe from OpenMRS Developers' mailing list, send an 
>>>>>>>>>>>> e-mail
>>>>>>>>>>>> to
>>>>>>>>>>>> [email protected] with "SIGNOFF openmrs-devel-l" in the
>>>>>>>>>>>>  body
>>>>>>>>>>>> (not
>>>>>>>>>>>> the subject) of your e-mail.
>>>>>>>>>>>>
>>>>>>>>>>>> [mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
>>>>>>>>>>>
>>>>>>>>>>> ________________________________
>>>>>>>>>>> Click here to unsubscribe from OpenMRS Developers' mailing list
>>>>>>>>>>
>>>>>>>>>> _________________________________________
>>>>>>>>>>
>>>>>>>>>> To unsubscribe from OpenMRS Developers' mailing list, send an e-mail 
>>>>>>>>>> to
>>>>>>>>>> [email protected] with "SIGNOFF openmrs-devel-l" in the  
>>>>>>>>>> body
>>>>>>>>>> (not
>>>>>>>>>> the subject) of your e-mail.
>>>>>>>>>>
>>>>>>>>>> [mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
>>>>>>>>>
>>>>>>>>> ________________________________
>>>>>>>>> Click here to unsubscribe from OpenMRS Developers' mailing list
>>>>>>>>
>>>>>>>> _________________________________________
>>>>>>>>
>>>>>>>> To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to
>>>>>>>> [email protected] with "SIGNOFF openmrs-devel-l" in the  
>>>>>>>> body (not
>>>>>>>> the subject) of your e-mail.
>>>>>>>>
>>>>>>>> [mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
>>>>>>>>
>>>>>>> _________________________________________
>>>>>>>
>>>>>>> To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to
>>>>>>> [email protected] with "SIGNOFF openmrs-devel-l" in the  body 
>>>>>>> (not
>>>>>>> the subject) of your e-mail.
>>>>>>>
>>>>>>> [mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
>>>>>>
>>>>>> _________________________________________
>>>>>>
>>>>>> To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to
>>>>>> [email protected] with "SIGNOFF openmrs-devel-l" in the  body 
>>>>>> (not
>>>>>> the subject) of your e-mail.
>>>>>>
>>>>>> [mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
>>>>>>
>>>>>
>>>>> _________________________________________
>>>>>
>>>>> To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to 
>>>>> [email protected] with "SIGNOFF openmrs-devel-l" in the  body 
>>>>> (not the subject) of your e-mail.
>>>>>
>>>>> [mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
>>>>>
>>>>
>>>> _________________________________________
>>>>
>>>> To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to 
>>>> [email protected] with "SIGNOFF openmrs-devel-l" in the  body 
>>>> (not the subject) of your e-mail.
>>>>
>>>> [mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
>>>>
>>>
>>> _________________________________________
>>>
>>> To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to 
>>> [email protected] with "SIGNOFF openmrs-devel-l" in the  body 
>>> (not the subject) of your e-mail.
>>>
>>> [mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
>>>
>>
>> _________________________________________
>>
>> To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to 
>> [email protected] with "SIGNOFF openmrs-devel-l" in the  body (not 
>> the subject) of your e-mail.
>>
>> [mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
>>
>
> _________________________________________
>
> To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to 
> [email protected] with "SIGNOFF openmrs-devel-l" in the  body (not 
> the subject) of your e-mail.
>
> [mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
>

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to 
[email protected] with "SIGNOFF openmrs-devel-l" in the  body (not 
the subject) of your e-mail.

[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

Re: [OPENMRS-DEV] sql cohort query

Reply via email to