Re: [OPENMRS-DEV] sql cohort query

Michael Seaton Sun, 21 Aug 2011 16:32:36 -0700

Thanks Bob,

Are you planning to be at the conference in Kigali in October? I will see if we can work this into our plans for reporting work leading up to this for the month of September.


Cheers,
Mike


On 08/21/2011 03:06 PM, Bob Jolliffe wrote:

Mike,

On 20 August 2011 03:04, Michael Seaton<[email protected]>  wrote:

I see no problem at all with a SqlIndicator, and I doubt that there is any
reason why this wouldn't "just work" with the SDMX-HD module.  We always
envisioned there would be all sorts of Indicator implementations (or
Aggregated Data Element implementations for you Bob) - the CohortIndicator
was just meant to be the beginning.

As an alternative (or in addition), we could consider implementing something
like an AggregatedDataSetColumnIndicator, which takes in:

1. A Mapped<DataSetDefinition>
2. A column name
3. An Aggregation

Since we already have SqlDataSetDefinitions implemented, which simply return
a table of Data that can contain absolutely anything, this indicator would
wrap one of these, and then perform a particular aggregation on one of the
columns in order to produce an Indicator value.  This would be pretty easy
to implement...
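
A minimal sketch of the three inputs listed above (an evaluated data set, a
column name, an aggregation), written in Python rather than the reporting
module's Java. Every name here (Row, AGGREGATIONS, aggregate_column, the
sample rows) is a hypothetical illustration, not part of the reporting
module's API.

```python
from typing import Any, Callable, Iterable, Mapping, Sequence

# Hypothetical stand-in for the rows an evaluated SQL data set returns:
# one mapping of column name -> value per row.
Row = Mapping[str, Any]

# Hypothetical aggregations; each reduces a column's values to one number.
AGGREGATIONS: dict[str, Callable[[Sequence[Any]], float]] = {
    "COUNT": len,
    "COUNT_DISTINCT": lambda values: len(set(values)),
    "SUM": sum,
}


def aggregate_column(rows: Iterable[Row], column: str, aggregation: str) -> float:
    """Aggregate the non-null values of one column into an indicator value."""
    values = [row[column] for row in rows if row[column] is not None]
    return AGGREGATIONS[aggregation](values)


# Made-up sample data: patient 1 appears twice, patient 2 once.
opd_rows = [
    {"patient_id": 1, "fee": 10},
    {"patient_id": 1, "fee": 5},
    {"patient_id": 2, "fee": 20},
]

total_rows = aggregate_column(opd_rows, "patient_id", "COUNT")
distinct_patients = aggregate_column(opd_rows, "patient_id", "COUNT_DISTINCT")
```

The same rows yield different indicator values depending only on the
aggregation chosen, which is the point of keeping the aggregation as a
separate input.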

Bob - are either of these something you would be interested in taking on
development-wise, or would you need one of us to take this on soon?

After spending some time looking through, I think this looks like too
much work for me :-)  Maybe not for those of you much more familiar
with the interfaces.   I suspect some refactoring might be required as
well which could be disruptive.

In particular, I see there is a CohortIndicatorDataSetDefinition, but
if we foresee different Indicator types then we probably need an
IndicatorDataSetDefinition instead, which could be composed of a
polymorphous collection of Indicators.  From the perspective of an
indicator report renderer it should not matter what underlies the
indicator (cohort, reportingobjectgroup, sqldatasetdefinition etc).
I see the IndicatorResult interface (with getValue()) is already there.
Ryan's sdmx-hd integration module, for example, provides such a
renderer and so it looks like he necessarily makes use of the
CohortIndicatorDataSetDefinition because there isn't anything more
general for him to work with.  I suppose if there was a reasonable
IndicatorDataSetDefinition then it could have made use of that
instead.  So introducing different indicator types requires some work
in both modules.
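
As a rough illustration of the "polymorphous collection of Indicators" idea,
here is a sketch in Python (not the module's Java). The Indicator protocol,
the two indicator classes and render() are all hypothetical names invented
for this example; only the notion of a renderer that sees nothing but a
value comes from the discussion above.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Indicator(Protocol):
    """Hypothetical common contract: a name and a way to get a value."""
    name: str

    def get_value(self) -> float: ...


@dataclass
class CohortCountIndicator:
    """Counts distinct patients, as a cohort-based indicator would."""
    name: str
    patient_ids: list[int]

    def get_value(self) -> float:
        return len(set(self.patient_ids))


@dataclass
class QueryIndicator:
    """Wraps any callable that yields a number (e.g. the result of a SQL query)."""
    name: str
    query: Callable[[], float]

    def get_value(self) -> float:
        return self.query()


def render(indicators: list[Indicator]) -> dict[str, float]:
    """A renderer that does not care what underlies each indicator."""
    return {ind.name: ind.get_value() for ind in indicators}


report = render([
    CohortCountIndicator("opd_patients", [1, 1, 2]),
    QueryIndicator("opd_encounters", lambda: 3),
])
```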

So, being cautious, it looks doable but doesn't look that easy to me.
Though I do think your AggregatedDataSetColumnIndicator as you imagine
it above sounds about right.  Mind you, I also start to wonder whether a
CohortIndicator is really just a special case of the above rather than a
sibling relation.  I am also wondering how to reuse some of the gui stuff
around defining dimensions which is currently geared towards
cohortindicators.

From the perspective of the immediate reporting requirements, we might
have to hack a workaround and leave this as a longer term prospect.

Regards
Bob

PS.  I found myself thinking round in circles when trying to grok
Datasets.  It seems that the dataset notion is used both for the
result of reporting as well as for the underlying elements.  Am I
right in figuring that a report contains one or more datasets, and
each element in turn of a cohortindicator dataset is composed of
aggregations (eg counts) of deeper datasets?

PPS.  These non-cohort indicators really are common.  Looking through
the Hospital performance indicator report they are using in India, the
majority of them would fall into this category.  Things like total OPD
cases, number of XRays given, number of minor procedures, number of
patients attended in Emergency etc.

Mike



On 08/19/2011 05:08 PM, Bob Jolliffe wrote:

On 19 August 2011 19:18, Dave Thomas<[email protected]>   wrote:

This is exactly what the reportingobjectgroup module that I wrote was
for -- the idea that you might want to use SQL to select groups of any
OpenmrsObject, and still be able to intersect it with a base cohort.

Hi Dave.

I haven't looked at the reportingobjectgroup module yet.  That was
going to be step two after we figured out the "easy" reported
dataelements which involved counting heads rather than counting other
openmrs objects.  And I'm not sure that I understand fully the nuances
of what you say above but I am worried that you still want to
intersect with a base cohort ... as long as we are talking of a cohort
then I'm guessing we don't count the same person twice.  So if I want
to know how many opd encounters there were last month, "intersecting"
this with a base cohort sounds like it will filter out the duplicates
which will again produce an incorrect result.  Or maybe I am wrong -
sorry to be speaking from ignorance having not yet looked at the
module.

I'd really like to talk strategy about how to roll this module into
reporting core (or substitute a core solution for the things we
already have built on it).

From our perspective (at least me and Viet :-) ) we are looking for a
sweet spot.  The implementors of the highly customized version of
openmrs running in Shimla, India, have already to a large extent
"solved" their reporting problems by creating Birt reports containing
the various odds and sods of aggregate dataelements they need to
produce.  The flexibility of using birt meant they could execute
whatever queries they want to populate the various reports.

The downside being that the query, the data and the presentation all
become hopelessly entangled in the birt report.  And this doesn't
really help when you want to produce data to be consumed by another
system (eg dhis).  It's possible of course to extract the data from the
birt report but that is a bit of a hack, particularly when you need to
map the anonymous birt dataelements.  Having aggregated dataelement
(or indicator) objects defined within the system makes much more
sense.

The strength of the reporting module, and why I have been its loudest
advocate, is that it separates the notion of reported dataelements (or
Indicators as they are known in this context) from the rendering or
presentation of reports.  I am going to continue to use the term
aggregate dataelement rather than indicator, but otherwise the
semantics are not that important for the current discussion.  The
ability to define aggregate dataelements, datasets and composite
reports independently of how they are rendered is really a powerful
and even essential notion if we are to have a reporting capability
which meets a wide variety of use cases.  So the reporting module
really does move in the right direction ...

But at the highest level of abstraction, an aggregate dataelement
object need only have a name, a description, and a mechanism (query)
for deriving a value.  It should not be a cast iron requirement that
there is an underlying cohort derived directly, or via an
intersection.  I can see for many cases this is very useful ... ie to
have an underlying cohort to drill down into.  But equally often it is
not and all you want is a count or some other aggregation.  So what we
find currently is that people argue that the reporting module is
too inflexible, and don't understand why we can produce certain reports
relatively simply in birt but not in the reporting module.  They
naturally conclude that we should stick with Birt.

In order to find the sweet spot of retaining the flexibility of birt
together with the organising structure of the reporting module, it
seems we need to have a class of aggregate dataelement whose only
constraint is that it must result in a number, but which can be
derived from any sql query.  I think this is what Darius is also
foreseeing in his SqlIndicator.  If we had such SqlIndicators, we
could (i) reuse the queries which have already been developed for birt
and (ii) render the resulting reports in various ways.  I think I even
suggested in a previous mail that an ideal outcome of this might be
that one of the possible renderings could in fact be a Birt report, in
which case we will have closed the circle and have available all the
presentation capabilities of Birt, but separated the definition of
aggregated dataelements from the presentation of them.
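
To make the idea concrete, here is a small sketch in Python with an
in-memory SQLite database. It is not the reporting module's API: the
SqlIndicator class below, its evaluate() method and the encounter table
with its made-up rows are all assumptions for illustration. The only
constraint enforced is the one named above, that the query must result in
a single number.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class SqlIndicator:
    """Hypothetical aggregate dataelement: a name, a description, a query."""
    name: str
    description: str
    sql: str  # expected to return exactly one row with one numeric column

    def evaluate(self, conn: sqlite3.Connection) -> float:
        rows = conn.execute(self.sql).fetchall()
        if len(rows) != 1 or len(rows[0]) != 1:
            raise ValueError(f"{self.name}: query must return a single value")
        value = rows[0][0]
        if not isinstance(value, (int, float)):
            raise ValueError(f"{self.name}: query result is not a number")
        return value


# Made-up schema and data, only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE encounter (encounter_id INTEGER PRIMARY KEY, patient_id INTEGER);
    INSERT INTO encounter (patient_id) VALUES (1), (1), (2);
""")

opd_cases = SqlIndicator(
    name="total_opd_cases",
    description="Total OPD encounters, counting repeat visits",
    sql="SELECT COUNT(*) FROM encounter",
)
opd_value = opd_cases.evaluate(conn)
```

Because the indicator carries only a name, a description and a query, how
the value is later rendered (SDMX-HD, Birt or anything else) stays a
separate concern.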

Apologies if I have misinterpreted many things re cohorts,
intersections etc.  And if the reportingobjectgroup module meets the above
requirement then I am already delighted.  I'm going to look at it
tonight ...

Regards
Bob

d

On Fri, Aug 19, 2011 at 5:30 PM, Darius Jazayeri
<[email protected]>   wrote:

We talked off-list, and it turns out that:

- Many/most of the indicators Bob wants to build are not really cohort
  indicators, but rather counts of encounters, obs, log entries, etc.
  They'd mostly be calculated via SQL.
- They need to be able to export these via the sdmx-hd module, into DHIS.

@Mike, @Ryan,
If we were to do a SqlIndicator implementation (which wouldn't be too much
work), would that easily fit into the current SDMX-HD export module? Or is
that hardcoded to cohort indicators? And how much work would it be to change
that?
-Darius

On Fri, Aug 19, 2011 at 7:33 AM, Bob Jolliffe<[email protected]> wrote:

On 19 August 2011 15:07, Darius Jazayeri<[email protected]> wrote:

You're not doing a count distinct, so if your opd_patient_queue_log can
have the same patient_id more than once, that'd be why you get a
difference.
-Darius

Thanks Darius.  You are absolutely right.  I also just figured that
out a few minutes ago.

Though it has left me with a sinking feeling about how to use the
reporting module.  It makes sense now that the penny has slowly
dropped, that a cohort query is in fact a query to select a distinct
group, or cohort, of patients.  Which you could then drill down into
etc.

But at the level of a typical service indicator, I am really not
interested in who the individual patients are.  I need to know how
many patients had OPD encounters this month, for example.  Using a
cohort query for this seemed to make sense, but of course it doesn't
as it filters the duplicate patients.  So I should in fact be counting
the encounters rather than the patients, but then it's not a cohort
query :-(
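
The difference between the two counts can be reproduced on a toy database.
This sketch uses Python with in-memory SQLite instead of MySQL, and the
three patients and three queue-log rows are made up; only the table and
column names are taken from the query quoted in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (patient_id INTEGER PRIMARY KEY);
    CREATE TABLE opd_patient_queue_log (patient_id INTEGER);
    INSERT INTO patient VALUES (1), (2), (3);
    -- patient 1 visits twice, patient 2 once, patient 3 never
    INSERT INTO opd_patient_queue_log VALUES (1), (1), (2);
""")

JOIN = """
    FROM patient
    INNER JOIN opd_patient_queue_log
        ON patient.patient_id = opd_patient_queue_log.patient_id
"""

# One row per queue-log entry: this counts visits.
visits = conn.execute("SELECT COUNT(patient.patient_id)" + JOIN).fetchone()[0]

# One row per patient: this is what a cohort of distinct patients amounts to.
patients = conn.execute(
    "SELECT COUNT(DISTINCT patient.patient_id)" + JOIN
).fetchone()[0]

print(visits, patients)  # prints "3 2"
```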

On Fri, Aug 19, 2011 at 5:37 AM, Bob Jolliffe<[email protected]> wrote:

I am trying to compose an indicator which makes use of a join with a
custom table.

Does anyone have an idea why executing the query directly as:
mysql -u ... -e 'Select count(patient.patient_id) from patient inner
join opd_patient_queue_log on
patient.patient_id=opd_patient_queue_log.patient_id'

results in 16593,

but when I create a sql cohort query as above (without the count), I
get a result of 13592.

How does openmrs count the size of the resultset?  It seems it's not a
simple count ...

Regards
Bob

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to
[email protected] with "SIGNOFF openmrs-devel-l" in the body (not
the subject) of your e-mail.

[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]
