Re: [OPENMRS-DEV] sql cohort query

Michael Seaton Tue, 23 Aug 2011 08:10:18 -0700

Dave,

Thanks for synthesizing these ideas so well. This sounds like just whatI had envisioned in previous discussions and would like to get onto the0.7 Road Map for Reporting. Do you envision this as providing anyreplacement functionality for the report object group stuff, or arethese too separate?


Cheers,
Mike


On 08/23/2011 10:14 AM, Dave Thomas wrote:

Hi.  I've been looking at the reporting code today, and here's what
i'm thinking about two new implementations of Indicator:

Here's what i'd really like:  a DataSetColumnIndicator

This would allow me to first define any DataSetDefinition (describing
any N-by-N table, essentially, with SqlDataSetDefinition being the
DataSetDefinition type that matches the use case driving this
discussion).  Then, DataSetColumnIndicator would require my
DataSetDefinition, a column name (or index), and an aggregator (COUNT,
SUM, AVG, MIN, MAX, MEDIAN, SINGLE_VALUE, etc... ) along with optional
Filters and Parameters.  This would be quite powerful, especially if
we're caching evaluated DataSetDefinitions at run-time so that you can
evaluate a bunch of indicators in memory after having run a single
query to load the DataSet (the N-by-N table).  And of course, our
DataSetColumnIndicator could be made to optionally handle a
denominator calculation, and dimensions (like CohortIndicator).

Then, for Bob's case, where the SQL is already written and already
returns fully calculated indicator values, SqlIndicator would be a
specific case of the above -- sort of a convenience Indicator
implementation.   I envision SqlIndicator to be an extension of
DataSetColumnIndicator that adds the method setSqlQuery(String
sqlQuery).  setSqlQuery would expect a query string that returns the
indicator value (i.e. a 1-by-1 result).  Behind the scenes,
SqlIndicator.setQuery(String sqlQuery) would really be adding a new
SqlDataSetDefinition(sqlQuery), an aggregator of SINGLE_VALUE, and a
first-position column index value.

Then, both of the above would use the same evaluator, which would
build the DataSet, and then calculate indicators against the columns.
The SqlIndicator case would return a 1 by 1 table, and return the
column value.  DataSetColumnIndicator could be used to do fancier
calculations against bigger tables.

My 2 cents.  I haven't looked at the sdmx-hd module, but it should be
too hard to change it to use a more generic Indicator type that can be
extended by Indicator types in the reporting module.

d



On Tue, Aug 23, 2011 at 12:20 PM, Bob Jolliffe<[email protected]>  wrote:

On 23 August 2011 00:49, Darius Jazayeri<[email protected]>  wrote:

Just to remind myself, you already have a bunch of SQL queries written (at
possibly great pain) which do all the calculations you want, right?

yes

Is there
some reason you might want to rewrite those in some mechanism other than
SQL?

No.  Though there are some tricky ones eg. related to bed occupancy
rate which might benefit from some java assistance ...  but lets not
worry about that yet.  I believe Chuyen has got most of what is
required in SQL

Do those queries already include the male/female, adult/child, etc,
breakdown?

There is surprizingly little disaggregation required.  For example, as
it stands, we collect total OPD admissions but not disaggregated by
gender or sex.  My sense is we probably we should as someone will
decide its a requirement further down the road.  Also similar
scenarios elsewhere will certainly have this type of disaggregation
(I'm thinking particularly of Kenys where there is a substantial
openmrs installed base which will also generate facility reports).

Do you want to somehow reuse the existing cohort-based
dimensions?
And you want to be able to run a report that contains both count-of-patient
indicators (or maybe not even these?) and these SQL-based indicators, then
export that via SDMX-HD.

Well there are two reasons we wanted to use the cohort indicators: (i)
they were the only indicators available and (ii) the SDMX-HD
integration module uses them.  It's now clear that what we really need
is not the cohort indicators but something with a very similar
interface.  Like the existing reportobjectgroup
  or proposed sqldataset indicators.  This would require modifying the
sdmx-hd module to map to this kind of indicator instead.  I think
ideally the sdmx-hd module would map against a polymorphic Indicator
rather than specific species of Indicator.  As I think Mike had
indicated in a previous mail, it should "just work".  Well it doesn't
quite currently because the Indicator interface is just a placeholder,
but it could.

Does the UI presentation of running this report
matter? (I assume you don't need the count-of-patient indicators to have
clickable counts that show you the number of patients.) How about the UI of
defining the indicators? Does there need to be one, or can the definitions
be created in-memory, as in the rwanda reports module?

Been thinking a bit about this.  Short term we could use either of the
above two existing approaches, though maintaining a programmer on site
isn't going to scale.  Neither defining indicators one by one through
the UI or defining them in code like the rwanda reports seems to me
ideal.  Generally the indicators you need to produce you are going to
get from elsewhere ie. we don't necessarily report what we want to
report.  For example exported from dhis2 or (worst case) an excel
spreadsheet or (best case) presented as a SDMX Data Structure
Definition message (a DSD defines lists of indicators, dataelements,
codelists for disaggregation etc).  I suspect ideally we want to
import the list of indicators and then just configure the underlying
sql queries through the UI.  So perhaps this would be a variant on the
rwanda reports approach - eg. maintain a table of dataelements with
sql defintions, almost like a table of stored procedures.  Then create
reports off this using code similar to that in rwanda reports.

As an aside, Viet and I were quite confused last week when we started
created cohort queries and indicators and then struggled to find them
in the database.  Then discovered serializedobject.  I suppose
populating a list of indicators would mean populating that table with
reporting objects.  This is probably not the time and place to bring
in persistence strategy, but these xstreamed objects feel a bit
convoluted to me.  But that's a subjective thing ... I can see how
recursively saving report elements is attractive.

Cheers
Bob

-Darius

On Mon, Aug 22, 2011 at 2:42 PM, Bob Jolliffe<[email protected]>  wrote:

On 22 August 2011 20:57, Darius Jazayeri<[email protected]>  wrote:

On this week's Design Review Call at 9am Eastern time (our last one of
these
calls before we switch to the new call schedule, actually) we will be
discussing how to solve HISP India's use case with the reporting module
in
the best/easiest way.
(Offhand I imagine that would be implementing a SqlIndicator, but I must
admit I haven't followed every aspect of this conversation.)

Darius thanks for the responsiveness on this.

Offhand it is looking to me that in the short term we might meet our
requirements through a similar module to the rwandareports and
objectgroupindicators.  Then find ourselves in the same corner as
Dave, looking at ways to refactor and harmonize on to firmer ground
:-)

I'm hoping to make the call.

Bob

-Darius

On Mon, Aug 22, 2011 at 9:13 AM, Bob Jolliffe<[email protected]>
wrote:

Thanks Dave (and Lara) for the explanations.  Some comments below ...

On 21 August 2011 21:56, Dave Thomas<[email protected]>  wrote:

Sure.  Here's an example (see attached;  sorry for the formatting --
i
recommend setting your texteditor to nowrap).

There's a bit of an explanation necessary here.  First, note that the
SQL returns exactly 2 items -- object_id (of any openmrs object), and
patient_id.  An ObjectGroup is really a Set<Integer[]>   where the
Integer[] is expected to contain 2 values -- object_id (any object),
and THEN the patient_id (the owner of the object).  A bit of a hack,
I
admit, but it allows for intersection with a base Cohort (as
discussed
previously), and the whole operation happens without instantiating
any
of the objects (Encounter or Patient, in this case), meaning that it
scales really well.  Again, this is all modeled against the core
notion of a Cohort, and then SqlCohortDefinition, CohortIndicator,
CohortIndicatorResult.

Also, looking at the attached code, i can't remember why the location
parameter was scoped differently than startDate and endDate, which
needed to be explicitly added to the definition for some reason.
Maybe this has changed?

Finally, reportingobjectgroup has its own persistence mechanisms for
sql definitions, indicator definitions, and report definitions.  I
had
to create my own services for these, because at the time, i remember
not being able to register persistence handlers with the regular
reporting xstream stuff from another module.  I also needed to create
my own service for the objectgroup evaluator to use.  These created a
ton of redundancy between reporting and reportingobjectgroup.  These
could be merged by having the reporting module allow external
registration of persistence handlers, and adding some very generic
executeSQL methods to the core reporting service that return values
according to generic interface types.

Reading through the above, you can see why i'd like to get this on
firmer footing.  Based on reading my own explanation, i may try the
purely SqlDataSetDefinition approach this week, time allowing, just
to
see how it goes.

That being said, we have this working just fine in production.  Our
primary care report (from which the attached code excerpt is from)
has about 30 total indicators, and runs in about 10 seconds against
an
Obs table with 5.6 million observations.

I have a better idea on what you have done here and I suspect it
contains most of the clues of how to get on a "firmer footing".  First
it seems to me the Indicator interface and BaseIndicator class both
need to be enrichened.  There's quite a bit of stuff in
CohortIndicator which can quite easily even now be moved up to
BaseIndicator to avoid some of the replication (as well as constrain)
what you have had to do.  Things like aggregateor, logic expression,
indicator type etc.

Getting into more meat, how about methods like setCohortDefinition(..)
and setObjectGroupDefinition(..)?  Do you foresee there could be a
more general setDefinition(..) method in the generic Indicator
interface?

Regards
Bob

d




On Sun, Aug 21, 2011 at 9:08 PM, Bob Jolliffe<[email protected]>
wrote:

Hi Dave

Is there some way we can take a look at the support module code to
get
a better idea how you use it?  Sorry, I'm sure its obvious to you
but
some of us are slow :-)

Regards
Bob

On 20 August 2011 12:00, Dave Thomas<[email protected]>  wrote:

Hi.  No, the indicator types are probably not exposed through the
UI.
We have another support module that registers our reports using
code...

d

On Sat, Aug 20, 2011 at 12:49 PM, Bob Jolliffe
<[email protected]>
wrote:

On 20 August 2011 09:40, Dave Thomas<[email protected]>  wrote:

Hi.  For reportingobjectgroup, say you write a query that pulls
up a
group of encounters.  If you intersect this with the Cohort 'all
patients in a particular program', it will filter out the
encounters
owned by patients not in the program.  Meaning that for any given
patient in the program, there can still be N encounters in the
result.
  This, i think, avoids your problem of having duplicate patients
in
your Cohort query condensing to 1.

OK.  That sounds good.  I just installed the module and I don't
quite
see how to find these new indicator types.  Are they not exposed
through the UI?

I originally got hung up on the
ability to intersect with a Cohort based on the default reporting
framework UI, which optionally allows you to select a base
Cohort,
but
in thinking about it, it allows us to ask interesting questions
pretty
easily.  Things like 'to what degree do patients in the HIV
program
also seek out primary care services outside of their monthly HIV
visit' -- this is extremely straightforward using this
intersection.

Its funny that the perception is that the reporting framework is
inflexible, because in truth I've found it to be the opposite.  I
wrote the reportingobjectgroup module over a week,

... I think that defines what I mean by inflexible from the user's
perspective :-)  As a programming framework it seems mighty fine.
  But
I guess, as Mike has alluded to, the intention was that a range of
different indicator types can be implemented.  I guess that is
where
we are now.

Cheers
Bob

because i was
facing the simple problem of counting primary care encounters at
a
health center over an arbitrary time period (usually a month).
  So
my
thought was, 'what if I replicate SqlCohortDefinition
functionality,
almost exactly, but allow the IDs to correspond to the ID of any
OpenmrsObject'.  This is the basic definition of an ObjectGroup.

I took this approach having cracked open the code of the
reporting
framework, found CohortSQLDefinition, and then decided to copy
it,
figuring that it wouldn't be very hard...  It of course turned
into
a
bit more of a project than i had expected, but we're using the
results
in production now, and we've been able to build some nice stuff
on
it.

That being said, I don't really want to maintain the module
forever,
and Mike has pointed out to me that there were other approaches
--
one
was building a SQLDataSetDefinition with a row-per-encounter
structure
and then building indicators off of this (not sure how much
coding
this would take?).  Another was writing a LogicDataSource and
creating
a Rule for 'given a patient, how many times did they come into
the
health center for a primary care visit over a give timeperiod?'
  (is
there a problem passing Reporting run-time parameters to logic
Rules?).

I don't have the bandwidth to try all three approaches,
unfortunately,
and I sort of wish that i had tried the SQLDataSetDefinition
first
(i'm not sure if it was already implemented at the time?).
Ultimately, the goal should be to be able to write 5 lines of
code
defining the actual SQL you want, for any Object, and then define
any
number of indicators.  I'm not sure which approach is going to
ultimately win, but i'm glad we're having this conversation, and
i
am
willing to put in some programming time once there are firm
specifications for how exactly a SQLIndicator works.
  Transferring
the
objectgroup indicators we have in production to a more
core-supported
strategy would make me feel like we're on slightly firmer ground.

d

On Sat, Aug 20, 2011 at 4:04 AM, Michael Seaton<[email protected]>
wrote:

I see no problem at all with a SqlIndicator, and I doubt that
there
is any
reason why this wouldn't "just work" with the SDMX-HD module.
  We
always
envisioned there would be all sorts of Indicator implementations
(or
Aggregated Data Element implementations for you Bob) - the
CohortIndicator
was just meant to be the beginning.

As an alternative (or in addition), we could consider
implementing
something
like a AggregatedDataSetColumnIndicator, which takes in:

1. A Mapped<DataSetDefinition>
2. A column name
3. An Aggregation

Since we already have SqlDataSetDefinitions implemented, which
simply return
a table of Data that can contain absolutely anything, this
indicator would
wrap on of these, and then perform a particular aggregation on
one
of the
columns in order to produce an Indicator value.  This would be
pretty easy
to implement...

Bob - are either of these something you would be interested in
taking on
development-wise, or would you need one of us to take this on
soon?

Mike



On 08/19/2011 05:08 PM, Bob Jolliffe wrote:

On 19 August 2011 19:18, Dave Thomas<[email protected]>   wrote:

This is exactly what the reportingobjectgroup module that i
wrote
was
for -- the idea that you might want to use SQL to select
groups
of any
OpenmrsObject, and still be able to intersect it with a base
cohort.

Hi Dave.

I haven't looked at the reportingobjectgroup module yet.  That
was
going to be step two after we figured out the "easy" reported
dataelements which involved counting heads rather than counting
other
openmrs objects.  And I'm not sure that I understand fully the
nuances
of what you say above but I am worried that you still want to
intersect with a base cohort ... as long as we are talking of a
cohort
then I'm guessing we don't count the same person twice.  So if
I
want
to know how many opd encounters there were last month,
"intersecting"
this with a base cohort sounds like it will filter out the
duplicates
which will again produce an incorrect result.  Or maybe I am
wrong
-
sorry to be speaking from ignorance having not yet looked at
the
module.

I'd really like to talk strategy about how to roll this module
into
reporting core (or substitute a core solution for the things
we
already have built on it).

   From our perspective (at least me and Viet :-) ) we are
looking
for a
sweet spot.  The implementors of the highly customized version
of
openmrs running in Shimla, India, have already to a large
extent
"solved" their reporting problems by creating Birt reports
containing
the various odds and sods of aggregate dataelements they need
to
produce.  The flexibility of using birt meant they could
execute
whatever queries they want to populate the various reports.

The downside being that the query, the data and the
presentation
all
become hopelessly entangled in the birt report.  And this
doesn't
really help when you want to produce data to be consumed by
another
system (eg dhis).  Its possible of course to extract the data
from
the
birt report but that is a bit of a hack, particularly when you
need to
map the anonymous birt dataelements.  Having aggregated
dataelement
(or indicator) objects defined within the system makes much
more
sense.

The strength of the reporting module, and why I have been its
loudest
advocate, is that it separates the notion of reported
dataelements
(or
Indicators as they are known as in this context) from the
rendering or
presentation of reports.  I am going to continue to use the
term
aggregate dataelement rather than indicator, but otherwise the
semantics are not that important for the current discussion.
  The
ability to define aggregate dataelements, datasets and
composite
reports independently of how they are rendered is really a
powerful
and even essential notion if we are to have a reporting
capability
which meets a wide variety of use cases.  So the reporting
module
really does move in the right direction ...

But at the highest level of abstraction, an aggregate
dataelement
object need only have a name, a description, and a mechanism
(query)
for deriving a value.  It should not be a cast iron requirement
that
there is an underlying cohort derived directly, or via an
intersection.  I can see for many cases this is very useful ...
ie
to
have an underlying cohort to drill down into.  But equally
often
it is
not and all you want is a count or some other aggregation.  So
what we
find currently is that people argue that using the reporting
module is
too inflexible and don't understand why we can produce certain
reports
relatively simply in birt, but not in the reporting module.
  And
naturally conclude that we should stick with Birt.

In order to find the sweet spot of retaining the flexibility of
birt
together with the organising structure of the reporting module,
it
seems we need to have a class of aggregate dataelement whose
only
constraint is that it must result in a number, but which can be
derived from any sql query.  I think this is what Darius is
also
foreseeing in his SqlIndicator.  If we had such SqlIndicators,
we
could (i) reuse the queries which have already been developed
for
birt
and (ii) render the resulting reports in various ways.  I think
I
even
suggested in a previous mail that an ideal outcome of this
might
be
that one of the possible renderings could in fact be a Birt
report, in
which case we will have closed the circle and have available
all
the
presentation capabilities of Birt, but separated the definition
of
aggregated dataelements from the presentation of them.

Apologies if I have misinterpreted many things re cohorts,
intersections etc.  And if the reportingobject module meets the
above
requirement then I am already delighted.  I'm going to look at
it
tonight ...

Regards
Bob

d

On Fri, Aug 19, 2011 at 5:30 PM, Darius Jazayeri
<[email protected]>   wrote:

We talked off-list, and it turns out that:

Many/most of the indicators Bob wants to build are not really
cohort
indicators, but rather counts of encounters, obs, log
entries,
etc.
They'd mostly be calculated via SQL.
They need to be able to export these via the sdmx-hd module,
into DHIS.

@Mike, @Ryan,
If we were to do a SqlIndicator implementation (which
wouldn't
be too
much
work), would that easily fit into the current SDMX-HD export
module? Or
is
that hardcoded to cohort indicators? And how much work would
it
be to
change
that?
-Darius

On Fri, Aug 19, 2011 at 7:33 AM, Bob
Jolliffe<[email protected]>
  wrote:

On 19 August 2011 15:07, Darius
Jazayeri<[email protected]>
  wrote:

You're not doing a count distinct, so if your
opd_patient_queue_log
can
have
the same patient_id more than once, that'd be why you get a
difference.
-Darius

Thanks Darius.  You are absolutely right.  I also just
figured
that
out a few minutes ago.

Though it has left me with a sinking feeling about how to
use
the
reporting module.  It makes sense now that the penny has
slowly
dropped, that a cohort query is in fact a query to select a
distinct
group, or cohort, of patients.  Which you could then drill
down
into
etc.

But at the level of a typical service indicator, I am really
not
interested in who the individual patients are.  I need to
know
how
many patients had OPD encounters this month, for example.
  Using a
cohort query for this seemed to make sense, but of course it
doesn't
as it filters the duplicate patients.  So I should in fact
be
counting
the encounters rather than the patients, but then its not a
cohort
query :-(

On Fri, Aug 19, 2011 at 5:37 AM, Bob
Jolliffe<[email protected]>
wrote:

I am trying to compose an indicator which makes use of a
join
with a
custom table.

Does anyone have an idea why executing the query directly
as:
mysql -u ... -e 'Select count(patient.patient_id) from
patient inner
join opd_patient_queue_log on
patient.patient_id=opd_patient_queue_log.patient_id'

results in 16593,

but when I create a sql cohort query as above (without the
count), I
get a result of 13592.

How does openmrs count the size of the resultset?  It
seems
its not a
simple count ...

Regards
Bob

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send
an
e-mail
to
[email protected] with "SIGNOFF openmrs-devel-l"
in
the
  body
(not
the subject) of your e-mail.



[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

________________________________
Click here to unsubscribe from OpenMRS Developers' mailing
list

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send
an
e-mail to
[email protected] with "SIGNOFF openmrs-devel-l"
in
the  body
(not
the subject) of your e-mail.



[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

________________________________
Click here to unsubscribe from OpenMRS Developers' mailing
list

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an
e-mail to
[email protected] with "SIGNOFF openmrs-devel-l" in
the
  body (not
the subject) of your e-mail.



[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an
e-mail to
[email protected] with "SIGNOFF openmrs-devel-l" in
the
  body (not
the subject) of your e-mail.



[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an
e-mail to
[email protected] with "SIGNOFF openmrs-devel-l" in
the
  body (not
the subject) of your e-mail.


[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an
e-mail
to [email protected] with "SIGNOFF openmrs-devel-l" in
the  body
(not the subject) of your e-mail.


[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an
e-mail
to [email protected] with "SIGNOFF openmrs-devel-l" in
the  body
(not the subject) of your e-mail.


[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an
e-mail
to [email protected] with "SIGNOFF openmrs-devel-l" in
the  body
(not the subject) of your e-mail.

[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an e-mail
to
[email protected] with "SIGNOFF openmrs-devel-l" in the
  body (not
the subject) of your e-mail.

[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an e-mail
to
[email protected] with "SIGNOFF openmrs-devel-l" in the
  body (not
the subject) of your e-mail.

[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to
[email protected] with "SIGNOFF openmrs-devel-l" in the  body
(not
the subject) of your e-mail.

[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

________________________________
Click here to unsubscribe from OpenMRS Developers' mailing list

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to
[email protected] with "SIGNOFF openmrs-devel-l" in the  body (not
the subject) of your e-mail.

[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

________________________________
Click here to unsubscribe from OpenMRS Developers' mailing list

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to 
[email protected] with "SIGNOFF openmrs-devel-l" in the  body (not 
the subject) of your e-mail.

[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to 
[email protected] with "SIGNOFF openmrs-devel-l" in the  body (not 
the subject) of your e-mail.

[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]


_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to 
[email protected] with "SIGNOFF openmrs-devel-l" in the  body (not 
the subject) of your e-mail.

[mailto:[email protected]?body=SIGNOFF%20openmrs-devel-l]

Re: [OPENMRS-DEV] sql cohort query

Reply via email to