Re: [Dhis2-users] R and the web API

Eric Green Tue, 24 Nov 2015 03:48:01 -0800

Hi Jason, 

This is VERY helpful. It’s clear you’ve thought a lot about these issues. 
Thanks for sharing these ideas, including the link to your validation example.

Our exchange has very quickly helped me to get a sense of how to get started
and where the trouble spots might be. Thanks for all of your help.

Eric

On November 24, 2015 at 3:45:13 AM, Jason Pickering
(jason.p.picker...@gmail.com) wrote:

Hi Eric,

Indicators in DHIS2 are defined by metadata, so there is no standard reference
for how they are constructed. If you are going to aggregate these yourself,
then yes, you would need to pull out all of the component data elements,
reconstruct the indicator in R, and then perform the aggregation. You can see
an example of an indicator here:

https://play.dhis2.org/demo/api/indicators/ReUHfIn0pTQ

The numerators and denominators are described by the following snippet of
metadata...

<denominator>#{fbfJHSPpUQD.pq2XI5kz2BY}+#{fbfJHSPpUQD.PT59n8BQbqM}</denominator>
<numerator>#{fbfJHSPpUQD.pq2XI5kz2BY}+#{fbfJHSPpUQD.PT59n8BQbqM}-#{Jtf34kNZhzP.pq2XI5kz2BY}-#{Jtf34kNZhzP.PT59n8BQbqM}</numerator>

The first UID corresponds to the data element, and the second the UID of the
particular disaggregation (category option combination in DHIS2-ese).
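
As a rough illustration of the expression format described above, the base R
sketch below pulls the data element and category option combination UIDs out
of the numerator. The helper name parse_operands is made up for this example,
and the regular expression assumes every operand has the two-part
#{dataElement.categoryOptionCombo} form shown in the snippet.

```r
# Hypothetical helper: split a DHIS2 indicator expression into its operands.
# Assumes every operand looks like #{dataElementUID.categoryOptionComboUID}.
parse_operands <- function(expr) {
  pattern <- "#\\{([A-Za-z0-9]+)\\.([A-Za-z0-9]+)\\}"
  tokens <- regmatches(expr, gregexpr(pattern, expr))[[1]]
  data.frame(
    data_element = sub(pattern, "\\1", tokens),
    category_option_combo = sub(pattern, "\\2", tokens),
    stringsAsFactors = FALSE
  )
}

# The numerator from the metadata snippet above.
numerator <- paste0(
  "#{fbfJHSPpUQD.pq2XI5kz2BY}+#{fbfJHSPpUQD.PT59n8BQbqM}",
  "-#{Jtf34kNZhzP.pq2XI5kz2BY}-#{Jtf34kNZhzP.PT59n8BQbqM}"
)
operands <- parse_operands(numerator)
print(operands)
```

This only lists which values would need to be fetched. It does not evaluate
the arithmetic or apply other metadata such as the annualization factor.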

Other metadata components are also used to calculate the indicator, such as
the annualization factor. Reconstructing the aggregation engine of DHIS2 would
not be trivial, but I describe some approaches here which could probably also
be applied to indicators. In that example, I show how you can take the DHIS2
metadata and use it to evaluate validation rules outside of the system, in R.
Since indicators and validation rules share the same expression syntax, it
would seem feasible (if non-trivial) to do the same with indicators.

In terms of weighting, the important thing to keep in mind with DHIS2
indicators is that the numerators and denominators are each aggregated first,
and then divided. Thus, you effectively end up with a weighted average. The
other approach would be to calculate the ratio for each numerator/denominator
pair separately, and then take the (unweighted) mean of those ratios.
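
A small numeric sketch of the difference between the two approaches, using
invented numbers for one small and one large facility:

```r
# Made-up numerators and denominators for two facilities.
num <- c(5, 400)
den <- c(10, 1000)

# DHIS2-style: aggregate numerators and denominators first, then divide.
weighted <- sum(num) / sum(den)

# Alternative: compute each ratio first, then take a plain mean.
unweighted <- mean(num / den)

print(c(weighted = weighted, unweighted = unweighted))
```

Here the small facility's ratio of 0.5 pulls the unweighted mean up to 0.45,
while the aggregated ratio stays close to 0.40 because the large facility
dominates the totals.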

Regarding the comment on line 60, there is no guarantee that "indID <-
indicators$id[indicators$name==ind[i]]" will return anything, the way you have
the code at the moment. An NA could result if there is no match, and duplicate
names would return more than one ID. More generally, depending on your API
call, NAs/NULLs are possible. The analytics resources should not return any
NULLs/NAs, but could return blank values. Best to check, just to be sure.
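
One possible way to guard that lookup is sketched below. The function name
lookup_indicator_id and the toy metadata are invented for illustration, and
the sketch assumes "indicators" is a data frame with "id" and "name" columns
as in the quoted line.

```r
# Hypothetical guard around the name-to-id lookup.
# `indicators` is assumed to be a data frame with `id` and `name` columns.
lookup_indicator_id <- function(indicators, name) {
  hits <- which(indicators$name == name)  # which() drops NA comparisons
  if (length(hits) == 0) stop("No indicator named '", name, "'")
  if (length(hits) > 1) stop("More than one indicator named '", name, "'")
  id <- indicators$id[hits]
  if (is.na(id) || !nzchar(id)) stop("Indicator '", name, "' has no usable id")
  id
}

# Toy metadata; the second id and both names are invented for the example.
indicators <- data.frame(
  id = c("ReUHfIn0pTQ", "abcdefghijk"),
  name = c("Example indicator A", NA),
  stringsAsFactors = FALSE
)

ind_id <- lookup_indicator_id(indicators, "Example indicator A")
print(ind_id)
```

Failing early like this keeps a missing or duplicated name from silently
ending up in the request URL.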

Best regards,
Jason

On Tue, Nov 24, 2015 at 3:44 AM, Eric Green <epgr...@gmail.com> wrote:
Hi Alex and Jason,

Thanks for sharing these ideas. I was able to get the reference table I wanted.
Much appreciated.

Jason, your points about server stress are good. In my use case queries will be
small in scope and infrequent, but it’s a good point to remember.

I was not aware of the weighting issue (new to dhis2 and APIs!), but it makes
sense. I would need to switch to data elements, right? Could anyone point me to
good resources for finding out how specific indicators are constructed (and
weighted)? Is there a standard reference?

Jason, in your revised code (thanks!), could you clarify what you mean by
"#Needs to be checked against NAs and duplicates" in line 60? This step is just
creating the segment of the URL that specifies the indicator, e.g.,
"dimension=dx:ReUHfIn0pTQ". Are you saying more generally that resulting
datasets for indicators need to be checked for NAs and duplicates? I think I’m
missing something here.

Thanks again.

Eric

On November 23, 2015 at 10:36:26 AM, Jason Pickering
(jason.p.picker...@gmail.com) wrote:

Hi Eric,

Nice to see someone else looking to use R and DHIS2. :)

Another way of getting the orgunit Hierarchy is with something like this.

https://play.dhis2.org/demo/api/organisationUnits?fields=uid,parent[id],name,level,path

Once you have the parent ID you can then generate the entire tree structure.
The "path" also provides the full hierarchy of the position of a given orgunit
within the hierarchy. Once you have either of these, it would be possible to
generate the hierarchical structure pretty easily in R I think (although I have
not written the code to do it!).
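
A minimal base R sketch of the "path" approach follows. The UIDs and names
are invented, and the sketch assumes each path is a slash-separated list of
UIDs running from the root down to the orgunit itself.

```r
# Toy response rows; the UIDs and names are invented. The `path` values are
# assumed to have the form "/rootUID/.../ownUID".
orgunits <- data.frame(
  id = c("facility005", "subcounty03"),
  name = c("Facility 5", "Sub county 3"),
  path = c("/country0001/county00002/subcounty03/facility005",
           "/country0001/county00002/subcounty03"),
  stringsAsFactors = FALSE
)

# Split each path into its ancestors and pad shorter paths with NA.
parts <- strsplit(sub("^/", "", orgunits$path), "/", fixed = TRUE)
depth <- max(lengths(parts))
pad <- function(p) c(p, rep(NA_character_, depth - length(p)))
levels_mat <- t(vapply(parts, pad, character(depth)))
colnames(levels_mat) <- paste0("level", seq_len(depth))

# One row per orgunit, one column per hierarchy level.
hierarchy <- data.frame(orgunits[c("id", "name")], levels_mat,
                        stringsAsFactors = FALSE)
print(hierarchy)
```

The resulting table has one column per level, so a facility row carries the
UIDs of all of its ancestors.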

I think your approach will work, but in general, the API can aggregate the data
for you (depending on how you would like to aggregate it). Otherwise, if you
loop over many requests, you could pull a lot of data and potentially put the
server under stress (depending on the level of usage of course). In general, I
think it makes sense to ask only for what you need, if that is possible and
supported by the API. This will run a lot quicker (on the server and in R).
This of course all depends on the scale of what you are asking for and whether
you need to perform some type of filtering (such as removing outliers or bad
data) prior to aggregation, which the server may not perform.
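
As a sketch of asking for several items in one request instead of looping,
the helper below builds a single analytics URL. The function name
build_analytics_url is hypothetical, the period and orgunit values are
placeholders, and the semicolon-separated "dimension=" syntax should be
checked against the API documentation for your DHIS2 version.

```r
# Hypothetical helper: request several indicators, periods and orgunits in
# one analytics call rather than one call per indicator.
build_analytics_url <- function(base_url, dx, pe, ou) {
  dims <- c(
    paste0("dimension=dx:", paste(dx, collapse = ";")),
    paste0("dimension=pe:", paste(pe, collapse = ";")),
    paste0("dimension=ou:", paste(ou, collapse = ";"))
  )
  paste0(base_url, "/api/analytics.csv?", paste(dims, collapse = "&"))
}

# OU_UID_1 and OU_UID_2 are placeholders, not real orgunit UIDs.
url <- build_analytics_url(
  "https://play.dhis2.org/demo",
  dx = "ReUHfIn0pTQ",
  pe = c("201501", "201502"),
  ou = c("OU_UID_1", "OU_UID_2")
)
print(url)
```

The string is only constructed here, not fetched, so the sketch runs without
network access or credentials.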

Also, be aware that when getting indicators from DHIS2, you do not get the
data values which compose the indicators. Thus, any aggregation which you
perform would likely differ significantly from what DHIS2 produces, because
DHIS2 aggregates the data with a weighted average, whereas you could only
compute an un-weighted average (since you are getting the percentages here
rather than both the numerator and denominator).

I hacked your example a bit to make it a bit quicker. You can test the output
on RFiddle here.

Hope this helps to get you started.

Regards,
Jason

On Mon, Nov 23, 2015 at 3:46 PM, Alex Tumwesigye <atumwesi...@gmail.com> wrote:
Dear Eric,

Something like this should assist to generate the metadata
http://YOUR_URL/api/organisationUnits.json?paging=false&fields=id,name,parent[id,name,parent[id,name,parent[id,name]]]&filter=level:EQ:5

The above will generate the orgunit hierarchy from level 5 (lowest level) up to
level 2. Note how I nest parent[id,name].

Alex

On Mon, Nov 23, 2015 at 5:35 PM, Eric Green <epgr...@gmail.com> wrote:
I had a side conversation with Jason Pickering about using R to access the web
API, and I’m moving the conversation to the mailing list to document it for
others.

I asked Jason for guidance on modifying the API url to import data into R.
Prior to contacting Jason, I reviewed this documentation and his presentation
on R/DHIS2 integration (great stuff!). Jason was nice enough to create this
example that showed me how to use the pivot table app, copy the API url using
Firefox/Chrome developer tools, and use the pre-filled URL in R as a template.

I wanted to do more with organization units, so I modified Jason’s example
here: https://gist.github.com/ericpgreen/bb7fcb55efd8c93d3451.

I might not be approaching the problem the right way, but my general approach
is to define a set of periods (monthly) and organizational units and then loop
over a set of indicators to create a data frame for each indicator that has
values by unit (row) and period (column). Then in R (not shown), I will
transform each data frame from wide to long and then combine the data frames
for each indicator into a larger data frame for analysis.
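
The wide-to-long step and the final combination could look roughly like this
in base R. The data frames, column names and values are invented, and
to_long is a hypothetical helper, not the code from the gist.

```r
# Toy wide-format frames (unit rows, period columns); all values are invented.
wide_list <- list(
  indicator_a = data.frame(ou = c("fac1", "fac2"),
                           `201501` = c(10, 20), `201502` = c(30, 40),
                           check.names = FALSE, stringsAsFactors = FALSE),
  indicator_b = data.frame(ou = c("fac1", "fac2"),
                           `201501` = c(1, 2), `201502` = c(3, 4),
                           check.names = FALSE, stringsAsFactors = FALSE)
)

# Hypothetical helper: one row per unit and period, tagged with the indicator.
to_long <- function(wide, indicator) {
  periods <- setdiff(names(wide), "ou")
  data.frame(
    indicator = indicator,
    ou = rep(wide$ou, times = length(periods)),
    period = rep(periods, each = nrow(wide)),
    value = unlist(wide[periods], use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Stack the long frames for all indicators into one data frame.
long <- do.call(rbind, Map(to_long, wide_list, names(wide_list)))
rownames(long) <- NULL
print(long)
```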

I would like to have the data at the lowest level possible so I can later
aggregate at higher organization unit levels (e.g., counties) and periods
(e.g., years) as needed. I know I could just request these aggregations via the
API, but I am accustomed to working with datasets at the lowest level and doing
manipulations in my code so I can follow the process more closely (I’m new to
APIs).

My current question is how to obtain the metadata that indicates the
organizational hierarchy of units. When I define urlD in my code, I’d like to
automatically grab all facility OU’s where county==2, for instance. I know I
could do this if I had something like the following table. Right now I specify
each OU manually. Having this table would allow me to build the API url
programmatically.

Also, in the data frame that is created, I only know that an observation is
linked to facility 5, for instance, but I don’t have the metadata to show that
facility 5 is in sub county 3 which is in county 2 of country 1. So having this
table would let me aggregate on my end later.
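
To make the intended use of such a table concrete, here is a small sketch
with invented facility values and an invented lookup table. It selects the
facilities in one county and sums values up to the sub county level.

```r
# Toy facility-level values and a lookup table; every value is invented.
values <- data.frame(
  facility = c("facility 5", "facility 6", "facility 7"),
  period = "201501",
  value = c(12, 8, 5),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  facility = c("facility 5", "facility 6", "facility 7"),
  sub_county = c("sub county 3", "sub county 3", "sub county 4"),
  county = c("county 2", "county 2", "county 2"),
  stringsAsFactors = FALSE
)

# Facilities in one county, e.g. to build the orgunit part of a URL.
in_county_2 <- lookup$facility[lookup$county == "county 2"]

# Attach the hierarchy to each observation, then aggregate by sub county.
joined <- merge(values, lookup, by = "facility")
by_sub_county <- aggregate(value ~ sub_county + period, data = joined, FUN = sum)
print(by_sub_county)
```

Summing like this only makes sense for counts. For ratio indicators the
numerators and denominators would need to be summed separately and divided
afterwards.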

Of course suggestions on improving my general approach are also welcome!!

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-users
Post to : dhis2-users@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-users
More help : https://help.launchpad.net/ListHelp

--
Alex Tumwesigye

Technical Advisor - DHIS2 (Consultant),
Ministry of Health/AFENET
Kampala
Uganda

IT Consultant - BarefootPower Uganda Ltd, SmartSolar, Kenya

IT Specialist (Servers, Networks and Security, Health Information Systems -
DHIS2 ) & Solar Consultant

+256 774149 775, + 256 759 800161

"I don't want to be anything other than what I have been - one tree hill "

--
Jason P. Pickering
email: jason.p.picker...@gmail.com
tel:+46764147049
