Hi there, thanks for the feedback. Most of what's requested is available in
the API. It's on our list to rewrite the import-export app and to write a
better scheduling manager for background tasks such as analytics generation.
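For instance, a cron job can drive the relevant Web API endpoints directly.
A minimal sketch in Python of building those calls - the base URL and the
dataSet/orgUnit UIDs are placeholders for your own instance, and
authentication plus the actual HTTP requests are left out:

```python
# Sketch only: constructs the Web API URLs for partial analytics
# regeneration and incremental data value export. BASE_URL and the
# UIDs below are hypothetical placeholders.
from urllib.parse import urlencode

BASE_URL = "https://example.org/dhis"  # placeholder server


def analytics_url(last_years=None):
    """POST target for (re)generating analytics tables.
    lastYears limits regeneration to the last N years of data."""
    url = f"{BASE_URL}/api/resourceTables/analytics"
    if last_years is not None:
        url += "?" + urlencode({"lastYears": last_years})
    return url


def data_value_export_url(data_set, org_unit, last_updated=None):
    """GET target for exporting only recently changed data values,
    using the lastUpdated filter instead of a full export."""
    params = {"dataSet": data_set, "orgUnit": org_unit}
    if last_updated is not None:
        params["lastUpdated"] = last_updated
    return f"{BASE_URL}/api/dataValueSets?" + urlencode(params)


print(analytics_url(last_years=2))
# -> https://example.org/dhis/api/resourceTables/analytics?lastYears=2
print(data_value_export_url("pBOMPrpg1QX", "DiszpKrYNg8",
                            last_updated="2016-09-01"))
```

From cron, the resulting URLs would be POSTed/GETed nightly with the
instance's credentials, e.g. via curl.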
In the meantime:

- Analytics tables generation
  <http://dhis2.github.io/dhis2-docs/master/en/developer/html/webapi_generating_resource_analytics_tables.html>
  for the last x years
- Data value export
  <http://dhis2.github.io/dhis2-docs/master/en/developer/html/webapi_data_values.html#d0e3600>
  (lastUpdated, lastUpdatedDuration, orgUnit params)

regards,
Lars

On Sun, Sep 11, 2016 at 5:20 PM, David Siang Fong Oh <d...@thoughtworks.com> wrote:

> I think Jason also pointed out that this could be achieved from the API,
> but the question is whether it needs to be more user-friendly, i.e.
> customisable using the web application as opposed to requiring a custom
> script triggered by a cron job.
>
> Cheers,
>
> -doh
>
> On Sun, Sep 11, 2016 at 8:36 PM, Dan Cocos <dco...@gmail.com> wrote:
>
>> Hi All,
>>
>> You could run this
>> /api/24/maintenance/analyticsTablesClear
>> and possibly this
>> /api/24/maintenance/periodPruning
>>
>> I don't see it in the documentation, but we call
>> /api/resourceTables/analytics?lastYears=2 quite often for clients with
>> a lot of historical data.
>>
>> Good luck,
>> Dan
>>
>> *Dan Cocos*
>> Principal, BAO Systems
>> dco...@baosystems.com | http://www.baosystems.com
>> | 2900 K Street, Suite 404, Washington D.C. 20007
>>
>> On Sep 11, 2016, at 10:05 AM, Calle Hedberg <calle.hedb...@gmail.com> wrote:
>>
>> Hi,
>>
>> It's not only analytics that would benefit from segmented/staggered
>> processing: I exported around 100 million data values yesterday from a
>> number of instances, and found that the export process got (seemingly)
>> exponentially slower as the number of records exported increased. Most of
>> the export files contained well under 10 million records, which was
>> pretty fast. In comparison, the largest export file, with around 30
>> million data values, probably took 20 times as long as an 8 million value
>> export. Based on just keeping an eye on the "progress bar", it seemed
>> like some kind of cache staggering was taking place - the amount exported
>> would increase quickly by 2-3 MB, then "hang" for a good while, then
>> increase quickly by 2-3 MB again.
>>
>> Note also that there are several fundamental strategies one could use to
>> reduce heavy work processes like analytics, exports (and thus imports),
>> etc.:
>> - being able to specify a sub-period, as Jason suggests
>> - being able to specify the "dirty" part of the instance by using e.g.
>>   lastUpdated >= xxxxx
>> - being able to specify a sub-OrgUnit area
>>
>> These partial strategies are of course mostly relevant for very large
>> instances, but such large instances are also the ones where you typically
>> only have changes made to a small segment of the total - like if you have
>> data for 30 years, 27 of those might be locked down and no longer
>> available for updates.
>>
>> Regards
>> Calle
>>
>> On 11 September 2016 at 15:47, David Siang Fong Oh <d...@thoughtworks.com> wrote:
>>
>>> +1 to Calle's idea of staggering analytics year by year.
>>>
>>> I also like Jason's suggestion of being able to configure the time
>>> period for which analytics is regenerated. If the general use case has
>>> data being entered only for the current year, then is it perhaps
>>> unnecessary to regenerate data for previous years?
>>>
>>> Cheers,
>>>
>>> -doh
>>>
>>> On Tue, Jul 26, 2016 at 2:36 PM, Calle Hedberg <calle.hedb...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> One (presumably) simple solution is to stagger analytics on a
>>>> year-by-year basis - i.e. run and complete 2009 before processing 2010.
>>>> That would reduce temp disk space requirements significantly while
>>>> (presumably) not changing the general design.
>>>>
>>>> Regards
>>>> Calle
>>>>
>>>> On 26 July 2016 at 10:24, Jason Pickering <jason.p.picker...@gmail.com> wrote:
>>>>
>>>>> Hi Devs,
>>>>>
>>>>> I am seeking some advice on how to decrease the amount of disk usage
>>>>> with DHIS2.
>>>>>
>>>>> Here is a list of the biggest tables in the system:
>>>>>
>>>>> public.datavalue                | 2316 MB
>>>>> public.datavalue_pkey           | 1230 MB
>>>>> public.in_datavalue_lastupdated |  680 MB
>>>>>
>>>>> There are a lot more tables, and all in all, the database occupies
>>>>> about 5.4 GB without analytics.
>>>>>
>>>>> This represents about 30 million data rows, so not that big a database
>>>>> really. This server is being run off a Digital Ocean virtual server
>>>>> with 60 GB of disk space. The only things on the server are Linux,
>>>>> PostgreSQL and Tomcat. Nothing else. Without analytics, and with
>>>>> everything installed for the system, we have about 23% of that 60 GB
>>>>> free.
>>>>>
>>>>> When analytics runs, it maintains a copy of the main analytics tables
>>>>> (analytics_XXXX) and creates temp tables like analytics_temp_2004.
>>>>> When things are finished and the indexes are built, the tables are
>>>>> swapped. This ensures that analytics resources remain available while
>>>>> analytics is being built, but the downside is that A LOT more disk
>>>>> space is required, as we now effectively have two copies of the tables
>>>>> along with all their indexes, which are quite large themselves (up to
>>>>> 60% of the size of the table itself). Here's what happens when
>>>>> analytics is run:
>>>>>
>>>>> public.analytics_temp_2015 | 1017 MB
>>>>> public.analytics_temp_2014 |  985 MB
>>>>> public.analytics_temp_2011 |  952 MB
>>>>> public.analytics_temp_2010 |  918 MB
>>>>> public.analytics_temp_2013 |  885 MB
>>>>> public.analytics_temp_2012 |  835 MB
>>>>> public.analytics_temp_2009 |  804 MB
>>>>>
>>>>> Now each analytics table is taking about 1 GB of space. In the end, it
>>>>> adds up to more than 60 GB and analytics fails to complete.
>>>>>
>>>>> So, while I understand the need for this functionality, I am wondering
>>>>> whether we need a system option to allow the analytics tables to be
>>>>> dropped prior to regenerating them, or to have more control over the
>>>>> order in which they are generated (for instance, to generate specific
>>>>> periods). I realize this can be done from the API or the scheduler,
>>>>> but only for the past three relative years.
>>>>>
>>>>> The reason I am asking for this is that it's a bit of a pain (at the
>>>>> moment) when using Digital Ocean as a service provider, since their
>>>>> stock disk storage is 60 GB. With other VPS providers (Amazon, Linode)
>>>>> it's a bit easier, but DigitalOcean only supports block storage in two
>>>>> regions at the moment. Regardless, it seems somewhat wasteful to have
>>>>> to provision such a large amount of disk space for such a relatively
>>>>> small database.
>>>>>
>>>>> Is this something we just need to plan for and perhaps document
>>>>> better, or should we think about offering better functionality for
>>>>> people running smaller servers?
>>>>>
>>>>> Regards,
>>>>> Jason
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>> Post to     : dhis2-devs@lists.launchpad.net
>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>
>>>> --
>>>>
>>>> *******************************************
>>>>
>>>> Calle Hedberg
>>>>
>>>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA
>>>>
>>>> Tel/fax (home): +27-21-685-6472
>>>>
>>>> Cell: +27-82-853-5352
>>>>
>>>> Iridium SatPhone: +8816-315-19119
>>>>
>>>> Email: calle.hedb...@gmail.com
>>>>
>>>> Skype: calle_hedberg
>>>>
>>>> *******************************************

--
Lars Helge Øverland
Lead developer, DHIS 2
University of Oslo
Skype: larshelgeoverland
l...@dhis2.org
http://www.dhis2.org