+1 to Calle's idea of staggering analytics year by year I also like Jason's suggestion of being able to configure the time period for which analytics is regenerated. If the general use-case has data being entered only for the current year, then is it perhaps unnecessary to regenerate data for previous years?
Cheers, -doh On Tue, Jul 26, 2016 at 2:36 PM, Calle Hedberg <[email protected]> wrote: > Hi, > > One (presumably) simple solution is to stagger analytics on a year by year > basis - i.e. run and complete 2009 before processing 2010. That would > reduce temp disk space requirements significantly while (presumably) not > changing the general design. > > Regards > Calle > > On 26 July 2016 at 10:24, Jason Pickering <[email protected]> > wrote: > >> Hi Devs, >> I am seeking some advice on how to try and decrease the amount of disk >> usage with DHIS2. >> >> Here is a list of the biggest tables in the system. >> >> public.datavalue | 2316 MB >> public.datavalue_pkey | 1230 MB >> public.in_datavalue_lastupdated | 680 MB >> >> >> There are a lot more tables, and all in all, the database occupies about >> 5.4 GB without analytics. >> >> This represents about 30 million data rows, so not that big of a database >> really. This server is being run off of a Digital Ocean virtual server with >> 60 GB of disk space. The only thing on the server really is Linux, >> Postgresql and Tomcat. Nothing else. With out analytics and everything >> installed for the system, we have about 23% of that 60 GB free. >> >> When analytics runs, it maintains a copy of the main analytics tables ( >> analytics_XXXX) and creates temp tables like analytics_temp_2004. When >> things are finished and the indexes are built, the tables are swapped. This >> ensures that analytics resources are available while analytics are being >> built, but the downside of this is that A LOT more disk space is required, >> as now we effectively have two copies of the tables along with all their >> indexes, which are quite large themselves (up to 60% the size of the table >> itself). Here's what happens when analytics is run >> >> public.analytics_temp_2015 | 1017 MB >> public.analytics_temp_2014 | 985 MB >> public.analytics_temp_2011 | 952 MB >> public.analytics_temp_2010 | 918 MB >> public.analytics_temp_2013 | 885 MB >> public.analytics_temp_2012 | 835 MB >> public.analytics_temp_2009 | 804 MB >> >> Now each analytics table is taking about 1 GB of space. In the end, it >> adds up to more than 60 GB and analytics fails to complete. >> >> So, while I understand the need for this functionality, I am wondering if >> we need a system option to allow the analytics tables to be dropped prior >> to regenerating them, or to have more control over the order in which they >> are generated (for instance to generate specific periods). I realize this >> can be done from the API or the scheduler, but only for the past three >> relative years. >> >> The reason I am asking for this is because its a bit of a pain (at the >> moment) when using Digital Ocean as a service provider, since their stock >> disk storage is 60 GB. With other VPS providers (Amazon, Linode), its a bit >> easier, but DigitalOcean only supports block storage in two regions at the >> moment. Regardless, it would seem somewhat wasteful to have to have such a >> large amount of disk space, for such a relatively small database. >> >> Is this something we just need to plan for and maybe provide better >> documentation on, or should we think about trying to offer better >> functionality for people running smaller servers? >> >> Regards, >> Jason >> >> _______________________________________________ >> Mailing list: https://launchpad.net/~dhis2-devs >> Post to : [email protected] >> Unsubscribe : https://launchpad.net/~dhis2-devs >> More help : https://help.launchpad.net/ListHelp >> >> > > > -- > > ******************************************* > > Calle Hedberg > > 46D Alma Road, 7700 Rosebank, SOUTH AFRICA > > Tel/fax (home): +27-21-685-6472 > > Cell: +27-82-853-5352 > > Iridium SatPhone: +8816-315-19119 > > Email: [email protected] > > Skype: calle_hedberg > > ******************************************* > > > _______________________________________________ > Mailing list: https://launchpad.net/~dhis2-devs > Post to : [email protected] > Unsubscribe : https://launchpad.net/~dhis2-devs > More help : https://help.launchpad.net/ListHelp > >
_______________________________________________ Mailing list: https://launchpad.net/~dhis2-devs Post to : [email protected] Unsubscribe : https://launchpad.net/~dhis2-devs More help : https://help.launchpad.net/ListHelp

