Hi Calle I think Lars would probably know better, but given the kinds of cross tabulation that is happening with analytics, I doubt that it will scale linearly. Would be good to get some empirical data but I think you are probably going to have something more approaching n^2 time complexity.
On 24 October 2016 at 09:20, Calle Hedberg <calle.hedb...@gmail.com> wrote: > Neeraj, > > You never stated the number of records you have in the datavalue table - > what is it? > > In the same context: anybody have a rough idea of how many datavalue > records there are in the global DATIM database - which I think currently > might be the largest DHIS2 instance around? > > Given our own recent work on performance + what Neeraj has reported, I've > been thinking of creating one test instance with let us say 500 mill > datavalue records and another with let us say 1 billion, then use them to > identify key bottlenecks in various processes AND use them to ensure that > DHIS2 analytics performance is as linear as possible in terms of database > size. Postgresql has introduced a number of new indexing algorithms in > recent versions, and I'm not sure if DHIS2 is taking full advantage of them. > > Best regards > Calle > > On 24 October 2016 at 07:53, Brajesh Murari <brajesh.mur...@gmail.com> > wrote: > >> Congratulation Neeraj and team ...it much appreciated >> >> On Mon, Oct 24, 2016 at 11:08 AM, Neeraj Gupta <neeraj.h...@gmail.com> >> wrote: >> >>> Dear Team, Thanks for all your suggestions. >>> >>> Now the time of analytic is reduced to 10 hours 41 minutes. >>> >>> We tried to VACUUM as Sam suggested but it didn't help then we upgraded >>> postgres from 9.4 to 9.5.4 and as Calle and Bob suggested we made some >>> changes in configuration file of postgres and it reduced the time. But the >>> database size is still same. >>> >>> Thanks for all your help! >>> >>> Thanks, >>> Neeraj >>> >>> On Wed, Oct 19, 2016 at 6:46 PM, Calle Hedberg <calle.hedb...@gmail.com> >>> wrote: >>> >>>> Neeraj, >>>> >>>> It's always an element of uncertainty linked to database sizes - ref >>>> Sam's post over. So indicating the number of records you have in the >>>> datavalue table & key meta-data tables would be useful + indicating whether >>>> you are running other instances on the same server. Some comments - I've >>>> been doing a lot of similar optimising work recently: >>>> >>>> 1. Upgrading to 9.5.4 is strongly recommended (and don't use 9.6 before >>>> the worst bugs are fixed and it has stabilised). >>>> >>>> 2. Carefully check your postgres.conf against the recommended settings. >>>> The guide is a bit superficial in the sense that it has recommended "fixed" >>>> values only and no explanations around ranges below or above those, but you >>>> can experiment a bit yourself (e.g. the recommended "max_connections = 200" >>>> might not be sufficient for a really large system like what you have. >>>> >>>> 3. If your server is running that single instance only, then 48GB or >>>> RAM should be sufficient. Our servers are all having 128GB RAM so we >>>> experimented quite a bit earlier this year with giving a DHIS2 instance >>>> large amounts or RAM (up to 60-70gb), with negligible impact on >>>> performance. According to Lars, the DHIS2 cannot really utilize more than >>>> around 16gb RAM (at least that is how I understood his communication at the >>>> time). So 48GB should be sufficient for a single instance. >>>> >>>> 4. I've been doing performance optimizing recently on an instance with >>>> - 4-core server with 2x 512gb ssd, 12gb allocated to DHIS2 >>>> - 31,000 Orgunits >>>> - 420 data elements >>>> - 250 indicators >>>> - around 100 mill datavalue records >>>> - total size around 140gb with analytics tables. >>>> >>>> So the size is only 25% of your 500GB, but RUNNING ANALYTICS ON THAT >>>> DATABASE INSTANCE IS TAKING JUST OVER 1 HOUR. Fundamentally, if the >>>> analytics engine is designed well, I would expect a nearly linear >>>> relationship between database size and the time analytics takes to run. So >>>> running analytics on your database on our server should in theory take 4-5 >>>> hours. >>>> >>>> We are obviously comparing oranges and nectarines here, in the sense >>>> that there might be other aspects of our server and database that is >>>> different from yours (type of CPU, no of OUs, no of DEs/Indicators, whether >>>> your instance have lots of tracker data, etc etc). I have not seen any >>>> scientific/quantified comparative performance values related to specific >>>> parameters like number of CPUs and/or number of cores, but 12 cores SHOULD >>>> improve analytics performance quite a bit - assuming around 30% then it >>>> means running analytics on your database/server should take around 3 >>>> hours...... >>>> >>>> I tried getting comparative, quantitative data on various >>>> configurations of hardware and software (e.g. some users prefer CentOS, >>>> others Ubuntu) during the academy in August, but did not get much - it >>>> seems most users/providers have found a setup that works for them for now >>>> and nobody is doing any systematic performance testing (some of the >>>> international NGOs/companies using DHIS2 might have, but as with internally >>>> developed apps they are not that keen on sharing). So it would be highly >>>> appreciated if you would post the results on analytics time with every >>>> upgrade / tweak you do - starting with the upgrade to Pg 9.5.4 >>>> >>>> Best regards >>>> Calle >>>> >>>> On 19 October 2016 at 13:28, Sam Johnson <samuel.john...@qebo.co.uk> >>>> wrote: >>>> >>>>> Hi Neeraj, >>>>> >>>>> >>>>> >>>>> *Using VACUUM and ANALYZE* >>>>> >>>>> >>>>> >>>>> Like Brajesh, my background is MySQL, and one database admin task that >>>>> is often overlooked in MySQL is OPTIMIZE TABLEs. This reclaims unused >>>>> space (we’ve had 100Gb databases files drop to half their size) and >>>>> refreshes index statistics (if the shape of your data has changed over >>>>> time, this can make indices run faster). >>>>> >>>>> >>>>> >>>>> I’m new to PostgreSQL, but the core principles are the same, and a >>>>> quick bit of Googling shows that the equivalents in PostgreSQL are the >>>>> VACUUM and ANALYZE commands. If your database isn’t set to automatically >>>>> do VACUUMs (the default DHIS2 postgres config doesn’t seem to be), you >>>>> might want to try VACUUM *FULL*, which will literally rewrite all of >>>>> your database tables and indices into smaller, more efficient files (note, >>>>> however, that on a 500Gb database this could take a *looong* time – >>>>> perhaps test on a backup first?). The following forum post is a really >>>>> nice, plain-English explanation of what VACUUM does: >>>>> >>>>> http://dba.stackexchange.com/questions/126258/what-is-table- >>>>> bloating-in-databases >>>>> >>>>> >>>>> >>>>> As I mentioned, my background is MySQL rather than Postgres, so >>>>> someone with more specific Postgres experience might like to also chime in >>>>> here. >>>>> >>>>> >>>>> >>>>> Cheers, Sam. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> *From: *Dhis2-users <dhis2-users-bounces+samuel.johnson= >>>>> qebo.co...@lists.launchpad.net> on behalf of Brajesh Murari < >>>>> brajesh.mur...@gmail.com> >>>>> *Date: *Wednesday, 19 October 2016 at 08:28 >>>>> *To: *Knut Staring <knu...@gmail.com> >>>>> *Cc: *DHIS 2 Users list <dhis2-us...@lists.launchpad.net>, DHIS2 >>>>> Developers <dhis2-devs@lists.launchpad.net> >>>>> *Subject: *Re: [Dhis2-users] [Dhis2-devs] 25 hours in completing >>>>> Analytic >>>>> >>>>> >>>>> >>>>> Dear Neeraj, >>>>> >>>>> >>>>> >>>>> The physical database size doesn't matter much, even the number of >>>>> records don't matter. In my experience the biggest problem that one can >>>>> going to run in to is not size, but the number of queries you can handle >>>>> at >>>>> a time instance specially during analytic functionality execution. Most >>>>> probably you should going to have to move to a master/slave configuration >>>>> of your database, so that the read queries can run against the slaves and >>>>> the write queries run against the master. However, if you and your >>>>> database >>>>> management team are not ready for this than, you can tweak your indexes >>>>> for >>>>> the queries you are running to speed up the response times. Also there is >>>>> a >>>>> lot of tweaking you can do to the network stack and kernel in Linux where >>>>> MySQL Server has been installed that will help.Perhaps, I would focus >>>>> first >>>>> on your indexes, then have a server admin look at your OS, and if all that >>>>> doesn't help it might be time to implement a master/slave configuration. >>>>> The most important scalability factor is RAM. If the indexes of your >>>>> tables >>>>> fit into memory and your queries are highly optimized in analytic >>>>> functionality, you can serve a reasonable amount of requests with a >>>>> average >>>>> machine. The number of records do matter, depending of how your tables >>>>> look >>>>> like. It's a difference to have a lot of varchar fields or only a couple >>>>> of >>>>> ints or longs. The physical size of the database matters as well, think >>>>> of >>>>> backups, for instance. Depending on your engine, your physical db files on >>>>> grow, but don't shrink, for instance with innodb. So deleting a lot of >>>>> rows, doesn't help to shrink your physical files. Thus the database size >>>>> does matter. If you have more than one table with more than a million >>>>> records, then performance starts indeed to degrade. Indexig is one of the >>>>> important stand need to take care, If you hit one million records you will >>>>> get performance problems, if the indices are not set right (for example no >>>>> indices for fields in "WHERE statements" or "ON conditions" in joins). If >>>>> you hit 10 million records, you will start to get performance problems >>>>> even >>>>> if you have all your indices right. Hardware upgrades - adding more memory >>>>> and more processor power, especially memory - often help to reduce the >>>>> most >>>>> severe problems by increasing the performance again, at least to a certain >>>>> degree. >>>>> >>>>> >>>>> >>>>> On Wed, Oct 19, 2016 at 12:35 PM, Knut Staring <knu...@gmail.com> >>>>> wrote: >>>>> >>>>> Just a heads-up that there seems to be a JDBC issue with Postgres 9.6, >>>>> so perhaps you should try upgrading to 9.5 first. >>>>> >>>>> >>>>> >>>>> On Wed, Oct 19, 2016 at 8:58 AM, Lars Helge Øverland <l...@dhis2.org> >>>>> wrote: >>>>> >>>>> >>>>> >>>>> Hi Neeraj, >>>>> >>>>> >>>>> >>>>> what usually helps to improve runtime is to improve/increase: >>>>> >>>>> >>>>> >>>>> - ssd (read and write speed) >>>>> >>>>> - number of CPUs >>>>> >>>>> - using latest postgresql (9.6 claims to have even better indexing >>>>> performance >>>>> <https://www.postgresql.org/docs/9.6/static/release-9-6.html> than >>>>> 9.5) >>>>> >>>>> - tuning >>>>> <https://dhis2.github.io/dhis2-docs/master/en/implementer/html/install_server_setup.html#install_postgresql_performance_tuning> >>>>> of postgresql >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> regards, >>>>> >>>>> >>>>> >>>>> Lars >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Lars Helge Øverland >>>>> >>>>> Lead developer, DHIS 2 >>>>> >>>>> University of Oslo >>>>> >>>>> Skype: larshelgeoverland >>>>> >>>>> l...@dhis2.org >>>>> >>>>> http://www.dhis2.org <https://www.dhis2.org/> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Mailing list: https://launchpad.net/~dhis2-users >>>>> Post to : dhis2-us...@lists.launchpad.net >>>>> Unsubscribe : https://launchpad.net/~dhis2-users >>>>> More help : https://help.launchpad.net/ListHelp >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Knut Staring >>>>> >>>>> Dept. of Informatics, University of Oslo >>>>> >>>>> Norway: +4791880522 >>>>> >>>>> Skype: knutstar >>>>> >>>>> http://dhis2.org >>>>> >>>>> >>>>> _______________________________________________ >>>>> Mailing list: https://launchpad.net/~dhis2-devs >>>>> Post to : dhis2-devs@lists.launchpad.net >>>>> Unsubscribe : https://launchpad.net/~dhis2-devs >>>>> More help : https://help.launchpad.net/ListHelp >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Best Regards, >>>>> >>>>> >>>>> >>>>> Brajesh Murari, >>>>> >>>>> Postgraduate, Department of Computer Science and Engineering, >>>>> >>>>> Chaudhary Devi Lal University, Sirsa, >>>>> >>>>> India. >>>>> >>>>> >>>>> >>>>> The three basic dimensions of human development: a long and healthy >>>>> life, access to knowledge, and a decent standard of living. >>>>> >>>>> _______________________________________________ >>>>> Mailing list: https://launchpad.net/~dhis2-devs >>>>> Post to : dhis2-devs@lists.launchpad.net >>>>> Unsubscribe : https://launchpad.net/~dhis2-devs >>>>> More help : https://help.launchpad.net/ListHelp >>>>> >>>>> >>>> >>>> >>>> -- >>>> >>>> ******************************************* >>>> >>>> Calle Hedberg >>>> >>>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA >>>> >>>> Tel/fax (home): +27-21-685-6472 >>>> >>>> Cell: +27-82-853-5352 >>>> >>>> Iridium SatPhone: +8816-315-19119 >>>> >>>> Email: calle.hedb...@gmail.com >>>> >>>> Skype: calle_hedberg >>>> >>>> ******************************************* >>>> >>>> >>>> _______________________________________________ >>>> Mailing list: https://launchpad.net/~dhis2-users >>>> Post to : dhis2-us...@lists.launchpad.net >>>> Unsubscribe : https://launchpad.net/~dhis2-users >>>> More help : https://help.launchpad.net/ListHelp >>>> >>>> >>> >>> >>> -- >>> Thanks, >>> Neeraj Gupta >>> >>> _______________________________________________ >>> Mailing list: https://launchpad.net/~dhis2-users >>> Post to : dhis2-us...@lists.launchpad.net >>> Unsubscribe : https://launchpad.net/~dhis2-users >>> More help : https://help.launchpad.net/ListHelp >>> >>> >> >> >> -- >> Best Regards, >> >> Brajesh Murari, >> Postgraduate, Dept of CSE, >> Chaudhary Devi Lal University, Sirsa, >> India. >> >> The three basic dimensions of human development: a long and healthy life, >> access to knowledge, and a decent standard of living. >> > > > > -- > > ******************************************* > > Calle Hedberg > > 46D Alma Road, 7700 Rosebank, SOUTH AFRICA > > Tel/fax (home): +27-21-685-6472 > > Cell: +27-82-853-5352 > > Iridium SatPhone: +8816-315-19119 > > Email: calle.hedb...@gmail.com > > Skype: calle_hedberg > > ******************************************* > > > _______________________________________________ > Mailing list: https://launchpad.net/~dhis2-users > Post to : dhis2-us...@lists.launchpad.net > Unsubscribe : https://launchpad.net/~dhis2-users > More help : https://help.launchpad.net/ListHelp > >
_______________________________________________ Mailing list: https://launchpad.net/~dhis2-devs Post to : dhis2-devs@lists.launchpad.net Unsubscribe : https://launchpad.net/~dhis2-devs More help : https://help.launchpad.net/ListHelp