Re: [dbcp] Optimal defaults for DSpace
Hey Phil

Thanks for responding.

So, in your opinion, maxWaitMillis is optimally configured at 5 seconds, as opposed to the default of waiting indefinitely?

Thanks again for your input and for pointing to those resources. Clearly I need to spend more time researching this.

On Fri, Jan 1, 2021 at 11:22 PM Phil Steitz wrote:

> Having threads blocked waiting for connections for longer than 5 seconds will likely cause problems in heavily loaded applications. You will end up running out of app server processing threads if they are hanging for that long.
> [...]
Re: [dbcp] Optimal defaults for DSpace
On 12/20/20 10:26 PM, Hrafn Malmquist wrote:

> Hi Gary
>
> Thanks for taking the time to respond.
>
> I hope you can bear with me as I am still learning about database connection pooling.
>
> Perhaps I did not ask the question correctly. I am not asking about a site specific setup but rather what defaults should be shipped with the software. I am part of the minor version release team.
>
> Currently, the default setup is a DBCP2 v. 2.1.1 connection pool with only maxWaitMillis, maxIdle and maxTotal configurable in the DSpace configuration settings and the default values for these settings set to 5000, 10 and 30 respectively. It's unclear why these defaults were chosen to begin with, git blame shows they were chosen back in 2015. I don't think a lot of thought went into choosing 1) which parameters should be configurable nor 2) what their defaults should be (or why they should differ from DBCP2 defaults).
>
> DSpace repositories are run by higher education institutions and all sorts of institutions and organisations involved in research, for instance the Smithsonian (https://repository.si.edu/). Therefore, although the vast majority of instances are run by small institutions that get little traffic, others are likely to receive relatively heavy traffic, from users and crawlers.
>
> So the idea is to ask the experts what parameters should be configurable for the average repository admin, keeping in mind that the aim is for installation and setup to be simple (in effect, what are the "main" parameters likely to need tweaking) and what should the out-of-the-box defaults be (if at all different from the DBCP2 defaults).
>
> I am particularly surprised at the low maxWaitMillis chosen. Is that not likely to cause problems for high traffic sites?

I would say no. Having threads blocked waiting for connections for longer than 5 seconds will likely cause problems in heavily loaded applications. You will end up running out of app server processing threads if they are hanging for that long.
If getConnection is taking that long, there is likely a problem somewhere in the overall system: processing threads holding connections too long, not enough connections, database latency, etc. It all comes down to queuing theory. If your app does not hold connections long and queries are optimized, even a relatively small pool can handle decent load. The key is not to leave connections open or hold on to them too long.

The defaults above look OK to me, though if database connections are not in short supply, I would bump maxIdle to 20. The reason for this is that setting it at 10 means that if the number used regularly goes up to 20+, you will end up with a lot of connection churn. On the other hand, if the usage pattern is spikes now and then followed by long periods of lighter load, setting it at 20 will "waste" some connections. How important that "waste" is depends on what else is going on in the DB, how many pools are sharing it, etc.

I would recommend upgrading to the latest version compatible with the version of Tomcat you are running, or simply using the version that ships with Tomcat (which is generally the latest compatible). Another reason to upgrade dbcp if you are using it directly is to pick up the fixes in the later version of commons pool that it brings in.

For some general info on how dbcp and pool configs work, see [1]. It is old, but the basic concepts are still correct. If you are familiar with queuing theory, you can view a pool with n connections as an M/M/n queue. What drives everything is request arrival rate and service time, which in the case of dbcp is how long an application thread holds a connection. You can observe actual utilization using the JMX interfaces.
Phil

[1] https://www.slideshare.net/psteitz/apachecon2014-pooldbcp
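As a rough illustration of the M/M/n view described in this message, the sketch below estimates the probability that a request has to wait for a connection, and the mean wait, from the request arrival rate and the time a thread holds a connection. The class and method names are made up for this example and are not part of DBCP. The arrival rates and the 50 ms hold time in main are assumed figures, not measurements from any DSpace site. The model also assumes Poisson arrivals and exponentially distributed hold times, which real traffic only approximates.

```java
/** Sketch only: treats a pool of n connections as an M/M/n queue. Names are illustrative, not DBCP API. */
final class PoolQueueSketch {

    private PoolQueueSketch() { }

    /** Erlang B blocking probability, computed with the standard stable recursion. */
    static double erlangB(int servers, double offeredLoad) {
        double b = 1.0;
        for (int k = 1; k <= servers; k++) {
            b = offeredLoad * b / (k + offeredLoad * b);
        }
        return b;
    }

    /** Erlang C: probability that an arriving request finds all connections busy and must wait. */
    static double erlangC(int servers, double offeredLoad) {
        double rho = offeredLoad / servers;
        if (rho >= 1.0) {
            return 1.0; // overloaded: the queue grows without bound
        }
        double b = erlangB(servers, offeredLoad);
        return b / (1.0 - rho * (1.0 - b));
    }

    /** Mean time a request waits for a connection, in seconds. */
    static double meanWaitSeconds(int servers, double arrivalsPerSecond, double holdSeconds) {
        double offeredLoad = arrivalsPerSecond * holdSeconds; // average number of busy connections
        if (offeredLoad >= servers) {
            return Double.POSITIVE_INFINITY;
        }
        return erlangC(servers, offeredLoad) * holdSeconds / (servers - offeredLoad);
    }

    public static void main(String[] args) {
        double holdSeconds = 0.05;              // assumed: each request holds a connection for 50 ms
        int[] poolSizes = {8, 30};              // DBCP2 default maxTotal and the DSpace default from the thread
        double[] arrivalRates = {50, 150, 500}; // assumed requests per second
        for (int n : poolSizes) {
            for (double rate : arrivalRates) {
                double offered = rate * holdSeconds;
                System.out.printf("maxTotal=%d rate=%.0f/s offered=%.1f P(wait)=%.4f meanWait=%.4fs%n",
                        n, rate, offered, erlangC(n, offered), meanWaitSeconds(n, rate, holdSeconds));
            }
        }
    }
}
```

The point of the exercise is the one made above: what matters is the product of arrival rate and hold time relative to the pool size. If that product stays well below maxTotal, waits are short and a 5 second maxWaitMillis is rarely reached; if it approaches maxTotal, no timeout setting rescues the system.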
Re: [dbcp] Optimal defaults for DSpace
>> Hi Gary
>>
>> I have and they don't know. Therefore, we are kind of looking at this afresh.
>>
>> For a web server like this, where there are usually lots of reads and not many writes.
>
> DBCP is agnostic to reading vs. writing, that all happens in SQL as I am sure you know ;-)

When I think about it, it's obvious that it doesn't matter what happens during the connection session. The fact that I offered that piece of useless information only shows how much I am struggling to understand what should guide a decision on optimal defaults.

>> Does having defaults:
>> maxWaitMillis = 5000,
>> maxIdle = 10,
>> maxTotal = 30
>>
>> Make more sense than the DBCP2 defaults?
>
> Only if you think so, I'm sorry I can't offer any guidelines for your application.

I appreciate that you are hesitant to offer generic advice. Nonetheless, you are clearly an authority in this field, being the main committer to the DBCP2 codebase.

For Tomcat 8 it is explicitly recommended that maxWaitMillis not be set lower than 10 seconds, preferably 10-15 seconds [1].

Consider Deep Blue, the DSpace institutional repository of the University of Michigan [2]. Taken at face value, it is likely that this web site gets high traffic, as it belongs to a relatively popular institution with a lot of content (130k items). Of course, the db administrator running it likely knows enough about connection pooling to calibrate the settings to something more sensible, but as I am sure you understand, it would be better if the defaults that come with DSpace were as close to optimal as possible.

Correct me if I'm wrong: my understanding is that since maxWaitMillis causes exceptions to be raised on expiry, a codebase that uses a relatively short setting needs to be coded defensively to handle those exceptions well. Considering the fragmentary and decentralized way that DSpace has been developed (the classic open source way), I think it is fair to say that the codebase isn't very resilient.
Therefore, not least in light of the above-mentioned recommendations for Tomcat settings, the optimal generic setting for maxWaitMillis is at least 10 seconds (10000 ms).

[1] https://tomcat.apache.org/tomcat-8.0-doc/jndi-datasource-examples-howto.html#Intermittent_Database_Connection_Failures
[2] https://deepblue.lib.umich.edu/

On Thu, Dec 31, 2020 at 5:41 PM Gary Gregory wrote:

> Again, this is highly dependent on your use case. You'll have to experiment within your operating environment.
> [...]
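The concern about exceptions on expiry can be illustrated without DBCP itself. The sketch below models a pool as a counting semaphore: maxTotal permits and a bounded wait, with a timeout surfaced to the caller as an exception. All names here (BoundedBorrowSketch, BorrowTimeoutException, handleRequest) are hypothetical and this is not how DBCP is implemented. It only shows the calling pattern that a short maxWaitMillis asks of application code: catch the timeout, fail fast, and always give the connection back in a finally block.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Sketch only: a semaphore stands in for a pool with maxTotal connections. Not DBCP code. */
final class BoundedBorrowSketch {

    /** Hypothetical unchecked exception raised when the wait for a connection expires. */
    static final class BorrowTimeoutException extends RuntimeException {
        BorrowTimeoutException(String message) {
            super(message);
        }
    }

    private final Semaphore permits;
    private final long maxWaitMillis;

    /** Only non-negative waits are modelled; the "wait indefinitely" mode is left out. */
    BoundedBorrowSketch(int maxTotal, long maxWaitMillis) {
        this.permits = new Semaphore(maxTotal, true);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Blocks for at most maxWaitMillis, then throws instead of hanging the calling thread. */
    void borrow() {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new BorrowTimeoutException("interrupted while waiting for a connection");
        }
        if (!acquired) {
            throw new BorrowTimeoutException("no connection free within " + maxWaitMillis + " ms");
        }
    }

    void giveBack() {
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }

    /** Caller pattern: degrade gracefully on timeout, and always return the connection. */
    static String handleRequest(BoundedBorrowSketch pool) {
        try {
            pool.borrow();
        } catch (BorrowTimeoutException e) {
            return "busy"; // e.g. answer with an error page instead of blocking the thread
        }
        try {
            return "ok";   // the database work would happen here
        } finally {
            pool.giveBack();
        }
    }

    public static void main(String[] args) {
        BoundedBorrowSketch pool = new BoundedBorrowSketch(1, 100); // assumed values for the demo
        pool.borrow();                           // another request holds the only connection
        System.out.println(handleRequest(pool)); // pool exhausted, so the borrow times out
        pool.giveBack();
        System.out.println(handleRequest(pool)); // a connection is free again
    }
}
```

Whether the timeout is 5 or 15 seconds, the calling code needs this kind of handling; a longer wait only changes how long the thread is blocked before the same exception path is taken.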
Re: [dbcp] Optimal defaults for DSpace
On Thu, Dec 31, 2020 at 11:55 AM Hrafn Malmquist wrote:

> Hi Gary
>
> I have and they don't know. Therefore, we are kind of looking at this afresh.
>
> For a web server like this, where there are usually lots of reads and not many writes.

DBCP is agnostic to reading vs. writing, that all happens in SQL as I am sure you know ;-)

> Does having defaults:
> maxWaitMillis = 5000,
> maxIdle = 10,
> maxTotal = 30
>
> Make more sense than the DBCP2 defaults?

Only if you think so, I'm sorry I can't offer any guidelines for your application.

> maxWaitMillis = indefinitely,
> maxIdle = 8,
> maxTotal = 8
>
> Perhaps having higher maxIdle and maxTotal can't hurt as these are maximum bounds but the unusually (right?) low maxWaitMillis seems like it could easily cause problems, right?

Maybe someone else here has generic advice for you, but I do not: the customers I've seen at work all have highly variable needs, configurations, and operating environments, everything from Linux and Windows to IBM i/Series and z/Series.

> Also, these are the only properties wrapped into the configurable DSpace configuration. What other properties are those most commonly tweaked from DBCP2 defaults?

Again, this is highly dependent on your use case. You'll have to experiment within your operating environment.

Gary

> Happy new year
> Hrafn
Re: [dbcp] Optimal defaults for DSpace
Hi Gary

I have and they don't know. Therefore, we are kind of looking at this afresh.

For a web server like this, where there are usually lots of reads and not many writes, does having these defaults:

maxWaitMillis = 5000,
maxIdle = 10,
maxTotal = 30

make more sense than the DBCP2 defaults?

maxWaitMillis = indefinitely,
maxIdle = 8,
maxTotal = 8

Perhaps having higher maxIdle and maxTotal can't hurt, as these are maximum bounds, but the unusually (right?) low maxWaitMillis seems like it could easily cause problems, right?

Also, these are the only properties exposed in the DSpace configuration. Which other properties are most commonly tweaked from the DBCP2 defaults?

Happy new year
Hrafn

On Tue, Dec 29, 2020 at 2:31 PM Gary Gregory wrote:

> Hi,
>
> I think you will have to ask the DSpace committers why they chose those specific values.
>
> Gary
Re: [dbcp] Optimal defaults for DSpace
Hi,

I think you will have to ask the DSpace committers why they chose those specific values.

Gary

On Mon, Dec 21, 2020, 00:27 Hrafn Malmquist wrote:

> It's unclear why these defaults were chosen to begin with, git blame shows they were chosen back in 2015.
> [...]
Re: [dbcp] Optimal defaults for DSpace
Hi Gary

Thanks for taking the time to respond.

I hope you can bear with me as I am still learning about database connection pooling.

Perhaps I did not ask the question correctly. I am not asking about a site specific setup but rather what defaults should be shipped with the software. I am part of the minor version release team.

Currently, the default setup is a DBCP2 v. 2.1.1 connection pool with only maxWaitMillis, maxIdle and maxTotal configurable in the DSpace configuration settings and the default values for these settings set to 5000, 10 and 30 respectively. It's unclear why these defaults were chosen to begin with, git blame shows they were chosen back in 2015. I don't think a lot of thought went into choosing 1) which parameters should be configurable nor 2) what their defaults should be (or why they should differ from DBCP2 defaults).

DSpace repositories are run by higher education institutions and all sorts of institutions and organisations involved in research, for instance the Smithsonian (https://repository.si.edu/). Therefore, although the vast majority of instances are run by small institutions that get little traffic, others are likely to receive relatively heavy traffic, from users and crawlers.

So the idea is to ask the experts what parameters should be configurable for the average repository admin, keeping in mind that the aim is for installation and setup to be simple (in effect, what are the "main" parameters likely to need tweaking) and what should the out-of-the-box defaults be (if at all different from the DBCP2 defaults).

I am particularly surprised at the low maxWaitMillis chosen. Is that not likely to cause problems for high traffic sites?
Best regards,
Hrafn

[1]: https://github.com/DSpace/DSpace/blob/250c87dc1604c34e2a963b6804163c73278e9ff7/dspace/config/spring/api/core-hibernate.xml#L41-L48
[2]: https://github.com/DSpace/DSpace/blob/250c87dc1604c34e2a963b6804163c73278e9ff7/dspace/config/dspace.cfg#L77-L86

On Sun, Dec 20, 2020 at 6:40 PM Gary Gregory wrote:
> [...]
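As a rough way to reason about whether maxTotal = 30 is enough for a busy site, Little's law (mean connections in use = borrow rate × mean hold time) gives a back-of-the-envelope bound. The sketch below is only an illustration: the class and method names are made up, and the 20 ms hold time in the usage note is an assumed figure, not a DSpace measurement.

```java
/** Back-of-the-envelope pool sizing with Little's law. Illustrative only. */
final class PoolCapacity {

    /** Highest borrow rate (per second) that maxTotal connections can sustain on average. */
    static double maxBorrowsPerSecond(int maxTotal, double meanHoldMillis) {
        // Each connection can serve 1000 / meanHoldMillis borrows per second.
        return maxTotal * 1000.0 / meanHoldMillis;
    }

    /** Mean number of connections in use at a given borrow rate (Little's law). */
    static double meanInUse(double borrowsPerSecond, double meanHoldMillis) {
        return borrowsPerSecond * meanHoldMillis / 1000.0;
    }
}
```

With the assumed 20 ms hold time, `PoolCapacity.maxBorrowsPerSecond(30, 20.0)` comes to 1500 borrows per second, and 200 borrows per second would keep about 4 connections busy on average. If threads hold connections for seconds rather than milliseconds, the same arithmetic shows the pool saturating quickly, whatever maxWaitMillis is set to.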
Re: [dbcp] Optimal defaults for DSpace
Hi,

Each new DBCP release brings fixes, additions, and other updates, as you can read in the release notes.

How to best configure DBCP for any given combination of JDBC driver, its database, and application will be quite variable, which is somewhat out of scope here IMO.

Gary

On Fri, Dec 18, 2020, 11:15 Hrafn Malmquist wrote:
> [...]
[dbcp] Optimal defaults for DSpace
Good day

I'm wondering what the optimal defaults are for DSpace, open source digital repository software aimed especially at academic, non-profit, and commercial organizations (see https://duraspace.org/dspace/).

DSpace supports both Postgres and Oracle and recommends Tomcat, Jetty or Caucho Resin. I suspect 9/10 installations use Tomcat.

DSpace comes packaged with Apache Commons DBCP 2.1.1. DSpace sets only three DBCP2 settings to non-default values (see [1] and [2]).

These are
maxTotal = 30
maxIdle = 10
maxWaitMillis = 5000

I am not sure what reasoning is behind the choice of these configuration settings. DSpace is used by all sorts of institutions, some receiving very high traffic. My guess is that using the DBCP2 defaults is recommended. My question is, is this a good default configuration? Should more settings be configurable by DSpace users in the DSpace config? There have been reports of the database not being reachable because of too many idle connections. According to one doc [3], maxWaitMillis should be at a minimum of 1 ms, if I understand correctly.

Also, I assume there are benefits to upgrading the DBCP2 dependency to the most recent version, 2.8.0. I'm not sure what the major benefits are though. I can see that v. 2.5.0 requires Java 8.

[1] - https://github.com/DSpace/DSpace/blob/755f0732aeea7dd1449830593caa54d77890e5bd/dspace/config/local.cfg.EXAMPLE#L88-L99
[2] - https://github.com/DSpace/DSpace/blob/755f0732aeea7dd1449830593caa54d77890e5bd/dspace/config/spring/api/core-hibernate.xml#L46-L48
[3] - https://tomcat.apache.org/tomcat-8.0-doc/jndi-datasource-examples-howto.html#Intermittent_Database_Connection_Failures
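To make the roles of maxTotal and maxWaitMillis concrete, here is a toy model of a bounded pool built on a JDK semaphore. It is a sketch under stated assumptions, not how DBCP is implemented: ToyPool and its methods are hypothetical names, it hands out permits rather than real connections, it does not model maxIdle, and it returns false on timeout where a real pool would normally raise an exception.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy stand-in for a connection pool; NOT DBCP's implementation. */
final class ToyPool {
    private final Semaphore permits;   // one permit per "connection"
    private final long maxWaitMillis;  // negative means wait indefinitely

    ToyPool(int maxTotal, long maxWaitMillis) {
        this.permits = new Semaphore(maxTotal, true); // fair: longest waiter goes first
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Returns true if a "connection" was obtained, false if the wait timed out. */
    boolean borrow() {
        if (maxWaitMillis < 0) {
            permits.acquireUninterruptibly(); // blocks until some caller gives one back
            return true;
        }
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Returns a "connection" to the pool. */
    void giveBack() {
        permits.release();
    }
}
```

With `new ToyPool(30, 5000)`, the 31st concurrent borrower blocks for up to 5 seconds and then gives up, whereas with a negative maxWaitMillis it would hold its request thread until some other caller invokes giveBack().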