Re: [dspace-tech] Re: Notes on PostgreSQL connection pooling with a Tomcat JNDI resource

Alan Orth Tue, 27 Feb 2018 05:49:24 -0800

Hello,

*The JDBC pool parameter removeAbandoned='true' absolutely solves the idle
in transaction issue*.


However! In the past two weeks since I enabled it, I notice that we don't
have any "idle in transaction" connections in pg_stat_activity anymore—and
only a handful of active connections at any given time. The default timeout
is to prune "abandoned" connections after 60 seconds, which doesn't seem to
be a problem except on the XMLUI submissions page. One user with many
(thousands?) of submissions says this page is very slow and experiences
"Socket closed" errors eventually. This is presumably PostgreSQL killing
his connection. I've tried increasing this to 90 and even 180 seconds, but
it doesn't help. What is happening on the /submissions page?

Anyways, as an experiment, this morning I turned off the
removeAbandoned='true' setting on the JDBC pool and within a few hours
DSpace was down and there were 200 "idle in transaction" connections again.
*sigh* For what it's worth, pg_stat_activity shows 100 "idle in
transaction" connections running the following query:

SELECT * FROM resourcepolicy WHERE resource_type_id= $1 AND resource_id= $2
AND action_id= $3

I'm hoping for some insight on what this query is, and about the XMLUI
submissions page being slow! Again, we're running DSpace 5.5, PostgreSQL
9.5.11, and XMLUI with ~70,000 items.

Thanks,

On Sat, Feb 10, 2018 at 5:13 PM Alan Orth <alan.o...@gmail.com> wrote:

> Tim,
>
> Any idea why DSpace thinks those connections are busy when PostgreSQL
> itself reports them as idle?
>
> In order to try to understand the traffic that is coming in I wrote a
> Munin plugin[0] that uses JMX to monitor the number of active sessions in
> DSpace applications (REST doesn't appear to create sessions). From that
> it's clear that we have quite a lot of traffic sometimes, with nearly 1500
> active sessions sometimes (see attached)! We aggressively add bot user
> agents to Tomcat's Crawler Session Manager Valve so all of these bots like
> Google, Bing, and Baidu that browse from fifty IP addresses concurrently
> get lumped into one JSESSIONID.
>
> In any case I've adjusted my JDBC pool parameters to include the
> parameters removeAbandoned='true' and testOnReturn='true' in addition to
> those Mark and I were using earlier in this thread. Things are calm on the
> site the last few days but let's see what happens during the week. We're in
> the process of upgrading from 5.5 to 5.8, which will include some database
> fixes, but I'm definitely looking forward to moving our repository to
> DSpace 6.x to take advantage of the Hibernate work.
>
> [0]
> https://github.com/ilri/rmg-ansible-public/blob/master/roles/munin/files/jmx_dspace_sessions
>
> Cheers,
>
> On Thu, Feb 8, 2018 at 7:26 PM Tim Donohue <tdono...@duraspace.org> wrote:
>
>> Hi Alan,
>>
>> We do occasionally find areas of DSpace that fail to properly close
>> database connections or otherwise clean up after themselves.  So, it's not
>> unheard of for database leaks to be found and then fixed.  Obviously
>> though, if we were aware of a specific leak in XMLUI, we'd get it fixed.
>>
>> I do know that significant work at the database layer went into DSpace 6
>> (with the move to Hibernate)... and I've heard reports (from a few folks)
>> that these sorts of issues are less frequent in DSpace 6.x.  (As I have
>> heard of other large sites that said they had to restart DSpace every once
>> in a while prior to 6.x, but are not seeing the same issues with 6.x.  I've
>> not seen this behavior myself, but it's very possible that it is
>> encountered more with highly active and/or larger sites)
>>
>> That unfortunately doesn't solve the issues you are seeing though.  Is
>> your site extremely active (in terms of concurrent users)?  Are you perhaps
>> having a lot of bad behaving spiders that are hitting your site and
>> possibly using up database connections?  250 connections in the pool sounds
>> like plenty, but it obviously may be dependent on how many concurrent hits
>> you are seeing.
>>
>> In searching around, I also came across some older (circa 2010) notes on
>> our wiki at
>> https://wiki.duraspace.org/display/DSPACE/Idle+In+Transaction+Problem
>>  It doesn't provide any exact solutions though, but it shows that you are
>> not alone (I'm not sure I'd recommend killing "idle in transaction"
>> processes though, as that wiki page suggests).  Also here's more info from
>> Postgres on what "idle in transaction" means:
>> https://www.postgresql.org/docs/9.2/static/monitoring-ps.html (it also
>> suggests that looking at pg_locks system view might provide info on what
>> transactions are still open, etc).
>>
>> So, I don't have any answers here, but maybe these will provide you with
>> clues or inspiration to dig in a bit further.  Please do let us know what
>> you find, and feel free to post back with any clues or breadcrumbs you
>> discover along the way.
>>
>> - Tim
>>
>> On Wed, Feb 7, 2018 at 5:07 PM Alan Orth <alan.o...@gmail.com> wrote:
>>
>>> Is there some kind of database transaction leaking issue with DSpace,
>>> particularly the XMLUI? Basically, I find that it doesn't matter what I my
>>> pool size is. Eventually it always gets full. Sometimes it takes a few
>>> days, sometimes it happens a few times in one day. Today I saw this in
>>> dspace.log.2018-02-07:
>>>
>>> org.apache.tomcat.jdbc.pool.PoolExhaustedException:
>>> [http-bio-127.0.0.1-8443-exec-328] Timeout: Pool empty. Unable to fetch a
>>> connection in 5 seconds, none available[size:250; busy:250; idle:0;
>>> lastwait:5000].
>>>
>>> The pool size is 250, and yet PostgreSQL activity shows literally all of
>>> the 250 connections as either idle or idle in transaction:
>>>
>>> $ psql -c 'select * from pg_stat_activity' | grep -c "PostgreSQL JDBC"
>>> 250
>>> $ psql -c 'select * from pg_stat_activity' | grep "PostgreSQL JDBC" |
>>> grep -c idle
>>> 250
>>> $ psql -c 'select * from pg_stat_activity' | grep "PostgreSQL JDBC" |
>>> grep -c "idle in transaction"
>>> 187
>>>
>>> What is going on? I'm going crazy restarting the server and staring at
>>> log files trying to figure out what is going on. We're using DSpace 5.5
>>> with PostgreSQL 9.5.10 and the latest PostgreSQL JDBC driver from
>>> jdbc.postgresql.org (42.2.1) on fast hardware with plenty of memory and
>>> CPU. But DSpace. Always. Crashes. Every. Day. It seems DS-3551 might help,
>>> but we can't move to DSpace 5.7 yet, and our 5.5 branch doesn't build if I
>>> cherry-pick the commits for DS-3551[0].
>>>
>>> [0] https://jira.duraspace.org/browse/DS-3551
>>>
>>> Let's get to the bottom of this,
>>>
>>> On Thu, Feb 1, 2018 at 3:43 PM Mark H. Wood <mwoodiu...@gmail.com>
>>> wrote:
>>>
>>>> If you have stuck connections, you may want to try some of the pool
>>>> provider's connection testing parameters.  I've had good results using
>>>> 'validationQuery="SELECT 1"' with 'testOnBorrow="true"'.
>>>>
>>>> Also I've found that older PostgreSQL drivers seem to see this problem
>>>> more frequently, so I try to keep Tomcat updated with the latest released
>>>> DBMS drivers.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "DSpace Technical Support" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to dspace-tech+unsubscr...@googlegroups.com.
>>>> To post to this group, send email to dspace-tech@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/dspace-tech.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>> --
>>>
>>> Alan Orth
>>> alan.o...@gmail.com
>>>
>>>
>>> https://picturingjordan.com
>>> https://englishbulgaria.net
>>> https://mjanja.ch
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "DSpace Technical Support" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to dspace-tech+unsubscr...@googlegroups.com.
>>> To post to this group, send email to dspace-tech@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/dspace-tech.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>> --
>> Tim Donohue
>> Technical Lead for DSpace & DSpaceDirect
>> DuraSpace.org | DSpace.org | DSpaceDirect.org
>>
> --
>
> Alan Orth
> alan.o...@gmail.com
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch
>
-- 

Alan Orth
alan.o...@gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch

-- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To post to this group, send email to dspace-tech@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.

Re: [dspace-tech] Re: Notes on PostgreSQL connection pooling with a Tomcat JNDI resource

Reply via email to