I've figured this out!

`org.dspace.app.xmlui.utils.DSpaceValidity`, which is used in 
`AbstractSearch` to cache results, actually looks up and keys all bundles, 
then all bitstreams, for each item the search results.

It seems reasonable to assume (at least for our use case) that only 
bitstreams in the ORIGINAL bundle are relevant to search results (i.e., a 
change in a public file is a reason to invalidate the cache, but a change 
in non-ORIGINAL files is not).

I've added a method to `DSpaceValidity` called 
`addIfItemOnlyAddOriginalBundles`, which only keys ORIGINAL bundles for an 
`Item`, and defers to the existing `add` for everything else. I then 
updated `AbstractSearch` to call my `addIfItemOnlyAddOriginalBundles` when 
it is adding the search result DSOs to the validity object.

This has dropped my SQL query total from over 9000 to about 60, and the 
page loads relatively fast.

Unfortunately, this won't help those who have lots of bitstreams in their 
ORIGINAL bundle, but that is perhaps unavoidable.

Jacob



On Wednesday, October 31, 2018 at 11:52:25 AM UTC-5, [email protected] wrote:
>
> Hi all,
>
> We are running DSpace 5.9 XMLUI with Tomcat 7 and Java 8 on a RHEL 7 
> server, with a small-ish collection of items (about 20,000). We are running 
> production with Oracle 12, but I have replicated the same issue with 
> Postgresql 9.2.
>
> We have recently noticed some very long page load times. Any given 
> discover/search page can take 2-7 seconds to load, and when there is even a 
> moderate amount of traffic (e.g., when a bot is indexing the site at about 
> 10 requests per second), page load times can take 30-60 seconds or longer.
>
> We have made the changes suggested at 
> https://wiki.duraspace.org/display/DSDOC5x/Performance+Tuning+DSpace for 
> both Tomcat and PostgreSQL.
>
> Our production site has been customized extensively, but I was able to 
> replicate the issue with an untouched DSpace 5.9 build using the default 
> Mirage theme with XMLUI.
>
> The issue is the same with both Oracle and PostgreSQL (PostgreSQL seems a 
> little bit better). 
>
> I have tried changing from Java 8 to Java 7.
>
> I have bumped up the database connection pool size to 300.
>
> Digging through the logs is difficult, since the problem only really 
> emerges under (moderate) load.
>
> However, I was able to track a single page request (to /discover), and 
> noticed that there were over 9000 individual SQL queries (for a single page 
> load) that looked like:
>
> DEBUG org.dspace.storage.rdbms.DatabaseManager @ Running query "SELECT * 
> FROM MetadataValue WHERE resource_id= ? and resource_type_id = ? ORDER BY 
> metadata_field_id, place"  with parameters: 144458,0
>
> (The resource_type_id `0` is for bitstreams.)
>
> I *think* (but could be wrong) that this is the source of our performance 
> problem; that the database is just getting bogged down with so many 
> requests. Looking at PostgreSQL's slow query logging, some of these 
> individual queries are taking about 1 second.
>
> Our situation is perhaps unique in that we have dozens (sometimes 
> hundreds) of "dark" (non-ORIGINAL) archival files associated with an item, 
> and it looks like this discover page is trying to load metadata for all of 
> them.
>
> This doesn't happen with an equivalent query in JSPUI.
>
> Any suggestions or workarounds? Why does the search page need to get 
> metadata for all bitstreams? 
>
> Does anyone know if upgrading to DSpace 6 would resolve this issue?
>
> Thanks,
>
> Jacob
>
>
>

-- 
All messages to this mailing list should adhere to the DuraSpace Code of 
Conduct: https://duraspace.org/about/policies/code-of-conduct/
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.

Reply via email to