On Mon, Nov 6, 2023 at 2:38 PM DSpace Technical Support <
[email protected]> wrote:

> While awaiting an answer to your question, may I propose a tangential
> question of my own:  what do you think sharding is doing for you, and have
> you seen evidence to support this?
>
Mark, I don't know if it's doing anything for us now.

I can't remember, but honestly, I don't think that we took any measurements
of "before sharding" or "after sharding" performance except possibly
anecdotal "seems better now, the system is slow less often" notes. If I
recall correctly (and this is fuzzy), part of our motivation in finally
doing the sharding was to help with logistics in resolving another issue
that we were having. When DSpace added the uuid fields to supplement or
replace the id fields, some of our statistics reports (perhaps custom ones
we had here) were inaccurate or broken. Terry Brady supplied a very nice
tool to update old statistics records with IDs to use the new uuid instead,
but we had difficulties using this tool on our very large statistics core,
with an operating production system, with the particular storage hardware
that we were using at the time. With our setup then, it ran slowly and
slowed the system so that the web interface was unusable. The initial
sharding helped us to use Terry's tool to update the smaller and static
cores for each previous year offline, completely away from the production
system, and reinstall them when we were finished.

Beyond that logistical motivation to help resolve the missing uuid issue, I
suspect that as a group we just read that DSpace supports sharding and that
sharding helps performance problems, noticed that we sometimes have
performance problems, and thought that sharding was a best practice and we
should implement it.

Do we currently get any benefits from the sharding on our system? I don't
know. We have at least one drawback under DSpace 6 with sharding. If our
Tomcat is shut down and restarted too quickly (by monit), the new instance
sometimes tries to reopen one of those statistics cores before the previous
instance has released its lock. The result is that DSpace temporarily can't
see statistics in any of the previous-year cores until we restart Tomcat
again.
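
For what it's worth, here is a rough sketch of the kind of guard that might
avoid that race: wait until the old Tomcat process has really exited before
starting the new one. This is not something we run, and it is not part of
DSpace or monit. The function names are made up for illustration, it assumes
a POSIX host, and it assumes the old Tomcat PID is known (e.g. from a PID
file).

```python
import os
import time


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def wait_until(predicate, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll predicate() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def safe_to_start(old_pid: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Hypothetical guard: True once the old process (assumed Tomcat) is gone."""
    return wait_until(lambda: not pid_is_running(old_pid),
                      timeout=timeout, interval=interval)
```

A restart script could call safe_to_start(old_pid) and only launch Tomcat when
it returns True. Whether that is enough to guarantee the core locks have been
released on a given setup is an assumption I have not tested.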

The systems administrators here strongly support the decision to allow Solr
to be placed on a different host, and that's what they're doing. My
understanding is that separating things this way will help us get better
at identifying the particular bottlenecks (e.g. is the problem Solr, or
something else?), allow them to allocate appropriate resources to each
component when needed, and ultimately let us be more scientific about
performance. Also, it should help us comply with organizational security
requirements faster. If there are any emergency security patches for Solr,
the admins can handle them within their normally scheduled update times.
