[Evergreen-dev] Re: When to go to 4.0

Jason Stephenson via Evergreen-dev Thu, 10 Jul 2025 07:24:58 -0700

Hi, all.

I agree with Mike, but with fewer reasons and less explanation. :)


I think we ought to call the next release 3.16, and retarget any 4.0 bugs
that have code committed. I am willing to do the latter job.

Jason


On Wed, Jul 9, 2025 at 12:45 PM Mike Rylander via Evergreen-dev <
evergreen-dev@list.evergreen-ils.org> wrote:

> FWIW, I'm -1 on calling the next release 4.0 as of today, because the
> biggest planned change is probably the breaking-est -- the merge of
> OpenSRF and the xmpp-to-redis change -- and it's just not ready yet.
>
> I'll say up front that if we /don't/ merge OpenSRF into EG before the
> next release (and IMO we should not, based on the state of things
> today), and therefore force Redis, but we still want to call it 4.0
> for other big reasons, I would definitely soften my -1 to -0.5 or
> less.
>
> If you don't care much about the details of the Redis stuff, that -^
> is my top line thought on the "should we call it 4.0" question, and
> you can ignore the rest of my rant! ;)
>
> -------
>
> I've been working on the opensrf-on-redis infrastructure for the last
> month or so with the goal of bringing back the HA and LB functionality
> that we got for free with XMPP.
>
> TL;DR: I'm close, but because of inherent foundational differences in
> the design and purpose of XMPP vs Redis, our code will simply have to
> be more complicated going forward.
>
> IMO, the major issues in (and the state of my changes compared to)
> origin/main of the opensrf repo, re redis are:
>
> * It's extremely complicated and labor intensive (and maybe
> impossible, but I only tried to make it work for a couple days) to
> configure multiple, separate but interacting OpenSRF domains across
> different Redis servers.  At the other end of the spectrum, it's also
> impossible to configure multi-tenant redis servers.
>     -- This is mainly a /configuration capabilities/ issue, not
> primarily a code issue, because Bill did add OpenSRF usernames and
> domains (xmpp domains, before; hosts that run redis, now) to the redis
> keys used by EG.  The structure of the keys is not future-proof and
> doesn't follow redis key space pattern recommendations (at least WRT
> planning for Redis-level clustering, HA, and LB), but since it exists
> today we should be able to change the key structure later at a
> breaking upgrade event (or, whenever we want, if OpenSRF is merged
> into EG).  However, having the "bus" account configuration duplicated
> externally, and configured using a single static file, is not tenable.
>     ++ I've addressed this by adjusting the redis config requirements
> a little, and providing three new configuration modes, targeting use
> cases of different complexity/need:
>       1) Instead of leaving the redis server open and unprotected by
> default and trying to find the password in the "bus accounts" file,
> the Redis "requirepass" setting is used to supply the password for the
> "default" (admin/root/whatever) user.
>       2) osrf_control can receive that password from
>         a) the REDISCLI_AUTH env variable -- generally securable from
> outside.
>         b) a dedicated file's content -- at least the file can be
> locked down to a specific unix user.
>         c) a command line option -- meh, handy for manual use, but
> shows up in `ps`.
>         d) extracted from the "bus accounts file" from before, for
> back-compat.
>       3) Made configuring Redis users/ACLs more flexible:
>         a) the existing "bus accounts file" mechanism continues to
> exist, but because the same file is applied to each domain it's not
> safe for an HA/LB env because it's not domain- or user-aware.
>         b) a TT2 template can be supplied; it is processed for each
> domain separately, so complicated setups can be encoded in the
> template -- this is intended to provide an HA/LB-safe version of (a).
>         c) osrf_control can dynamically create the necessary ACLs for
> the router, service, client, and gateway users and keys specific to
> each domain -- this is the mechanism that has the broadest set of use
> cases, I think.
>         d) OpenSRF can be told that Redis' built in ACL infrastructure
> (the "aclfile" Redis config file setting, and friends) will just
> handle it, and a bus reset request just issues an "ACL LOAD" command
> to tell redis to refresh ACLs in its native way -- this mechanism
> provides the most logical separation, and I think will be useful in
> highly controlled/automated environments that want to make use of the
> Redis-developer-intended tools for ACL config.
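
An illustrative aside on the password sources listed in (2) above. This is a minimal Python sketch, not osrf_control's actual code: the function name, its parameters, and above all the lookup order (command line, then REDISCLI_AUTH, then the dedicated file, then the legacy bus accounts value) are assumptions made for the example. The message names the four sources but does not say which one wins.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_redis_password(
    cli_password: Optional[str] = None,
    password_file: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    legacy_password: Optional[str] = None,
) -> Optional[str]:
    """Return the first password found, or None if no source supplies one.

    The precedence used here is hypothetical; only the four sources
    themselves come from the message above.
    """
    env = os.environ if env is None else env

    # (c) command line option: handy for manual use, but visible in `ps`.
    if cli_password:
        return cli_password

    # (a) REDISCLI_AUTH environment variable.
    if env.get("REDISCLI_AUTH"):
        return env["REDISCLI_AUTH"]

    # (b) dedicated file that can be locked down to one unix user.
    if password_file:
        path = Path(password_file)
        if path.is_file():
            text = path.read_text().strip()
            if text:
                return text

    # (d) value already extracted from the old bus accounts file.
    return legacy_password or None
```

Passing `env` explicitly keeps the function testable without touching the real process environment.
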
>
>  * LB (cross-registration of OpenSRF domains) does not work
>     -- The register and unregister commands add additional instances
> to an internal list of endpoints for each service, but the router
> always uses the first entry in the list.  The effect is that all
> traffic gets shoveled to the first-registered instance (not
> necessarily the local one, mind) until that instance actively
> deregisters, then it moves to the next one that registered.
>     ++ I've added list rotation. That works and is an obvious fix, of
> course, but it points out that the code is definitely not fully baked
> or feature-tested, and it's lacking existing fault tolerance at an
> infrastructure level.
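
To make the list rotation idea concrete, here is a minimal Python sketch of round-robin selection over registered endpoints. It is not the OpenSRF router code; the class and method names are hypothetical, and the real router may well be structured differently.

```python
from collections import deque


class ServiceEndpoints:
    """Registered instances of one service (hypothetical stand-in)."""

    def __init__(self):
        self._endpoints = deque()

    def register(self, address):
        # Ignore duplicate registrations of the same instance.
        if address not in self._endpoints:
            self._endpoints.append(address)

    def unregister(self, address):
        try:
            self._endpoints.remove(address)
        except ValueError:
            pass  # already gone

    def next_endpoint(self):
        """Return the head of the list, then rotate it to the back."""
        if not self._endpoints:
            return None
        chosen = self._endpoints[0]
        self._endpoints.rotate(-1)  # head moves to the tail
        return chosen
```

Without the `rotate` call, `next_endpoint` would keep returning the first-registered instance, which is the behavior described above. Note that rotation alone does nothing about instances that have died without deregistering.
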
>
>  * HA does not work, and LB (when fixed as above) is not safe
>     -- Even after addressing the LB part of the cross-registration
> functionality, there is no way to detect that a service instance
> previously registered is no longer available and should be removed
> from the delivery list.  Because we're using redis LISTs to stand in
> for (effectively) stateful TCP sockets and receive buffers, we end up
> just tossing requests into the void and hoping that someone comes
> along to service them.  Put another way, if a listener dies, we have
> no way of detecting that at the OpenSRF level and accounting for the
> failure.  This makes LB /more/ dangerous: think something akin to
> split-brain DNS problems, because we can't trust either our internal
> state or the message delivery information from redis.  This is also
> something that we got 100% for free in XMPP, because message delivery
> to an actual endpoint was verified and we got an error when that
> failed, so we could resend to another service instance.  Now the
> message just falls into the void on a LIST key that nobody is looking
> at.
>     ++ I'm working on moving from LISTs to STREAMs for router and
> service keys. Other than the slight difference in surface-level
> commands, it's no harder to use streams than lists.  What this will
> allow us to do is recheck the state of previously sent messages, and
> if 1) they're "stale" and 2) no service instance has claimed them for
> processing, we can retract the message from the stream, deregister the
> service instance behind the redis key on which the message went stale,
> and send it to another service instance.  I have the baseline change
> from LISTs to STREAMs working now, modulo some debug-logging cleanup
> and chasing down a couple possible leaks and corner cases, but the
> redis docs are fighting me at every step. (Just ask separately if you
> want to hear more about that.)  I also have a proof of concept version
> of the message retraction and resend code, but I really want to
> rewrite that using what I've learned (*sad face*) in the last few
> weeks about redis.
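
The recheck-and-resend logic described above can be modeled without Redis at all. The following Python sketch is an in-memory stand-in under stated assumptions: the class names, the `stale_after` threshold, and the explicit `now` argument are all hypothetical, and it does not use Redis streams or reflect the proof-of-concept code mentioned in the message.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SentMessage:
    msg_id: str
    endpoint: str
    sent_at: float
    claimed: bool = False  # set once a service instance picks it up


@dataclass
class Dispatcher:
    endpoints: List[str]
    stale_after: float = 5.0  # seconds; an arbitrary example value
    in_flight: Dict[str, SentMessage] = field(default_factory=dict)

    def send(self, msg_id: str, now: float) -> Optional[str]:
        """Deliver to the next endpoint (round-robin) and track the message."""
        if not self.endpoints:
            return None
        endpoint = self.endpoints[0]
        self.endpoints.append(self.endpoints.pop(0))
        self.in_flight[msg_id] = SentMessage(msg_id, endpoint, now)
        return endpoint

    def mark_claimed(self, msg_id: str) -> None:
        self.in_flight[msg_id].claimed = True

    def recheck(self, now: float) -> List[Tuple[str, str, Optional[str]]]:
        """Retract stale, unclaimed messages and resend them elsewhere.

        Returns (msg_id, old_endpoint, new_endpoint) for each moved message.
        """
        moved = []
        for msg in list(self.in_flight.values()):
            if msg.claimed or now - msg.sent_at < self.stale_after:
                continue
            del self.in_flight[msg.msg_id]           # retract the message
            if msg.endpoint in self.endpoints:
                self.endpoints.remove(msg.endpoint)  # deregister the instance
            new_endpoint = self.send(msg.msg_id, now)
            moved.append((msg.msg_id, msg.endpoint, new_endpoint))
        return moved
```

A message counts as stale only when both conditions from the message hold: it is older than the threshold, and nothing has claimed it. If no endpoint remains, the new endpoint is reported as `None` and the message is no longer tracked; a real implementation would need to decide what to do in that case.
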
>
>  * Infrastructure-level clustering isn't possible
>   -- Whether ejabberd or Redis, infrastructure clustering (transparent
> HA at the infrastructure level) isn't "easy", and the hard parts have
> to live somewhere... In the XMPP world, that was mostly ejabberd's
> problem and it handled it well.  Redis has the concept of clustering,
> but (so far) we've chosen to not only ignore that, but to construct
> things in such a way that the redis cluster stuff /cannot be used
> effectively/.  I have no proof-of-concept code to address this, yet.
> We may never have the option to configure things to be as
> transparently robust in the redis world as we do today with ejabberd.
> That may not matter to most people most of the time, but it's a point
> I feel compelled to raise because it's definitely a loss to admins of
> large, complex, heavily automated installations (even if they're not
> aware of that loss).
>
> I'll be pushing up a branch covering the first two points this week or
> next, and hopefully be able to follow up with the HA fixes ASAP.
>
> Thanks for following my rant this far... :)
>
> --
> Mike Rylander
> Research and Development Manager
> Equinox Open Library Initiative
> 1-877-OPEN-ILS (673-6457)
> work: mi...@equinoxoli.org
> personal: mrylan...@gmail.com
> https://equinoxOLI.org
>
> On Tue, Jul 8, 2025 at 7:22 PM Jeff Davis via Evergreen-dev
> <evergreen-dev@list.evergreen-ils.org> wrote:
> >
> > We've been talking about calling our next major release Evergreen 4.0,
> > rather than 3.16.
> >
> > Is there a list of features that we want to include in a 4.0 release?
> > Should we hold off on bumping the version number to 4.0 until those
> > features are ready?
> >
> > Some candidates for "features that warrant going to 4.0":
> > - Making Angular circ the standard circ UI, rather than experimental. My
> > understanding is that we don't expect that to happen in the next release.
> > - Merging OpenSRF into Evergreen (LP#2032835). We were waiting to
> > replace ejabberd with Redis before doing that; Redis is now supported in
> > Evergreen, but I don't know if anyone has revisited merging OpenSRF into EG
> > since then.
> > - There are a number of bugs targeted to "4.0-beta" in Launchpad, but
> > AFAIK they are just targeting the next major release, whether it's called
> > 4.0 or not.
> >
> > Any opinions? I would prefer to reserve "4.0" for a release that is
> > somehow "more" than just the next major release, but I recognize that
> > version numbering is basically arbitrary.
> > --
> > Jeff Davis
> > BC Libraries Cooperative
> > _______________________________________________
> > Evergreen-dev mailing list -- evergreen-dev@list.evergreen-ils.org
> > To unsubscribe send an email to
> > evergreen-dev-le...@list.evergreen-ils.org


-- 

Jason Stephenson (he/him)
ILS Manager, C/W MARS, Inc.

------------------------------

jstephen...@cwmars.org | www.cwmars.org

508-755-3323 x 418

