Hi, all. I agree with Mike, but with fewer reasons and less explanation. :)
I think we ought to call the next release 3.16, and retarget any 4.0 bugs that have code committed. I am willing to do the latter job.

Jason

On Wed, Jul 9, 2025 at 12:45 PM Mike Rylander via Evergreen-dev <evergreen-dev@list.evergreen-ils.org> wrote:

> FWIW, I'm -1 on calling the next release 4.0 as of today, because the
> biggest planned change is probably the breaking-est -- the merge of
> OpenSRF and the xmpp-to-redis change -- and it's just not ready yet.
>
> I'll say up front that if we /don't/ merge OpenSRF into EG before the
> next release (and IMO we should not, based on the state of things
> today), and therefore force Redis, but we still want to call it 4.0
> for other big reasons, I would definitely soften my -1 to -0.5 or
> less.
>
> If you don't care much about the details of the Redis stuff, that -^
> is my top line thought on the "should we call it 4.0" question, and
> you can ignore the rest of my rant! ;)
>
> -------
>
> I've been working on the opensrf-on-redis infrastructure for the last
> month or so with the goal of bringing back the HA and LB functionality
> that we got for free with XMPP.
>
> TL;DR: I'm close, but because of inherent foundational differences in
> the design and purpose of XMPP vs Redis, our code will simply have to
> be more complicated going forward.
>
> IMO, the major issues in (and the state of my changes compared to)
> origin/main of the opensrf repo, re redis are:
>
> * It's extremely complicated and labor intensive (and maybe
>   impossible, but I only tried to make it work for a couple days) to
>   configure multiple, separate but interacting OpenSRF domains across
>   different Redis servers. At the other end of the spectrum, it's also
>   impossible to configure multi-tenant redis servers.
>  -- This is mainly a /configuration capabilities/ issue, not
>     primarily a code issue, because Bill did add OpenSRF usernames and
>     domains (xmpp domains, before; hosts that run redis, now) to the redis
>     keys used by EG.
>     The structure of the keys is not future-proof and
>     doesn't follow redis key space pattern recommendations (at least WRT
>     planning for Redis-level clustering, HA, and LB), but since it exists
>     today we should be able to change the key structure later at a
>     breaking upgrade event (or, whenever we want, if OpenSRF is merged
>     into EG). However, having the "bus" account configuration duplicated
>     externally, and configured using a single static file, is not tenable.
>  ++ I've addressed this by adjusting the redis config requirements
>     a little, and providing three new configuration modes, targeting use
>     cases of different complexity/need:
>      1) Instead of leaving the redis server open and unprotected by
>         default and trying to find the password in the "bus accounts" file,
>         the Redis "requirepass" setting is used to supply the password for the
>         "default" (admin/root/whatever) user.
>      2) osrf_control can receive that password from
>         a) the REDISCLI_AUTH env variable -- generally securable from
>            outside.
>         b) a dedicated file's content -- at least the file can be
>            locked down to a specific unix user.
>         c) a command line option -- meh, handy for manual use, but
>            shows up in `ps`.
>         d) extracted from the "bus accounts file" from before, for
>            back-compat.
>      3) Made configuring Redis users/ACLs more flexible:
>         a) the existing "bus accounts file" mechanism continues to
>            exist, but because the same file is applied to each domain it's not
>            safe for an HA/LB env because it's not domain- or user-aware.
>         b) a TT2 template can be supplied; it is processed for each
>            domain separately, so complicated setups can be encoded in the
>            template -- this is intended to provide an HA/LB-safe version of (a).
>         c) osrf_control can dynamically create the necessary ACLs for
>            the router, service, client, and gateway users and keys specific to
>            each domain -- this is the mechanism that has the broadest set of use
>            cases, I think.
>         d) OpenSRF can be told that Redis' built-in ACL infrastructure
>            (the "aclfile" Redis config file setting, and friends) will just
>            handle it, and a bus reset request just issues an "ACL LOAD" command
>            to tell redis to refresh ACLs in its native way -- this mechanism
>            provides the most logical separation, and I think will be useful in
>            highly controlled/automated environments that want to make use of the
>            Redis-developer-intended tools for ACL config.
>
> * LB (cross-registration of OpenSRF domains) does not work
>  -- The register and unregister commands add additional instances
>     to an internal list of endpoints for each service, but the router
>     always uses the first entry in the list. The effect is that all
>     traffic gets shoveled to the first-registered instance (not
>     necessarily the local one, mind) until that instance actively
>     deregisters, then it moves to the next one that registered.
>  ++ I've added list rotation. That works and is an obvious fix, of
>     course, but it points out that the code is definitely not fully baked
>     or feature-tested, and it's lacking existing fault tolerance at an
>     infrastructure level.
>
> * HA does not work, and LB (when fixed as above) is not safe
>  -- Even after addressing the LB part of the cross-registration
>     functionality, there is no way to detect that a service instance
>     previously registered is no longer available and should be removed
>     from the delivery list. Because we're using redis LISTs to stand in
>     for (effectively) stateful TCP sockets and receive buffers, we end up
>     just tossing requests into the void and hoping that someone comes
>     along to service them. Put another way, if a listener dies, we have
>     no way of detecting that at the OpenSRF level and accounting for the
>     failure. This makes LB /more/ dangerous: think something akin to
>     split-brain DNS problems, because we can't trust either our internal
>     state or the message delivery information from redis.
>     This is also
>     something that we got 100% for free in XMPP, because message delivery
>     to an actual endpoint was verified and we got an error when that
>     failed, so we could resend to another service instance. Now the
>     message just falls into the void on a LIST key that nobody is looking
>     at.
>  ++ I'm working on moving from LISTs to STREAMs for router and
>     service keys. Other than the slight difference in surface-level
>     commands, it's no harder to use streams than lists. What this will
>     allow us to do is recheck the state of previously sent messages, and
>     if 1) they're "stale" and 2) no service instance has claimed them for
>     processing, we can retract the message from the stream, deregister the
>     service instance behind the redis key on which the message went stale,
>     and send it to another service instance. I have the baseline change
>     from LISTs to STREAMs working now, modulo some debug-logging cleanup
>     and chasing down a couple possible leaks and corner cases, but the
>     redis docs are fighting me at every step. (Just ask separately if you
>     want to hear more about that.) I also have a proof of concept version
>     of the message retraction and resend code, but I really want to
>     rewrite that using what I've learned (*sad face*) in the last few
>     weeks about redis.
>
> * Infrastructure-level clustering isn't possible
>  -- Whether ejabberd or Redis, infrastructure clustering (transparent
>     HA at the infrastructure level) isn't "easy", and the hard parts have
>     to live somewhere... In the XMPP world, that was mostly ejabberd's
>     problem and it handled it well. Redis has the concept of clustering,
>     but (so far) we've chosen to not only ignore that, but to construct
>     things in such a way that the redis cluster stuff /cannot be used
>     effectively/. I have no proof-of-concept code to address this, yet.
>     We may never have the option to configure things to be as
>     transparently robust in the redis world as we do today with ejabberd.
>     That may not matter to most people most of the time, but it's a point
>     I feel compelled to raise because it's definitely a loss to admins of
>     large, complex, heavily automated installations (even if they're not
>     aware of that loss).
>
> I'll be pushing up a branch covering the first two points this week or
> next, and hopefully be able to follow up with the HA fixes ASAP.
>
> Thanks for following my rant this far... :)
>
> --
> Mike Rylander
> Research and Development Manager
> Equinox Open Library Initiative
> 1-877-OPEN-ILS (673-6457)
> work: mi...@equinoxoli.org
> personal: mrylan...@gmail.com
> https://equinoxOLI.org
>
> On Tue, Jul 8, 2025 at 7:22 PM Jeff Davis via Evergreen-dev
> <evergreen-dev@list.evergreen-ils.org> wrote:
> >
> > We've been talking about calling our next major release Evergreen 4.0,
> > rather than 3.16.
> >
> > Is there a list of features that we want to include in a 4.0 release?
> > Should we hold off on bumping the version number to 4.0 until those
> > features are ready?
> >
> > Some candidates for "features that warrant going to 4.0":
> > - Making Angular circ the standard circ UI, rather than experimental. My
> >   understanding is that we don't expect that to happen in the next release.
> > - Merging OpenSRF into Evergreen (LP#2032835). We were waiting to
> >   replace ejabberd with Redis before doing that; Redis is now supported in
> >   Evergreen, but I don't know if anyone has revisited merging OpenSRF into EG
> >   since then.
> > - There are a number of bugs targeted to "4.0-beta" in Launchpad, but
> >   AFAIK they are just targeting the next major release, whether it's called
> >   4.0 or not.
> >
> > Any opinions? I would prefer to reserve "4.0" for a release that is
> > somehow "more" than just the next major release, but I recognize that
> > version numbering is basically arbitrary.
> > --
> > Jeff Davis
> > BC Libraries Cooperative
> > _______________________________________________
> > Evergreen-dev mailing list -- evergreen-dev@list.evergreen-ils.org
> > To unsubscribe send an email to evergreen-dev-le...@list.evergreen-ils.org

--
Jason Stephenson (he/him)
ILS Manager, C/W MARS, Inc.
jstephen...@cwmars.org | www.cwmars.org
508-755-3323 x 418
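
A note on the LB point in Mike's quoted message: the bug described there is that the router always picks the first entry in each service's endpoint list, and the fix mentioned is list rotation. Below is a minimal Python sketch of that difference. It is not OpenSRF code: the `ServiceRegistry` class, its method names, and the instance addresses are hypothetical stand-ins invented for illustration, and the real router may well be structured differently.

```python
from collections import deque


class ServiceRegistry:
    """Toy per-service endpoint list (hypothetical; not the OpenSRF router)."""

    def __init__(self):
        # service name -> deque of registered instance addresses, in registration order
        self._endpoints = {}

    def register(self, service, address):
        queue = self._endpoints.setdefault(service, deque())
        if address not in queue:
            queue.append(address)

    def unregister(self, service, address):
        queue = self._endpoints.get(service)
        if queue and address in queue:
            queue.remove(address)

    def pick_first(self, service):
        # The behaviour described as the bug: always the first-registered
        # instance, so later registrations never receive traffic.
        return self._endpoints[service][0]

    def pick_rotating(self, service):
        # Round-robin: hand out the head of the list, then rotate it to the
        # tail so the next call picks a different instance.
        queue = self._endpoints[service]
        chosen = queue[0]
        queue.rotate(-1)
        return chosen


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.register("open-ils.actor", "instance-a")
    registry.register("open-ils.actor", "instance-b")
    print([registry.pick_first("open-ils.actor") for _ in range(3)])
    print([registry.pick_rotating("open-ils.actor") for _ in range(4)])
```

Rotation only spreads the load. As the HA bullet in the quoted message points out, it does nothing to detect an instance that has died without unregistering; in this sketch such an instance would keep receiving its share of requests.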