[Evergreen-dev] Re: When to go to 4.0

Terran McCanna via Evergreen-dev Thu, 10 Jul 2025 12:13:29 -0700

Thank you, Galen!

On Thu, Jul 10, 2025 at 12:06 PM Galen Charlton via Evergreen-dev <
[email protected]> wrote:


> Hi,
>
> Series 3.16 and milestone 3.16-beta are now set up in Launchpad via
> renaming 4.0/4.0-beta.
>
> Regards,
>
> Galen
>
> On Thu, Jul 10, 2025 at 11:13 AM Rogan Hamby <[email protected]>
> wrote:
>
>> I concur that something labeled 4.0 should be very end user visible. Part
>> of the value of a major version release is that it can be promoted as a
>> milestone in the project's maturity, and it takes a lot of wind out of the
>> sails to say "you can't see any of it but trust us, it's cool."
>>
>>
>>
>> On Thu, Jul 10, 2025 at 11:00 AM Galen Charlton via Evergreen-dev <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> Unless somebody really wants to advocate for calling the next release
>>> 4.0 - and there's been no sign thus far - let's consider the matter
>>> decided: we'll call the next release 3.16.
>>>
>>> I note that Launchpad will allow simply renaming the 4.0 series to 3.16
>>> and the 4.0-beta milestone to 3.16-beta, so I suspect that little, if any,
>>> actual retargeting of bugs will be necessary.
>>>
>>> I will make those changes around 12 p.m. ET today.
>>>
>>> As a final comment, I suggest that, since we are leaning towards treating
>>> 4.0 as a big-splash release, the splash be something that is directly
>>> visible to end users. (In other words, I don't think that OpenSRF-related
>>> changes alone would count, though that is only a weakly-held opinion.)
>>>
>>> Regards,
>>>
>>> Galen
>>>
>>> On Thu, Jul 10, 2025 at 10:24 AM Jason Stephenson via Evergreen-dev <
>>> [email protected]> wrote:
>>>
>>>> Hi, all.
>>>>
>>>> I agree with Mike, but with fewer reasons and less explanation. :)
>>>>
>>>> I think we ought to call the next release 3.16, and retarget any 4.0
>>>> bugs that have code committed. I am willing to do the latter job.
>>>>
>>>> Jason
>>>>
>>>>
>>>> On Wed, Jul 9, 2025 at 12:45 PM Mike Rylander via Evergreen-dev <
>>>> [email protected]> wrote:
>>>>
>>>>> FWIW, I'm -1 on calling the next release 4.0 as of today, because the
>>>>> biggest planned change is probably the breaking-est -- the merge of
>>>>> OpenSRF and the xmpp-to-redis change -- and it's just not ready yet.
>>>>>
>>>>> I'll say up front that if we /don't/ merge OpenSRF into EG before the
>>>>> next release (and IMO we should not, based on the state of things
>>>>> today), and therefore don't force Redis, but we still want to call it
>>>>> 4.0 for other big reasons, I would definitely soften my -1 to -0.5 or
>>>>> less.
>>>>>
>>>>> If you don't care much about the details of the Redis stuff, that -^
>>>>> is my top-line thought on the "should we call it 4.0" question, and
>>>>> you can ignore the rest of my rant! ;)
>>>>>
>>>>> -------
>>>>>
>>>>> I've been working on the opensrf-on-redis infrastructure for the last
>>>>> month or so with the goal of bringing back the HA and LB functionality
>>>>> that we got for free with XMPP.
>>>>>
>>>>> TL;DR: I'm close, but because of inherent foundational differences in
>>>>> the design and purpose of XMPP vs Redis, our code will simply have to
>>>>> be more complicated going forward.
>>>>>
>>>>> IMO, the major issues in (and the state of my changes compared to)
>>>>> origin/main of the opensrf repo, re redis are:
>>>>>
>>>>> * It's extremely complicated and labor intensive (and maybe
>>>>> impossible, but I only tried to make it work for a couple days) to
>>>>> configure multiple, separate but interacting OpenSRF domains across
>>>>> different Redis servers.  At the other end of the spectrum, it's also
>>>>> impossible to configure multi-tenant redis servers.
>>>>>     -- This is mainly a /configuration capabilities/ issue, not
>>>>> primarily a code issue, because Bill did add OpenSRF usernames and
>>>>> domains (xmpp domains, before; hosts that run redis, now) to the redis
>>>>> keys used by EG.  The structure of the keys is not future-proof and
>>>>> doesn't follow redis key space pattern recommendations (at least WRT
>>>>> planning for Redis-level clustering, HA, and LB), but since it exists
>>>>> today we should be able to change the key structure later at a
>>>>> breaking upgrade event (or, whenever we want, if OpenSRF is merged
>>>>> into EG).  However, having the "bus" account configuration duplicated
>>>>> externally, and configured using a single static file, is not tenable.
>>>>>     ++ I've addressed this by adjusting the redis config requirements
>>>>> a little, and providing three new configuration modes, targeting use
>>>>> cases of different complexity/need:
>>>>>       1) Instead of leaving the redis server open and unprotected by
>>>>> default and trying to find the password in the "bus accounts" file,
>>>>> the Redis "requirepass" setting is used to supply the password for the
>>>>> "default" (admin/root/whatever) user.
>>>>>       2) osrf_control can receive that password from
>>>>>         a) the REDISCLI_AUTH env variable -- generally securable from
>>>>> outside.
>>>>>         b) a dedicated file's content -- at least the file can be
>>>>> locked down to a specific unix user.
>>>>>         c) a command line option -- meh, handy for manual use, but
>>>>> shows up in `ps`.
>>>>>         d) extracted from the "bus accounts file" from before, for
>>>>> back-compat.
>>>>>       3) Made configuring Redis users/ACLs more flexible:
>>>>>         a) the existing "bus accounts file" mechanism continues to
>>>>> exist, but because the same file is applied to each domain it's not
>>>>> safe for an HA/LB env because it's not domain- or user-aware.
>>>>>         b) a TT2 template can be supplied; it is processed for each
>>>>> domain separately, so complicated setups can be encoded in the
>>>>> template -- this is intended to provide an HA/LB-safe version of (a).
>>>>>         c) osrf_control can dynamically create the necessary ACLs for
>>>>> the router, service, client, and gateway users and keys specific to
>>>>> each domain -- this is the mechanism that has the broadest set of use
>>>>> cases, I think.
>>>>>         d) OpenSRF can be told that Redis' built in ACL infrastructure
>>>>> (the "aclfile" Redis config file setting, and friends) will just
>>>>> handle it, and a bus reset request just issues an "ACL LOAD" command
>>>>> to tell redis to refresh ACLs in its native way -- this mechanism
>>>>> provides the most logical separation, and I think will be useful in
>>>>> highly controlled/automated environments that want to make use of the
>>>>> Redis-developer-intended tools for ACL config.
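
A minimal sketch of the password lookup described in item 2 above, assuming a precedence order of command line option, REDISCLI_AUTH, dedicated file, then the legacy bus accounts file. The thread does not state the order osrf_control actually uses, and the function and parameter names below are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_redis_password(
    cli_value: Optional[str] = None,
    password_file: Optional[str] = None,
    legacy_bus_password: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first password found, or None if no source supplies one.

    The precedence order is an assumption made for this sketch.
    """
    env = os.environ if env is None else env

    # (c) Command line option: explicit, but visible in `ps`.
    if cli_value:
        return cli_value

    # (a) REDISCLI_AUTH, the same variable redis-cli reads.
    if env.get("REDISCLI_AUTH"):
        return env["REDISCLI_AUTH"]

    # (b) Dedicated file that can be locked down to one unix user.
    if password_file and Path(password_file).is_file():
        content = Path(password_file).read_text().strip()
        if content:
            return content

    # (d) Value already extracted from the legacy bus accounts file.
    return legacy_bus_password or None
```

Passing the environment in as a mapping keeps the lookup easy to exercise without touching the real process environment.
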
>>>>>
>>>>>  * LB (cross-registration of OpenSRF domains) does not work
>>>>>     -- The register and unregister commands add additional instances
>>>>> to an internal list of endpoints for each service, but the router
>>>>> always uses the first entry in the list.  The effect is that all
>>>>> traffic gets shoveled to the first-registered instance (not
>>>>> necessarily the local one, mind) until that instance actively
>>>>> deregisters, then it moves to the next one that registered.
>>>>>     ++ I've added list rotation. That works and is an obvious fix, of
>>>>> course, but it points out that the code is definitely not fully baked
>>>>> or feature-tested, and it's lacking existing fault tolerance at an
>>>>> infrastructure level.
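
The list rotation fix can be illustrated with a small round-robin sketch. This is not the OpenSRF router code; the class and method names are hypothetical, and it only shows the idea of rotating the endpoint list after each pick instead of always using the first entry.

```python
from collections import deque
from typing import Deque, Optional


class ServiceEndpoints:
    """Registered instances for one service, handed out in rotation."""

    def __init__(self) -> None:
        self._endpoints: Deque[str] = deque()

    def register(self, address: str) -> None:
        # Ignore duplicate registrations of the same instance.
        if address not in self._endpoints:
            self._endpoints.append(address)

    def unregister(self, address: str) -> None:
        # Removing an unknown instance is a no-op.
        if address in self._endpoints:
            self._endpoints.remove(address)

    def next_endpoint(self) -> Optional[str]:
        """Return the instance at the head of the list, then rotate."""
        if not self._endpoints:
            return None
        endpoint = self._endpoints[0]
        # Move the instance just used to the back of the list.
        self._endpoints.rotate(-1)
        return endpoint


endpoints = ServiceEndpoints()
endpoints.register("domain-a")
endpoints.register("domain-b")
picks = [endpoints.next_endpoint() for _ in range(3)]
```

Without the rotate call, every request would go to the first-registered instance until it deregisters, which is the behavior described above.
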
>>>>>
>>>>>  * HA does not work, and LB (when fixed as above) is not safe
>>>>>     -- Even after addressing the LB part of the cross-registration
>>>>> functionality, there is no way to detect that a service instance
>>>>> previously registered is no longer available and should be removed
>>>>> from the delivery list.  Because we're using redis LISTs to stand in
>>>>> for (effectively) stateful TCP sockets and receive buffers, we end up
>>>>> just tossing requests into the void and hoping that someone comes
>>>>> along to service them.  Put another way, if a listener dies, we have
>>>>> no way of detecting that at the OpenSRF level and accounting for the
>>>>> failure.  This makes LB /more/ dangerous (think something akin to
>>>>> split-brain DNS problems), because we can't trust either our internal
>>>>> state or the message delivery information from redis.  This is also
>>>>> something that we got 100% for free in XMPP, because message delivery
>>>>> to an actual endpoint was verified and we got an error when that
>>>>> failed, so we could resend to another service instance.  Now the
>>>>> message just falls into the void on a LIST key that nobody is looking
>>>>> at.
>>>>>     ++ I'm working on moving from LISTs to STREAMs for router and
>>>>> service keys. Other than the slight difference in surface-level
>>>>> commands, it's no harder to use streams than lists.  What this will
>>>>> allow us to do is recheck the state of previously sent messages, and
>>>>> if 1) they're "stale" and 2) no service instance has claimed them for
>>>>> processing, we can retract the message from the stream, deregister the
>>>>> service instance behind the redis key on which the message went stale,
>>>>> and send it to another service instance.  I have the baseline change
>>>>> from LISTs to STREAMs working now, modulo some debug-logging cleanup
>>>>> and chasing down a couple possible leaks and corner cases, but the
>>>>> redis docs are fighting me at every step. (Just ask separately if you
>>>>> want to hear more about that.)  I also have a proof of concept version
>>>>> of the message retraction and resend code, but I really want to
>>>>> rewrite that using what I've learned (*sad face*) in the last few
>>>>> weeks about redis.
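
The retract-and-resend idea can be modeled in memory, without Redis, to show the control flow. This is a sketch under stated assumptions, not the code in the branch: all names are hypothetical, the "shortest backlog" choice is a stand-in for whatever the router does, and a real version would use Redis stream commands (the thread does not say which ones).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PendingMessage:
    payload: str
    sent_at: float
    claimed: bool = False  # True once a live listener has picked it up


class Router:
    def __init__(self, stale_after: float) -> None:
        self.stale_after = stale_after
        # One message list per registered service instance key.
        self.streams: Dict[str, List[PendingMessage]] = {}

    def register(self, instance: str) -> None:
        self.streams.setdefault(instance, [])

    def send(self, payload: str, now: float) -> Optional[str]:
        """Queue a message on the instance with the shortest backlog."""
        if not self.streams:
            return None
        instance = min(self.streams, key=lambda name: len(self.streams[name]))
        self.streams[instance].append(PendingMessage(payload, now))
        return instance

    def claim(self, instance: str) -> Optional[PendingMessage]:
        """Model a live listener taking its oldest unclaimed message."""
        for message in self.streams.get(instance, []):
            if not message.claimed:
                message.claimed = True
                return message
        return None

    def sweep(self, now: float) -> Tuple[List[str], List[str]]:
        """Deregister instances holding stale unclaimed messages and
        resend those messages elsewhere.

        Returns (deregistered instances, payloads nobody could take).
        """
        dead = [
            name
            for name, messages in self.streams.items()
            if any(
                not m.claimed and now - m.sent_at > self.stale_after
                for m in messages
            )
        ]
        orphans: List[PendingMessage] = []
        for name in dead:
            # Retract everything the dead instance never claimed.
            orphans.extend(m for m in self.streams.pop(name) if not m.claimed)
        undeliverable = [
            m.payload for m in orphans if self.send(m.payload, now) is None
        ]
        return dead, undeliverable
```

The model never trims completed messages; a real implementation would also need to acknowledge and remove them.
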
>>>>>
>>>>>  * Infrastructure-level clustering isn't possible
>>>>>   -- Whether ejabberd or Redis, infrastructure clustering (transparent
>>>>> HA at the infrastructure level) isn't "easy", and the hard parts have
>>>>> to live somewhere... In the XMPP world, that was mostly ejabberd's
>>>>> problem and it handled it well.  Redis has the concept of clustering,
>>>>> but (so far) we've chosen to not only ignore that, but to construct
>>>>> things in such a way that the redis cluster stuff /cannot be used
>>>>> effectively/.  I have no proof-of-concept code to address this, yet.
>>>>> We may never have the option to configure things to be as
>>>>> transparently robust in the redis world as we do today with ejabberd.
>>>>> That may not matter to most people most of the time, but it's a point
>>>>> I feel compelled to raise because it's definitely a loss to admins of
>>>>> large, complex, heavily automated installations (even if they're not
>>>>> aware of that loss).
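
For context on the key space point: Redis Cluster maps each key to one of 16384 hash slots using CRC16 of the key, and when a key contains a non-empty {...} section only that section is hashed, so keys sharing a tag land on the same slot. The sketch below computes slots that way. The key names are hypothetical; OpenSRF's actual key layout is not shown in this thread.

```python
import binascii

SLOT_COUNT = 16384  # fixed number of hash slots in Redis Cluster


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that Redis Cluster hashes."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # An empty tag ("{}") means the whole key is hashed.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: bytes) -> int:
    # crc_hqx with an initial value of 0 is the CRC16 variant
    # Redis Cluster uses.
    return binascii.crc_hqx(hash_tag(key), 0) % SLOT_COUNT


# Hypothetical keys for one OpenSRF domain, tagged by domain name.
router_key = b"{private.example.org}:router"
service_key = b"{private.example.org}:service:open-ils.actor"
same_slot = key_slot(router_key) == key_slot(service_key)
```

Tagging keys by domain is one possible way to keep a domain's keys together on a cluster node; whether that suits OpenSRF is an open question.
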
>>>>>
>>>>> I'll be pushing up a branch covering the first two points this week or
>>>>> next, and hopefully be able to follow up with the HA fixes ASAP.
>>>>>
>>>>> Thanks for following my rant this far... :)
>>>>>
>>>>> --
>>>>> Mike Rylander
>>>>> Research and Development Manager
>>>>> Equinox Open Library Initiative
>>>>> 1-877-OPEN-ILS (673-6457)
>>>>> work: [email protected]
>>>>> personal: [email protected]
>>>>> https://equinoxOLI.org
>>>>>
>>>>> On Tue, Jul 8, 2025 at 7:22 PM Jeff Davis via Evergreen-dev
>>>>> <[email protected]> wrote:
>>>>> >
>>>>> > We've been talking about calling our next major release Evergreen
>>>>> > 4.0, rather than 3.16.
>>>>> >
>>>>> > Is there a list of features that we want to include in a 4.0
>>>>> > release? Should we hold off on bumping the version number to 4.0 until
>>>>> > those features are ready?
>>>>> >
>>>>> > Some candidates for "features that warrant going to 4.0":
>>>>> > - Making Angular circ the standard circ UI, rather than
>>>>> > experimental. My understanding is that we don't expect that to happen in
>>>>> > the next release.
>>>>> > - Merging OpenSRF into Evergreen (LP#2032835). We were waiting to
>>>>> > replace ejabberd with Redis before doing that; Redis is now supported in
>>>>> > Evergreen, but I don't know if anyone has revisited merging OpenSRF into
>>>>> > EG since then.
>>>>> > - There are a number of bugs targeted to "4.0-beta" in Launchpad,
>>>>> > but AFAIK they are just targeting the next major release, whether it's
>>>>> > called 4.0 or not.
>>>>> >
>>>>> > Any opinions? I would prefer to reserve "4.0" for a release that is
>>>>> > somehow "more" than just the next major release, but I recognize that
>>>>> > version numbering is basically arbitrary.
>>>>> > --
>>>>> > Jeff Davis
>>>>> > BC Libraries Cooperative
>>>>> > _______________________________________________
>>>>> > Evergreen-dev mailing list -- [email protected]
>>>>> > To unsubscribe send an email to
>>>>> > [email protected]
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Jason Stephenson (he/him)
>>>> ILS Manager, C/W MARS, Inc.
>>>>
>>>> ------------------------------
>>>>
>>>> [email protected] | www.cwmars.org
>>>>
>>>> 508-755-3323 x 418
>>>>
>>>
>>>
>>> --
>>> Galen Charlton
>>> Implementation and IT Manager
>>> Equinox Open Library Initiative
>>> [email protected]
>>> https://www.equinoxOLI.org
>>> phone: 877-OPEN-ILS (673-6457)
>>> direct: 770-709-5581
>>> <http://evergreen-ils.org>
>>>
>>
>
> --
> Galen Charlton
> Implementation and IT Manager
> Equinox Open Library Initiative
> [email protected]
> https://www.equinoxOLI.org
> phone: 877-OPEN-ILS (673-6457)
> direct: 770-709-5581
> <http://evergreen-ils.org>
>

