Re: Proposal to Include GEODE-7079 in 1.10.0

Udo Kohlmeyer Thu, 15 Aug 2019 13:18:06 -0700

I'm changing my vote to +1 on this issue.

The ONLY reason I'm changing my vote is to add to the cleanliness of the code in the release. I do 100% disagree with the continual scope creep that we have been incurring on this release branch.


--Udo

On 8/15/19 12:34 PM, Dan Smith wrote:

+1 to merging Juan's fix for GEODE-7079. I've seen systems taken down by
rapidly filling up the logs in the past, this does seem to be a critical
fix from the perspective of system stability.

Also, this is the change; it doesn't seem particularly risky to me.

-          ConflationKey key = new ConflationKey(gsEvent.getRegion().getFullPath(),
+          ConflationKey key = new ConflationKey(gsEvent.getRegionPath(),
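To make the one-line change concrete, here is a minimal Java sketch. It assumes, as Juan's description implies, that an event recovered from the persistent queue still carries its region path as a string, while its region reference is null until the region is operational. `Region`, `QueuedEvent`, `ConflationKey`, `before` and `after` are hypothetical, simplified stand-ins, not Geode's actual classes or constructor signatures.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified model of the GEODE-7079 change. These types do NOT
 * mirror Geode's real internals; they only illustrate why reading the path
 * from the event avoids the NullPointerException.
 */
public class ConflationKeySketch {

    /** Hypothetical stand-in for a region; only its full path matters here. */
    static final class Region {
        private final String fullPath;
        Region(String fullPath) { this.fullPath = fullPath; }
        String getFullPath() { return fullPath; }
    }

    /** Hypothetical stand-in for an event recovered from a persistent queue. */
    static final class QueuedEvent {
        private final String regionPath; // stored with the event, always present
        private final Region region;     // assumed null until the region is recovered
        private final Object key;

        QueuedEvent(String regionPath, Region region, Object key) {
            this.regionPath = regionPath;
            this.region = region;
            this.key = key;
        }

        Region getRegion() { return region; }
        String getRegionPath() { return regionPath; }
        Object getKey() { return key; }
    }

    /** Hypothetical conflation key: region path plus entry key. */
    static final class ConflationKey {
        final String regionPath;
        final Object entryKey;

        ConflationKey(String regionPath, Object entryKey) {
            this.regionPath = regionPath;
            this.entryKey = entryKey;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ConflationKey)) return false;
            ConflationKey other = (ConflationKey) o;
            return Objects.equals(regionPath, other.regionPath)
                && Objects.equals(entryKey, other.entryKey);
        }

        @Override
        public int hashCode() { return Objects.hash(regionPath, entryKey); }
    }

    // Old style: dereferences the region, so it throws an NPE while region is null.
    static ConflationKey before(QueuedEvent e) {
        return new ConflationKey(e.getRegion().getFullPath(), e.getKey());
    }

    // New style: reads the path already stored on the event; no region lookup.
    static ConflationKey after(QueuedEvent e) {
        return new ConflationKey(e.getRegionPath(), e.getKey());
    }

    public static void main(String[] args) {
        // An event recovered before its region is fully operational.
        QueuedEvent recovered = new QueuedEvent("/orders", null, "k1");
        try {
            before(recovered);
            System.out.println("before: no exception");
        } catch (NullPointerException npe) {
            System.out.println("before: NullPointerException");
        }
        System.out.println("after: " + after(recovered).regionPath);
    }
}
```

Running `main` prints `before: NullPointerException` followed by `after: /orders`: the old expression dereferences a region that has not been recovered yet, while the new one only reads a string the event already holds, which is why the change looks low-risk.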

-Dan

On Thu, Aug 15, 2019 at 12:23 PM Udo Kohlmeyer <u...@apache.com> wrote:

Whilst I agree with "*finish* when we believe the quality of the release
branch is sufficient", I disagree that we should cut a branch and then
keep patching that branch with non-critical fixes. This issue, for
instance, has been around for a while and has no adverse side effects.
Issues like GEODE-7081, which is new (introduced by a recent commit) AND
has critical stability implications on the server, are ones I can agree
we should include in a potential release branch.

Otherwise we can ALWAYS argue that said release branch is not of
"sufficient" quality, especially if there are numerous existing JIRAs
pertaining to bugs already in the system.

To quote Juan's original email:

/"Note: *no events are lost (even without the fix)* but, if the region
takes a while to recover, the logs for the member can grow pretty quickly
due to the continuously thrown *NPEs.*"/

In addition, if a commit in a cut release branch is requiring us to
continuously patch the release branch in order to stabilize that
feature/fix, maybe we should consider reverting that fix and releasing it
at a later stage, when we believe it is more stable and has better, more
comprehensive test coverage.

So far, GEODE-7081 has not convinced me that it is critical. OR maybe it
falls under the latter of my options, where it is a stabilization commit
for a new feature, which raises the question: should we have accepted the
original feature commit if it has all manner of side effects that we are
only now discovering?

--Udo

On 8/15/19 11:08 AM, Anthony Baker wrote:

While we can’t fix *all known bugs*, I think where we do have a fix for
an important issue we should think hard about the cost of not including
that in a release.

IMO, the fixed time approach to releases means that we *start* the
release effort (including stabilization and bug fixing if needed) on a
known date and we *finish* when we believe the quality of the release
branch is sufficient. Given the number of important fixes being requested,
I’m not sure we are there yet.

I think the release branch concept has merit because it allows us to
isolate ongoing work from the changes needed for a release.

+1 for including GEODE-7079.

Anthony

On Aug 15, 2019, at 10:51 AM, Udo Kohlmeyer <ukohlme...@gmail.com> wrote:

Seems everyone is in favor of including a /*non-critical*/ fix in an
already cut branch of a potential release...

Am I missing something?

Why cut a release at all... just have a perpetual cycle of fixes added
to develop, and users can choose whichever nightly snapshot build they
want to use...

I'm voting -1 on a non-critical, existing issue whose worst effect is
filling the logs with NPEs... (yes, not something we want).

I believed that we (as a Geode community) agreed that once a release
branch has been cut, only critical issue fixes will be included. If we
just keep adding to the ALREADY CUT 1.10 release, where do we stop and
when do we release?

--Udo

On 8/15/19 10:19 AM, Nabarun Nag wrote:

+1

On Thu, Aug 15, 2019 at 10:15 AM Alexander Murmann <amurm...@apache.org> wrote:

+1

Agreed on fixing this. It's impossible for a user to discover they've hit
an edge case that we fail to support until they are in prod and restart.

On Thu, Aug 15, 2019 at 10:09 AM Juan José Ramos <jra...@pivotal.io>
wrote:

Hello Udo,

Even if it is an existing issue, I'd still consider it critical for those
cases in which there are unprocessed events in the persistent queue after
a restart and the region takes a long time to recover... you can actually
see millions of *NPEs* flooding the member's logs.
My two cents anyway; it's up to the community to make the final decision.

Cheers.


On Thu, Aug 15, 2019 at 5:58 PM Udo Kohlmeyer <u...@apache.com> wrote:

Juan,

From your explanation, it seems this is an existing issue and not
critical. Could we possibly hold this for 1.11?

--Udo

On 8/15/19 5:29 AM, Ju@N wrote:

Hello team,

I'd like to propose including the *fix [1]* for *GEODE-7079 [2]* in
release 1.10.0.
Long story short: a *NullPointerException* can be continuously thrown
and flood the member's logs if a serial event processor (either
*async-event-queue* or *gateway-sender*) starts processing events from
the recovered persistent queue before the actual region to which it was
attached is fully operational.
Note: *no events are lost (even without the fix)* but, if the region
takes a while to recover, the logs for the member can grow pretty
quickly due to the continuously thrown *NPEs.*
Best regards.

[1]: https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b

[2]: https://issues.apache.org/jira/browse/GEODE-7079

--
Juan José Ramos Cassella
Senior Software Engineer
Email: jra...@pivotal.io
