I'm changing my vote to +1 on this issue.

The ONLY reason I'm changing my vote is to add to the cleanliness of the code of the release. I do 100% disagree with the continual scope creep that we have been incurring on this release branch.

--Udo

On 8/15/19 12:34 PM, Dan Smith wrote:
+1 to merging Juan's fix for GEODE-7079. I've seen systems taken down by
rapidly filling up the logs in the past, this does seem to be a critical
fix from the perspective of system stability.

Also, this is the change, which doesn't seem particularly risky to me.

-          ConflationKey key = new
ConflationKey(gsEvent.getRegion().getFullPath(),
+          ConflationKey key = new ConflationKey(gsEvent.getRegionPath(),

-Dan

On Thu, Aug 15, 2019 at 12:23 PM Udo Kohlmeyer <u...@apache.com> wrote:

Whilst I agree with "*finish* when we believe the quality of the release
branch is sufficient", I disagree that we should have cut a branch and
continue to patch that branch with non-critical fixes. i.e this issue
has been around for a while and has no averse side effects. Issues like
GEODE-7081, which is new due to a new commit, AND it has critical
stability implications on the server, that I can agree we should include
in a potential release branch.

Otherwise we can ALWAYS argue that said release branch is not of
"sufficient" quality, especially if there are numerous existing JIRA's
pertaining to bugs already in the system.

To quote Juan's original email:

/"Note: *no events are lost (even without the fix)* but, if the region
takes//
//a while to recover, the logs  for the member can grow pretty quickly
due to//
//the continuously thrown *NPEs.*"/

In addition to this, if there is a commit in a cut release branch, which
is requiring us to continuously patching the release branch, in order to
stabilize that feature/fix, maybe we should consider reverting that fix
and release it at a later stage, when it is believed that this fix is
more stable and have better, more comprehensive test coverage.

So far, GEODE-7081, does not have me convinced that it is critical. OR
maybe it is the latter of my options, where it is a stabilization commit
to a new feature, which begs the question, should we have accepted the
original feature commit if there are all manner of side effects which we
are only discovering.

--Udo

On 8/15/19 11:08 AM, Anthony Baker wrote:
While we can’t fix *all known bugs*, I think where we do have a fix for
an important issue we should think hard about the cost of not including
that in a release.
IMO, the fixed time approach to releases means that we *start* the
release effort (including stabilization and bug fixing if needed) on a
known date and we *finish* when new believe the quality of the release
branch is sufficient.  Given the number of important fixes being requested,
I’m not sure we are there yet.
I think the release branch concept has merit because it allows us to
isolate ongoing work from the changes needed for a release.
+1 for including GEODE-7079.

Anthony


On Aug 15, 2019, at 10:51 AM, Udo Kohlmeyer <ukohlme...@gmail.com>
wrote:
Seems everyone is in favor or including a /*non-critical*/ fix to an
already cut branch of the a potential release...
Am I missing something?

Why cut a release at all... just have a perpetual cycle of fixes added
to develop and users can chose what nightly snapshot build they would want
to use..
I'm voting -1 on a non-critical issue, which is existing and worst
effect is to fill logs will NPE logs... (yes, not something we want).
I believed that we (as a Geode community) agreed that once a release
has been cut, only critical issue fixes will be included. If we continue
just continually adding to the ALREADY CUT 1.10 release, where do we stop
and when do we release...
--Udo

On 8/15/19 10:19 AM, Nabarun Nag wrote:
+1

On Thu, Aug 15, 2019 at 10:15 AM Alexander Murmann <
amurm...@apache.org>
wrote:

+1

Agreed to fixing this. It's impossible for a user to discover they
hit an
edge case that we fail to support till they are in prod and restart.

On Thu, Aug 15, 2019 at 10:09 AM Juan José Ramos <jra...@pivotal.io>
wrote:

Hello Udo,

Even if it is an existing issue I'd still consider it critical for
those
cases on which there are unprocessed events on the persistent queue
after a
restart and the region takes long to recover... you can actually see
millions of *NPEs* flooding the member's logs.
My two cents anyway, it's up to the community to make the final
decision.
Cheers.


On Thu, Aug 15, 2019 at 5:58 PM Udo Kohlmeyer <u...@apache.com>
wrote:
Juan,

   From your explanation, it seems this issue is existing and not
critical. Could we possibly hold this for 1.11?

--Udo

On 8/15/19 5:29 AM, Ju@N wrote:
Hello team,

I'd like to propose including the *fix [1]* for *GEODE-7079 [2]* in
release
1.10.0.
Long story short: a *NullPointerException* can be continuously
thrown
and flood the member's logs if a serial event processor (either
*async-event-queue* or *gateway-sender*) starts processing events
from
a
recovered persistent queue before the actual region to which it was
attached is fully operational.
Note: *no events are lost (even without the fix)* but, if the
region
takes
a while to recover, the logs  for the member can grow pretty
quickly
due
to
the continuously thrown *NPEs.*
Best regards.

[1]:

https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b
[2]: https://issues.apache.org/jira/browse/GEODE-7079

--
Juan José Ramos Cassella
Senior Software Engineer
Email: jra...@pivotal.io

Reply via email to