+1

This will be a good inclusion in Apache Geode 1.10.0 release.

Regards
Naba

On Mon, Aug 26, 2019 at 3:36 PM Jacob Barrett <jbarr...@pivotal.io> wrote:

> +1
>
> Thanks for the details!
>
> > On Aug 26, 2019, at 3:33 PM, Ryan McMahon <rmcma...@pivotal.io> wrote:
> >
> > Udo,
> >
> > Here are inline answers to your questions:
> >
> > *Is this an existing issue?*
> >
> > Short answer - yes, but it has never been in a release version of Geode.
> > The leak was introduced as part of some changes to address handling
> > multiple concurrent registration requests for a given client on a single
> > server.  The issue is only seen if client registration fails which is not
> > particularly common, which is why we are only seeing it now after months
> of
> > testing.  The commit for that was introduced here on April 30th.
> >
> https://github.com/apache/geode/commit/bc2a2fa5af374cfedfba4dc1abe6cbc2a7b719c8
> > The ConcurrentModificationException issue (which ultimately causes the
> > registration to fail) was introduced on April 22nd here:
> >
> https://github.com/apache/geode/commit/afc311c04f6908a8f725834cdf2c49ce6e902b3f
> >
> >
> > *Why is it more critical to squeeze it into an existing (almost
> > release) version of Apache Geode?*
> >
> > Not sure I totally understand this question, but it is critical because
> the
> > leak can cause servers to crash due to OOM.  Again, because the problems
> > were introduced in late April and we haven't released Geode since then,
> so
> > I think it is very important to merge these fixes into 1.10.0 before we
> > release.
> >
> >
> >
> > *What guarantees do we have that this fix makes the application more
> stable
> > compared to adding another hidden issue, which we will discover in
> another
> > few weeks from now?*
> >
> > I added numerous tests for this scenario to ensure that the leak would
> > never happen regardless of the cause of a failed client registration.
> > There obviously is no way to 100% guarantee that there will be no issues
> > that arise from the fixes themselves, but our existing test coverage plus
> > the new tests I wrote should give us the confidence we need.
> >
> > Thanks,
> > Ryan
> >
> > On Mon, Aug 26, 2019 at 3:17 PM Udo Kohlmeyer <u...@apache.com> wrote:
> >
> >> In order to better understand this request:
> >>
> >> Is this an existing issue?
> >>
> >> Why is it more critical to squeeze it into an existing (almost release)
> >> version of Apache Geode?
> >>
> >> What guarantees do we have that this fix makes the application more
> >> stable compared to adding another hidden issue, which we will discover
> >> in another few weeks from now?
> >>
> >>
> >> --Udo
> >>
> >> On 8/26/19 3:10 PM, Ryan McMahon wrote:
> >>> Hi all,
> >>>
> >>> I would like to propose cherry-picking GEODE-7088 and GEODE-7089 to the
> >>> 1.10.0 release branch.  The two JIRAs are related to the same root
> >> problem,
> >>> which I would classify as critical.  We discovered a case where a
> failed
> >>> client registration could lead to a memory leak in a server, eventually
> >>> causing the server to crash due to lack of memory.
> >>>
> >>> The issue is instigated by a ConcurrentModificationException due to
> >>> iteration of a non-thread safe collection while it is being mutated
> >>> (GEODE-7088).  This exception occurs when the client's queue image is
> >> being
> >>> copied from one server to the next during client registration, and it
> >>> causes the client's registration to fail.  The client would likely
> >> succeed
> >>> if it retried registration with that same server, but if it registers
> >> with
> >>> a different server, we end up leaking events to the client's
> registration
> >>> queue on the original server (GEODE-7089).
> >>>
> >>> The fix for GEODE-7088 is to use thread-safe collections for interested
> >>> clients in client update messages.  The fix for GEODE-7089 is to always
> >>> drain and remove the registration queue regardless of success or
> failure.
> >>> Together, these fixes prevent the failed registrations and memory leak.
> >>>
> >>> The SHAs for the fixes and tests in develop are:
> >>>
> >>> GEODE-7088
> >>> - 174af1d23fb7e09eb2bc2fa55479df854850fadb
> >>> - 5bb753a8f4ff2886acd8e62d6f51fea58e37881d
> >>>
> >>> GEODE-7089
> >>> - 5d0153ad4adb1612a1083673f98b1982819a6589
> >>>
> >>> This proposal is to cherry-pick these commits to 1.10.0 release branch.
> >>>
> >>> Thanks,
> >>> Ryan McMahon
> >>>
> >>
>
>

Reply via email to