Given that the primary driver of this release is to fix an issue with the
misuse of dataDir and dataLogDir, I would rather see this release make it out
the door with minimal additional changes to core functionality so people can
upgrade more confidently.

What do you think, Pat?

Abe

On Fri, Apr 13, 2018, at 11:37, Alexander Shraer wrote:
> Now that we have the fix, why delay it to next release?
> 
> On Fri, Apr 13, 2018 at 11:09 AM Abraham Fine <af...@apache.org> wrote:
> 
> > Let's wait until the next release to include this fix.
> >
> > On Mon, Apr 9, 2018, at 15:14, Alexander Shraer wrote:
> > > Hi,
> > >
> > > Please take a look at the new PR for ZK-2959:
> > > https://github.com/apache/zookeeper/pull/500
> > > If there are no further comments, I can commit it.
> > >
> > > Thanks,
> > > Alex
> > >
> > > On Fri, Apr 6, 2018 at 11:33 AM, Alexander Shraer <shra...@gmail.com>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > The bug described in ZOOKEEPER-2959
> > > > <https://issues.apache.org/jira/browse/ZOOKEEPER-2959> is that
> > > > getEpochToPropose and waitForEpochAck do not distinguish between
> > > > followers and observers.
> > > > This can cause a candidate leader's acceptedEpoch to be updated with
> > > > support from observers only. The same applies to waitForEpochAck:
> > > > passing this method allows the candidate leader to update its
> > > > currentEpoch. The latter helps this server win FLE elections
> > > > continuously, and the former (acceptedEpoch) causes anyone trying to
> > > > connect to the server to think that it has more up-to-date data and
> > > > truncate their logs to match.
> > > >
> > > >
> > > > Alex
> > > >
> > > > On Fri, Apr 6, 2018 at 10:04 AM, Fangmin Lv <lvfang...@gmail.com>
> > wrote:
> > > >
> > > >> Hi Alex,
> > > >>
> > > >> Can you give more details about the data loss scenario in Jira
> > > >> ZOOKEEPER-2959 <https://issues.apache.org/jira/browse/ZOOKEEPER-2959>?
> > > >> As far as I know, the leader ignores the observers' ACKs in
> > > >> waitForNewLeaderAck, so it will not start serving traffic until it has
> > > >> received an actual quorum of ACKs. If it doesn't have enough follower
> > > >> support before the timeout, it will quit leading and its learners will
> > > >> re-sync with the new leader.
> > > >>
> > > >> Thanks,
> > > >> Fangmin
> > > >>
> > > >> On Thu, Apr 5, 2018 at 12:57 PM, Alexander Shraer <shra...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Btw we actually observed the described issue (data loss), thankfully
> > > >>> in a test environment. So I thought this was important to share with
> > > >>> the community.
> > > >>>
> > > >>> Unfortunately I don’t have time to run a new ZK release for this, so
> > > >>> I’m not going to -1 your candidate, but we are actively working on a
> > > >>> fix (i.e. a test at this point) and I can commit it as soon as we
> > > >>> have it.
> > > >>>
> > > >>> It may be worthwhile to delay the release by a few more days, but it’s
> > > >>> totally up to you since you’re running it.
> > > >>>
> > > >>> Cheers
> > > >>> Alex
> > > >>> On Thu, Apr 5, 2018 at 12:47 PM Andor Molnar <an...@cloudera.com>
> > wrote:
> > > >>>
> > > >>> > Got that. I still believe it's a completely valid issue which has
> > > >>> > to be addressed, but it's not a showstopper. I'm afraid we're not
> > > >>> > going to convince each other, so it's probably Abe's call whether
> > > >>> > he wants to create another release candidate for the fix.
> > > >>> >
> > > >>> > I reviewed the code on GitHub and I think it just needs to be
> > > >>> > covered with a unit test to be complete.
> > > >>> >
> > > >>> > Regards,
> > > >>> > Andor
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>> > On Thu, Apr 5, 2018 at 9:05 PM, Alexander Shraer <
> > shra...@gmail.com>
> > > >>> > wrote:
> > > >>> >
> > > >>> > > Yes, sort of: FLE finishes, then enough observers' messages reach
> > > >>> > > the leader before the participants' messages do.
> > > >>> > > Whether it's rare depends on the number of observers and
> > > >>> > > participants. For example, with very few participants and many
> > > >>> > > observers, your chances of hitting this are quite high.
> > > >>> > >
> > > >>> > > Alex
> > > >>> > >
> > > >>> > > On Thu, Apr 5, 2018 at 11:44 AM, Andor Molnar <
> > an...@cloudera.com>
> > > >>> > wrote:
> > > >>> > >
> > > >>> > > > Maybe I'm missing something here, but this looks like a rare
> > > >>> > > > edge case to me. Participants must finish the leader election
> > > >>> > > > successfully, and right after that enough followers must fail
> > > >>> > > > to send their epoch to the leader, so that observers can take
> > > >>> > > > over.
> > > >>> > > >
> > > >>> > > > Is that description accurate?
> > > >>> > > >
> > > >>> > > > Andor
> > > >>> > > >
> > > >>> > > >
> > > >>> > > > On Thu, Apr 5, 2018 at 7:35 PM, Alexander Shraer <
> > > >>> shra...@gmail.com>
> > > >>> > > > wrote:
> > > >>> > > >
> > > >>> > > > > To clarify - in a deployment with observers this bug can
> > > >>> > > > > potentially cause data loss. A server could be elected leader
> > > >>> > > > > based just on the support of observers, even if this server's
> > > >>> > > > > data is stale with respect to other followers.
> > > >>> > > > >
> > > >>> > > > > It is certainly a blocker, just not sure if for 3.4.11 or
> > > >>> > > > > 3.4.12.
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > Alex
> > > >>> > > > > On Thu, Apr 5, 2018 at 10:29 AM Andor Molnar <
> > an...@cloudera.com
> > > >>> >
> > > >>> > > wrote:
> > > >>> > > > >
> > > >>> > > > > > I don't think it's a blocker.
> > > >>> > > > > > The Jira and PR have been open since last December and
> > > >>> > > > > > 3.4.11 was released without it.
> > > >>> > > > > >
> > > >>> > > > > > Although this bug is also important to fix, I believe it's
> > > >>> > > > > > more important to release a fix for the regression we've
> > > >>> > > > > > found in 3.4.11 asap.
> > > >>> > > > > >
> > > >>> > > > > > Abe, any thoughts?
> > > >>> > > > > >
> > > >>> > > > > > Regards,
> > > >>> > > > > > Andor
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > > On Thu, Apr 5, 2018 at 7:00 PM, Alexander Shraer <
> > > >>> > shra...@gmail.com>
> > > >>> > > > > > wrote:
> > > >>> > > > > >
> > > >>> > > > > > > Sorry for coming in at the last moment. I'm not sure
> > > >>> > > > > > > when the next 3.4 release is scheduled, so just wanted to
> > > >>> > > > > > > mention this bug, which I believe is a blocker for either
> > > >>> > > > > > > this or the next release:
> > > >>> > > > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> > > >>> > > > > > >
> > > >>> > > > > > > Best,
> > > >>> > > > > > > Alex
> > > >>> > > > > > >
> > > >>> > > > > > > On Thu, Apr 5, 2018 at 9:09 AM, Ted Yu <
> > yuzhih...@gmail.com>
> > > >>> > > wrote:
> > > >>> > > > > > >
> > > >>> > > > > > > > Can the vote be closed ?
> > > >>> > > > > > > >
> > > >>> > > > > > > > It seems we have enough +1's
> > > >>> > > > > > > >
> > > >>> > > > > > > > Thanks
> > > >>> > > > > > > >
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >
> >
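
For anyone skimming the thread, the counting problem described for ZOOKEEPER-2959 can be illustrated with a small model. This is only a sketch under stated assumptions, not ZooKeeper's actual code: the class name EpochQuorumSketch, the connect and hasQuorum methods, and the filterObservers flag are all hypothetical, and a simple majority of voting members stands in for whatever quorum verifier a real ensemble is configured with.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Simplified, hypothetical model of how a candidate leader counts the
 * learners that have sent their epoch. Not ZooKeeper's real implementation.
 */
final class EpochQuorumSketch {
    private final Set<Long> votingMembers;               // ids of participants (voters)
    private final Set<Long> connected = new HashSet<>(); // ids counted toward the quorum
    private final boolean filterObservers;               // true = only count voters

    EpochQuorumSketch(Set<Long> votingMembers, boolean filterObservers) {
        this.votingMembers = new HashSet<>(votingMembers);
        this.filterObservers = filterObservers;
    }

    /** Records a server that sent its epoch; returns true once a quorum is counted. */
    boolean connect(long sid) {
        // Without the filter, an observer's id is counted like a follower's.
        if (!filterObservers || votingMembers.contains(sid)) {
            connected.add(sid);
        }
        return hasQuorum();
    }

    /** Assumed quorum rule: strict majority of the voting members. */
    boolean hasQuorum() {
        return connected.size() > votingMembers.size() / 2;
    }
}
```

With voters {1, 2, 3} and observers {4, 5}, the unfiltered model reports a quorum as soon as the candidate leader (1) and a single observer (4) have connected, which is the situation described in the thread. With filtering enabled, the observers are ignored and a quorum is reported only once a second voter, such as 2, connects.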
