Thanks for following up, Alex.

On Fri, Apr 13, 2018, at 14:48, Alexander Shraer wrote:
> We discussed with Pat offline and agreed to go without this patch,
> especially since we need to patch 3 branches: 3.4, 3.5 and master.
> We'll prepare 3.5 and master and then commit all 3 together in time
> for the next release. So Abe, please go ahead with your release.
>
> Alex
> 
> On Fri, Apr 13, 2018 at 2:26 PM, Patrick Hunt
> <ph...@apache.org> wrote:
>> Hey folks. I've been on vacation. My 0.02 - given the release
>> candidate is well underway, has sufficient votes/time to finalize, this
>> is not a regression in 3.4.12 and it's not yet committed I would think we
>> finalize/push 3.4.12 then quickly followup with a 3.4.13 that
>> addresses this. Alex could be the RM given his interest/advocacy.
>> 
>>  Regards,
>> 
>>  Patrick
>> 
>> 
>> On Fri, Apr 13, 2018 at 11:55 AM, Abraham Fine
>> <af...@apache.org> wrote:
>>
>>  > Given that the primary driver of this release is to fix an issue
>>  > with the misuse of dataDir and dataLogDir I would rather see this
>>  > release make it out the door with minimal additional changes to core
>>  > functionality so people can more confidently upgrade.
>>  >
>>  > What do you think Pat?
>>  >
>>  > Abe
>>  >
>>  > On Fri, Apr 13, 2018, at 11:37, Alexander Shraer wrote:
>>  > > Now that we have the fix, why delay it to next release?
>>  > >
>>  > > On Fri, Apr 13, 2018 at 11:09 AM Abraham Fine <af...@apache.org>
>>  > > wrote:
>>  > >
>>  > > > Let's wait until the next release to include this fix.
>>  > > >
>>  > > > On Mon, Apr 9, 2018, at 15:14, Alexander Shraer wrote:
>>  > > > > Hi,
>>  > > > >
>>  > > > > Please take a look at the new PR for ZK-2959:
>>  > > > > https://github.com/apache/zookeeper/pull/500
>>  > > > > If there are no further comments, I can commit it.
>>  > > > >
>>  > > > > Thanks,
>>  > > > > Alex
>>  > > > >
>>  > > > > On Fri, Apr 6, 2018 at 11:33 AM, Alexander Shraer
>>  > > > > <shra...@gmail.com> wrote:
>>  > > > >
>>  > > > > > Hi,
>>  > > > > >
>>  > > > > > The bug described in ZOOKEEPER-2959
>>  > > > > > <https://issues.apache.org/jira/browse/ZOOKEEPER-2959> is
>>  > > > > > that getEpochToPropose and waitForEpochAck do not distinguish
>>  > > > > > between followers and observers.
>>  > > > > > This can cause a candidate leader's acceptedEpoch to be
>>  > > > > > updated with only support from observers. Same for
>>  > > > > > waitForEpochAck - passing this method allows the candidate
>>  > > > > > leader to update the currentEpoch. The latter helps this
>>  > > > > > server to win FLE elections continuously, and the former
>>  > > > > > (acceptedEpoch) causes anyone trying to connect to the server
>>  > > > > > to think that it has more up-to-date data and truncate their
>>  > > > > > logs to match.
>>  > > > > >
>>  > > > > >
>>  > > > > > Alex
>>  > > > > >
>>  > > > > > On Fri, Apr 6, 2018 at 10:04 AM, Fangmin Lv
>>  > > > > > <lvfang...@gmail.com> wrote:
>>  > > > > >
>>  > > > > >> Hi Alex,
>>  > > > > >>
>>  > > > > >> Can you give more details about the data loss scenario in
>>  > > > > >> Jira ZOOKEEPER-2959
>>  > > > > >> <https://issues.apache.org/jira/browse/ZOOKEEPER-2959>?
>>  > > > > >> As far as I know, the leader will ignore the observers'
>>  > > > > >> ACK in waitForNewLeaderAck, so it will not start serving
>>  > > > > >> traffic until it has received the actual quorum ACK. If it
>>  > > > > >> doesn't have enough follower support before the timeout, it
>>  > > > > >> will quit leading and its learners will re-sync with the new
>>  > > > > >> leader.
>>  > > > > >>
>>  > > > > >> Thanks,
>>  > > > > >> Fangmin
>>  > > > > >>
>>  > > > > >> On Thu, Apr 5, 2018 at 12:57 PM, Alexander Shraer
>>  > > > > >> <shra...@gmail.com> wrote:
>>  > > > > >>
>>  > > > > >>> Btw we actually observed the described issue (data
>>  > > > > >>> loss), thankfully in a test environment. So I thought this
>>  > > > > >>> is important to share with the community.
>>  > > > > >>>
>>  > > > > >>> Unfortunately I don’t have time to run a new ZK release
>>  > > > > >>> for this, so I’m not going to -1 your candidate, but we are
>>  > > > > >>> actively working on a fix (ie a test at this point) and I
>>  > > > > >>> can commit that as soon as we have that.
>>  > > > > >>>
>>  > > > > >>> It may be worth while to delay the release by a few more
>>  > > > > >>> days, but it’s totally up to you since you’re running it.
>>  > > > > >>>
>>  > > > > >>> Cheers
>>  > > > > >>> Alex
>>  > > > > >>> On Thu, Apr 5, 2018 at 12:47 PM Andor Molnar
>>  > > > > >>> <an...@cloudera.com> wrote:
>>  > > > > >>>
>>  > > > > >>> > Got that. I still believe it's a completely valid
>>  > > > > >>> > issue which has to be addressed, but it's not a
>>  > > > > >>> > showstopper. I'm afraid we're not going to convince each
>>  > > > > >>> > other, so it's probably Abe's call if he wants to create
>>  > > > > >>> > another release candidate for the fix.
>>  > > > > >>> >
>>  > > > > >>> > I reviewed the code on github and I think it just
>>  > > > > >>> > needs to be covered with a unit test to be complete.
>>  > > > > >>> >
>>  > > > > >>> > Regards,
>>  > > > > >>> > Andor
>>  > > > > >>> >
>>  > > > > >>> >
>>  > > > > >>> >
>>  > > > > >>> > On Thu, Apr 5, 2018 at 9:05 PM, Alexander Shraer
>>  > > > > >>> > <shra...@gmail.com> wrote:
>>  > > > > >>> >
>>  > > > > >>> > > Yes sort of, FLE is finished, then enough observers'
>>  > > > > >>> > > messages reach the leader before participants' messages
>>  > > > > >>> > > do. Whether it's rare depends on the number of observers
>>  > > > > >>> > > and participants. For example, with very few participants
>>  > > > > >>> > > and many observers your chance of hitting this is quite
>>  > > > > >>> > > high.
>>  > > > > >>> > >
>>  > > > > >>> > > Alex
>>  > > > > >>> > >
>>  > > > > >>> > > On Thu, Apr 5, 2018 at 11:44 AM, Andor Molnar
>>  > > > > >>> > > <an...@cloudera.com> wrote:
>>  > > > > >>> > >
>>  > > > > >>> > > > Maybe I'm missing something here, but this looks
>>  > > > > >>> > > > like a rare edge case to me. Participants must finish
>>  > > > > >>> > > > the leader election successfully and right after that
>>  > > > > >>> > > > enough followers should fail to send their epoch to
>>  > > > > >>> > > > the leader, so observers can take it over.
>>  > > > > >>> > > >
>>  > > > > >>> > > > Is that description accurate?
>>  > > > > >>> > > >
>>  > > > > >>> > > > Andor
>>  > > > > >>> > > >
>>  > > > > >>> > > >
>>  > > > > >>> > > > On Thu, Apr 5, 2018 at 7:35 PM, Alexander Shraer
>>  > > > > >>> > > > <shra...@gmail.com> wrote:
>>  > > > > >>> > > >
>>  > > > > >>> > > > > To clarify - in a deployment with observers this
>>  > > > > >>> > > > > bug can potentially cause data loss. A server could
>>  > > > > >>> > > > > be elected leader based just on the support of
>>  > > > > >>> > > > > observers, even if this server's data is stale
>>  > > > > >>> > > > > wrt other followers.
>>  > > > > >>> > > > >
>>  > > > > >>> > > > > It is certainly a blocker, just not sure if for
>>  > > > > >>> > > > > 3.4.11 or 3.4.12.
>>  > > > > >>> > > > >
>>  > > > > >>> > > > >
>>  > > > > >>> > > > > Alex
>>  > > > > >>> > > > > On Thu, Apr 5, 2018 at 10:29 AM Andor Molnar
>>  > > > > >>> > > > > <an...@cloudera.com> wrote:
>>  > > > > >>> > > > >
>>  > > > > >>> > > > > > I don't think it's a blocker.
>>  > > > > >>> > > > > > The jira and PR have been open since last
>>  > > > > >>> > > > > > December and 3.4.11 has been released without it.
>>  > > > > >>> > > > > >
>>  > > > > >>> > > > > > Although this bug is also important to fix, I
>>  > > > > >>> > > > > > believe it's more important to release a fix for
>>  > > > > >>> > > > > > the regression we've found in 3.4.11 asap.
>>  > > > > >>> > > > > >
>>  > > > > >>> > > > > > Abe, any thoughts?
>>  > > > > >>> > > > > >
>>  > > > > >>> > > > > > Regards,
>>  > > > > >>> > > > > > Andor
>>  > > > > >>> > > > > >
>>  > > > > >>> > > > > >
>>  > > > > >>> > > > > >
>>  > > > > >>> > > > > > On Thu, Apr 5, 2018 at 7:00 PM, Alexander Shraer
>>  > > > > >>> > > > > > <shra...@gmail.com> wrote:
>>  > > > > >>> > > > > >
>>  > > > > >>> > > > > > > Sorry for coming in at the last moment. I'm
>>  > > > > >>> > > > > > > not sure when the next 3.4 release is
>>  > > > > >>> > > > > > > scheduled, so just wanted to mention this bug,
>>  > > > > >>> > > > > > > which I believe is a blocker for either this
>>  > > > > >>> > > > > > > or next release:
>>  > > > > >>> > > > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2959
>>  > > > > >>> > > > > > >
>>  > > > > >>> > > > > > > Best,
>>  > > > > >>> > > > > > > Alex
>>  > > > > >>> > > > > > >
>>  > > > > >>> > > > > > > On Thu, Apr 5, 2018 at 9:09 AM, Ted Yu
>>  > > > > >>> > > > > > > <yuzhih...@gmail.com> wrote:
>>  > > > > >>> > > > > > >
>>  > > > > >>> > > > > > > > Can the vote be closed ?
>>  > > > > >>> > > > > > > >
>>  > > > > >>> > > > > > > > It seems we have enough +1's
>>  > > > > >>> > > > > > > >
>>  > > > > >>> > > > > > > > Thanks
>>  > > > > >>> > > > > > > >
>>  > > > > >>> > > > > > >
>>  > > > > >>> > > > > >
>>  > > > > >>> > > > >
>>  > > > > >>> > > >
>>  > > > > >>> > >
>>  > > > > >>> >
>>  > > > > >>>
>>  > > > > >>
>>  > > > > >>
>>  > > > > >
>>  > > >
>>  >
