Follow-up: for checking topology blobs (and corresponding dependencies) I filed STORM-2323 <https://issues.apache.org/jira/browse/STORM-2323> and submitted a pull request.
- Jungtaek Lim (HeartSaVioR)

On Wed, Jan 25, 2017 at 8:46 AM, Jungtaek Lim <[email protected]> wrote:

> (adding missed dev@)
>
> Thanks, Sanket, for answering the questions. Follow-up questions inline.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Wed, Jan 25, 2017 at 3:13 AM, Sanket Chintapalli <[email protected]> wrote:
>
> Questions related to blobstore
>
> 1. In get-version-for-key, a new KeySequenceNumber instance is created every
> time, so the synchronized keyword on KeySequenceNumber.getKeySequenceNumber()
> has no effect. But it looks like the method shouldn't run concurrently on the
> same key, and furthermore every Nimbus can call the method concurrently. Then
> we might need some kind of global lock for obtaining the version of a key. Is
> my understanding right?
>
> Firstly, the KeyVersionSequenceNumber is initialized for every topology.
> Yes, you are correct: there is no need for synchronized. I thought it was
> called from some other place in another thread. I don't think it is required.
>
> But get-version-for-key is called from setup-blobstore and
> NimbusClient.createStateInZookeeper, which means KeySequenceNumber can be
> called from multiple Nimbuses concurrently.
> The 'synchronized' keyword has no effect, so we can remove it, but what I'm
> really wondering is whether the logic still works safely without a global
> lock when multiple Nimbuses try to write their state to children of the
> blobstore key, since what the method reads to determine the version can
> change concurrently.
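
For illustration only, below is a minimal sketch of the kind of per-key "global lock" being discussed, using Curator's InterProcessMutex over ZooKeeper. The KeyVersionLockSketch class, the lock path, and the computeVersion placeholder are assumptions invented for the example; this is not the actual Storm code.

import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class KeyVersionLockSketch {
    private final CuratorFramework zk;

    public KeyVersionLockSketch(String zkConnect) {
        this.zk = CuratorFrameworkFactory.newClient(zkConnect, new ExponentialBackoffRetry(1000, 3));
        this.zk.start();
    }

    // Compute the next version for a blob key while holding a ZooKeeper-backed mutex,
    // so concurrent Nimbus instances cannot interleave their reads of the key's state.
    public int getVersionWithGlobalLock(String key) throws Exception {
        // Hypothetical lock path shared by all Nimbus instances.
        InterProcessMutex lock = new InterProcessMutex(zk, "/blobstore-locks/" + key);
        if (!lock.acquire(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("Could not acquire version lock for key " + key);
        }
        try {
            return computeVersion(key);
        } finally {
            lock.release();
        }
    }

    private int computeVersion(String key) {
        // Placeholder for the existing per-key version logic (KeySequenceNumber in Storm).
        return 1;
    }
}
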
> 2. We rely on the blob-sync function, which is skipped on the leader Nimbus;
> then how does the leader Nimbus synchronize its blobs?
>
> Here the leader is assumed to have all the blobs of active topologies.
> blob-sync is only for non-leader Nimbuses, as any change in the state on
> ZooKeeper will trigger them to sync the blobs. We do not want the leader to
> sync on itself whenever state is updated; it would make no sense.
>
> Our precondition for leadership only checks topology blobs, and it doesn't
> check the version of the blobs since it just compares keys.
> (We now have additional jar files as dependencies of a topology, but this is
> not reflected yet - it needs to be addressed before releasing 1.1.0.)
>
> So the leader Nimbus only guarantees that it can provide topology blobs;
> it still doesn't guarantee anything for other normal blobs, and moreover
> there is no guarantee that the leader Nimbus has the latest topology blobs.
>
> https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L87-L99
>
> In scenario 2, for example, N1 and N2 have different content for the blob
> because N1 crashed before N2 updated its blob. N2's blob was v2 from a
> content point of view, but was promoted to v4 without any content update;
> v3 becomes available later when N1 is restored, but is discarded.
>
> This gets more complicated when more Nimbuses come in. Suppose N3 has v3 for
> that blob, but N2 can still gain leadership, and as a result v3 will be
> discarded (since the leader Nimbus doesn't sync from others).
>
> I guess the logic might be safe if we force the replication factor to equal
> the number of Nimbuses, but then any follower Nimbus becomes a SPOF, and I'm
> not sure it would still be safe when a new Nimbus comes in.
>
> Please correct me if I'm missing something here.
>
> 3. What's the purpose of the setup-blobstore function? It seems slightly
> different from the blob-sync function (it doesn't update the blobs) and it
> runs on both the leader and follower Nimbus.
>
> setup-blobstore was placed there as an initialization function. As far as I
> remember there are ephemeral nodes on ZooKeeper which disappear after a
> Nimbus goes down; once a Nimbus comes back up, setup-blobstore does some
> initialization operations to sync all topology blobs not present in the
> local blobstore.
>
> Yes, right. The BlobStore assumes that the leader Nimbus shouldn't sync its
> blobs from a follower Nimbus, so in this case setup-blobstore is needed. I
> was just thinking about the case where the leader Nimbus needs to
> synchronize its blobs.
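
As a rough illustration of the reconciliation setup-blobstore is described as doing above (diff local blob keys against the active keys in ZooKeeper, delete stale local blobs, fetch missing ones), here is a sketch. The BlobStoreView interface and its method names are assumptions invented for the example, not Storm's actual API.

import java.util.HashSet;
import java.util.Set;

public class BlobReconcileSketch {

    // Minimal view of the operations the reconciliation needs; not Storm's API.
    interface BlobStoreView {
        Set<String> localKeys();
        Set<String> activeKeysInZookeeper();
        void deleteLocal(String key);
        void downloadFromRemote(String key);
    }

    // Bring the local blobstore in line with the active keys recorded in ZooKeeper:
    // delete local blobs whose keys are no longer active, and download blobs for
    // active keys that are missing locally.
    static void reconcile(BlobStoreView blobStore) {
        Set<String> local = new HashSet<>(blobStore.localKeys());
        Set<String> active = new HashSet<>(blobStore.activeKeysInZookeeper());

        for (String key : local) {
            if (!active.contains(key)) {
                blobStore.deleteLocal(key); // stale: key removed, e.g. topology was killed
            }
        }
        for (String key : active) {
            if (!local.contains(key)) {
                blobStore.downloadFromRemote(key); // missing: fetch from a Nimbus that has it
            }
        }
    }
}
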
> 4. In KeySequenceNumber.getKeySequenceNumber(), in some situations the
> version number goes to 0 (initial version - 1). Is that meant to mark the
> blob as invalid? Then can we use 0 for the version number whenever any
> exception is caught in that method? Swallowing all exceptions in that method
> seems intended, but we hit a crash in that case, and even if we throw the
> exception and pass it to the caller, rolling back is not easy.
>
> The idea was to say that if your version is 0 then you have crashed or just
> come up, and you might want to sync with the latest blobs on the leader
> Nimbus. Since setup-blobstore already diffs the local blobs against the
> active blobs on ZooKeeper, deletes blobs not on ZooKeeper, and creates the
> locally active keys when a Nimbus crashes and comes back up, I think we can
> mark and return the initial sequence number as 1 instead of 0. The 0 would
> trigger the sync again for an already downloaded/updated blob. I presume
> returning the (initial version) instead of (initial version - 1) might fix
> the issue.
>
> Let's get back to the original issue.
>
> There were three Nimbuses N1, N2, N3, and N1 was the leader. We submitted
> topology T1 and after submission we restarted N1. N2 became leader and we
> killed T1. While N1 was initializing and syncing up its topology blobs, N2
> concurrently removed the ZK path and also the max sequence number path for
> the topology blob as part of killing the topology.
>
> The problem was that N1 called KeySequenceNumber.getKeySequenceNumber()
> while registering the local blob; checkExists on /blobstore/key succeeded,
> but the following getChildren on /blobstore/key failed with NoNodeException.
> (/blobstoremaxsequencenumber/key might or might not exist, since N2 was
> removing the paths.)
> In this case, what is the correct return value for that blob?
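
To make the "which return value is correct" question concrete, here is a simplified, hypothetical sketch of a version lookup that falls back to the initial sequence number when the key's nodes disappear between checkExists and getChildren, along the lines of Sanket's suggestion to return (initial version) rather than (initial version - 1). The constants, paths, and child-name format are assumptions, not the actual KeySequenceNumber code.

import java.util.List;
import java.util.TreeSet;

import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.KeeperException;

public class KeySequenceSketch {
    // Assumed semantics from the discussion: 1 = initial version, 0 = "stale, re-sync needed".
    static final int INITIAL_VERSION = 1;

    // Derive the next sequence number for a blob key from its ZooKeeper children.
    // If the key's node vanishes between checkExists and getChildren (another Nimbus
    // is deleting the topology's blob paths), fall back to INITIAL_VERSION instead of
    // letting NoNodeException / NoSuchElementException escape.
    static int nextSequenceNumber(CuratorFramework zk, String key) throws Exception {
        String path = "/blobstore/" + key;
        try {
            if (zk.checkExists().forPath(path) == null) {
                return INITIAL_VERSION;
            }
            List<String> children = zk.getChildren().forPath(path);
            TreeSet<Integer> sequenceNumbers = new TreeSet<>();
            for (String child : children) {
                // Assumed child format "<nimbusInfo>-<sequenceNumber>": parse the trailing number.
                sequenceNumbers.add(Integer.parseInt(child.substring(child.lastIndexOf('-') + 1)));
            }
            return sequenceNumbers.isEmpty() ? INITIAL_VERSION : sequenceNumbers.last() + 1;
        } catch (KeeperException.NoNodeException e) {
            // Key deleted concurrently: treat it as gone and start from the initial version.
            return INITIAL_VERSION;
        }
    }
}
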
> Thanks,
> Sanket Chintapalli
>
> On Tuesday, January 24, 2017, 10:35:50 AM CST, Sanket Chintapalli <[email protected]> wrote:
>
> Edit: * but not 2
>
> On Tuesday, January 24, 2017, 10:35:22 AM CST, Sanket Chintapalli <[email protected]> wrote:
>
> Yes, Bobby, I will have a look and reply. I think I can explain 1, 3 and 4,
> but not 2. I did not understand 2.
>
> Thanks,
> Sanket
>
> On Tuesday, January 24, 2017, 9:42:35 AM CST, Bobby Evans <[email protected]> wrote:
>
> If you have time to answer some of his questions that would really be great.
>
> ----- Forwarded Message -----
> From: Jungtaek Lim <[email protected]>
> To: Bobby Evans <[email protected]>; Dev <[email protected]>
> Sent: Tuesday, January 24, 2017, 9:35:36 AM CST
> Subject: Re: Local BlobStore and Nimbus H/A
>
> I just skimmed the blobstore code and have some questions about the
> implementation. I'd really appreciate it if someone who understands the
> BlobStore could answer the questions below.
> (Questions are based on the current 1.x branch, but I think the master
> branch is similar.)
>
> 1. In get-version-for-key, a new KeySequenceNumber instance is created every
> time, so the synchronized keyword on KeySequenceNumber.getKeySequenceNumber()
> has no effect. But it looks like the method shouldn't run concurrently on the
> same key, and furthermore every Nimbus can call the method concurrently. Then
> we might need some kind of global lock for obtaining the version of a key. Is
> my understanding right?
>
> 2. We rely on the blob-sync function, which is skipped on the leader Nimbus;
> then how does the leader Nimbus synchronize its blobs?
>
> 3. What's the purpose of the setup-blobstore function? It seems slightly
> different from the blob-sync function (it doesn't update the blobs) and it
> runs on both the leader and follower Nimbus.
>
> 4. In KeySequenceNumber.getKeySequenceNumber(), in some situations the
> version number goes to 0 (initial version - 1). Is that meant to mark the
> blob as invalid? Then can we use 0 for the version number whenever any
> exception is caught in that method? Swallowing all exceptions in that method
> seems intended, but we hit a crash in that case, and even if we throw the
> exception and pass it to the caller, rolling back is not easy.
>
> Thanks in advance,
> Jungtaek Lim (HeartSaVioR)
>
> On Tue, Jan 24, 2017 at 12:25 PM, Jungtaek Lim <[email protected]> wrote:
>
> > More details: there were three Nimbuses N1, N2, N3, and N1 was the leader.
> > We submitted topology T1 and after submission we restarted N1. N2 became
> > leader and we killed T1. While N1 was initializing and syncing up its
> > topology blobs, N2 concurrently removed the ZK path and also the max
> > sequence number path for the topology blob as part of killing the
> > topology. This race condition only occurs with the Local BlobStore, since
> > removing the ZK path is done only if Nimbus is using the Local BlobStore.
> >
> > So it's the former case, and stopping the current sync phase and
> > restarting the sync is an ideal way to handle it, since we only guarantee
> > eventual consistency. I'll take a look at the codebase to see how we can
> > apply this, but it would be a great help if someone familiar with the
> > BlobStore codebase were willing to handle it.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > On Mon, Jan 23, 2017 at 11:33 PM, Bobby Evans <[email protected]> wrote:
> >
> > HA for the blobstore was set up so that ZK would hold the source of truth
> > and the other Nimbus nodes would be eventually consistent with each
> > other. I'm not totally sure of the issue, because I don't understand
> > whether this is happening in the context of a follower trying to keep up
> > to date, or of a leader entering the data. If it is the latter, we need
> > better fencing to prevent multiple leaders trying to write to the DB at
> > the same time. If it is the former, we need better code so the follower
> > can read/update the replica it has without the possibility of trying to
> > auto-vivify a node that is being deleted. Generally in these cases we
> > would declare the race safe and then just start the sync process over
> > again.
> >
> > - Bobby
> >
> > On Monday, January 23, 2017, 2:12:18 AM CST, Jungtaek Lim <[email protected]> wrote:
> >
> > Hi devs,
> >
> > I've been struggling to resolve a specific scenario, and found that the
> > Local BlobStore cares about Nimbus failure scenarios, but not about
> > removing keys.
> >
> > For example, I heard that Nimbus crashed in a specific scenario, and the
> > error stack trace pointed to the code below:
> >
> > https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L138-L149
> >
> > checkExists (L138
> > <https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L138>)
> > succeeds but getChildren (L149
> > <https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L149>)
> > throws NoNodeException, and as a result sequenceNumbers.last() throws
> > NoSuchElementException.
> >
> > We could look into it deeply and make some workarounds, but given that ZK
> > is accessible from every Nimbus, we can't ensure every path is safe.
> >
> > I guess the BlobStore needs a global lock or a single controller to handle
> > all of this correctly. I'm also open to any workarounds or other ideas.
> >
> > What do you think?
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
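
Finally, a rough sketch of the "declare the race safe and start the sync process over again" idea Bobby mentions, applied to a NoNodeException like the one above: abandon the current sync pass and retry, since a follower only needs to become eventually consistent. The SyncPass interface and the retry bound are assumptions made for the example, not part of the Storm codebase.

import org.apache.zookeeper.KeeperException;

public class BlobSyncRetrySketch {

    // One pass over the keys a follower should mirror; may fail if ZK state shifts underneath it.
    interface SyncPass {
        void run() throws Exception;
    }

    // Run a sync pass; if ZooKeeper nodes disappear mid-pass (e.g. the leader is deleting a
    // killed topology's blob paths), drop the partial work and start over. Since the goal is
    // eventual consistency, restarting the pass is safe and the removed key simply won't be
    // seen on the next attempt.
    static void syncWithRestart(SyncPass pass, int maxAttempts) throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                pass.run();
                return;
            } catch (KeeperException.NoNodeException e) {
                if (attempt == maxAttempts) {
                    throw e; // give up after the configured number of restarts
                }
                // otherwise fall through and restart the pass
            }
        }
    }
}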
