Follow-up: for checking topology blobs (and corresponding dependencies) I filed STORM-2323 <https://issues.apache.org/jira/browse/STORM-2323> and submitted a pull request.
- Jungtaek Lim (HeartSaVioR)

On Wed, Jan 25, 2017 at 8:46 AM, Jungtaek Lim <[email protected]> wrote:

> (adding missed dev@)
>
> Thanks, Sanket, for answering the questions. Follow-up questions inline.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Wed, Jan 25, 2017 at 3:13 AM, Sanket Chintapalli <[email protected]> wrote:
>
> Questions related to blobstore
>
> 1. In get-version-for-key, a new KeySequenceNumber instance is created every
> time, so the synchronized keyword on KeySequenceNumber.getKeySequenceNumber()
> has no effect. But it looks like the method shouldn't run concurrently on the
> same key, and furthermore every Nimbus can call the method concurrently. Then
> we might need some kind of global lock for obtaining the version of a key. Is
> my understanding right?
>
> Firstly, the KeyVersionSequenceNumber is initialized for every topology.
> Yes, you are correct: there is no need for synchronized. I thought it was
> called from some other place in another thread. I don't think it is required.
>
> But get-version-for-key is called from setup-blobstore and
> NimbusClient.createStateInZookeeper, which means KeySequenceNumber can be
> called from multiple Nimbuses concurrently.
> The 'synchronized' keyword has no effect, so we can remove it, but what I'm
> really wondering is whether the logic still works safely without a global
> lock when multiple Nimbuses try to write their state to children of the
> blobstore key, since what the method reads to determine the version can
> change concurrently.
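
For illustration only, below is a minimal sketch of the kind of per-key "global lock" being discussed, using Curator's InterProcessMutex over ZooKeeper. The KeyVersionLockSketch class, the lock path, and the computeVersion placeholder are assumptions invented for the example; this is not the actual Storm code.

import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class KeyVersionLockSketch {
    private final CuratorFramework zk;

    public KeyVersionLockSketch(String zkConnect) {
        this.zk = CuratorFrameworkFactory.newClient(zkConnect, new ExponentialBackoffRetry(1000, 3));
        this.zk.start();
    }

    // Compute the next version for a blob key while holding a ZooKeeper-backed mutex,
    // so concurrent Nimbus instances cannot interleave their reads of the key's state.
    public int getVersionWithGlobalLock(String key) throws Exception {
        // Hypothetical lock path shared by all Nimbus instances.
        InterProcessMutex lock = new InterProcessMutex(zk, "/blobstore-locks/" + key);
        if (!lock.acquire(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("Could not acquire version lock for key " + key);
        }
        try {
            return computeVersion(key);
        } finally {
            lock.release();
        }
    }

    private int computeVersion(String key) {
        // Placeholder for the existing per-key version logic (KeySequenceNumber in Storm).
        return 1;
    }
}
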
> 2. We rely on the blob-sync function, which is skipped on the leader Nimbus;
> then how does the leader Nimbus synchronize its blobs?
>
> Here the leader is assumed to have all the blobs of active topologies.
> blob-sync is only for non-leader Nimbuses, as any change in the state on
> ZooKeeper will trigger them to sync the blobs. We do not want the leader to
> sync on itself whenever state is updated; it would make no sense.
>
> Our precondition for leadership only checks topology blobs, and it doesn't
> check the version of the blobs since it just compares keys.
> (We now have additional jar files as dependencies of a topology, but this is
> not reflected yet - it needs to be addressed before releasing 1.1.0.)
>
> So the leader Nimbus only guarantees that it can provide topology blobs;
> it still doesn't guarantee anything for other normal blobs, and moreover
> there is no guarantee that the leader Nimbus has the latest topology blobs.
>
> https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L87-L99
>
> In scenario 2, for example, N1 and N2 have different content for the blob
> because N1 crashed before N2 updated its blob. N2's blob was v2 from a
> content point of view, but was promoted to v4 without any content update;
> v3 becomes available later when N1 is restored, but is discarded.
>
> This gets more complicated when more Nimbuses come in. Suppose N3 has v3 for
> that blob, but N2 can still gain leadership, and as a result v3 will be
> discarded (since the leader Nimbus doesn't sync from others).
>
> I guess the logic might be safe if we force the replication factor to equal
> the number of Nimbuses, but then any follower Nimbus becomes a SPOF, and I'm
> not sure it would still be safe when a new Nimbus comes in.
>
> Please correct me if I'm missing something here.
>
> 3. What's the purpose of the setup-blobstore function? It seems slightly
> different from the blob-sync function (it doesn't update the blobs) and it
> runs on both the leader and follower Nimbus.
>
> setup-blobstore was placed there as an initialization function. As far as I
> remember there are ephemeral nodes on ZooKeeper which disappear after a
> Nimbus goes down; once a Nimbus comes back up, setup-blobstore does some
> initialization operations to sync all topology blobs not present in the
> local blobstore.
>
> Yes, right. The BlobStore assumes that the leader Nimbus shouldn't sync its
> blobs from a follower Nimbus, so in this case setup-blobstore is needed. I
> was just thinking about the case where the leader Nimbus needs to
> synchronize its blobs.
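
As a rough illustration of the reconciliation setup-blobstore is described as doing above (diff local blob keys against the active keys in ZooKeeper, delete stale local blobs, fetch missing ones), here is a sketch. The BlobStoreView interface and its method names are assumptions invented for the example, not Storm's actual API.

import java.util.HashSet;
import java.util.Set;

public class BlobReconcileSketch {

    // Minimal view of the operations the reconciliation needs; not Storm's API.
    interface BlobStoreView {
        Set<String> localKeys();
        Set<String> activeKeysInZookeeper();
        void deleteLocal(String key);
        void downloadFromRemote(String key);
    }

    // Bring the local blobstore in line with the active keys recorded in ZooKeeper:
    // delete local blobs whose keys are no longer active, and download blobs for
    // active keys that are missing locally.
    static void reconcile(BlobStoreView blobStore) {
        Set<String> local = new HashSet<>(blobStore.localKeys());
        Set<String> active = new HashSet<>(blobStore.activeKeysInZookeeper());

        for (String key : local) {
            if (!active.contains(key)) {
                blobStore.deleteLocal(key); // stale: key removed, e.g. topology was killed
            }
        }
        for (String key : active) {
            if (!local.contains(key)) {
                blobStore.downloadFromRemote(key); // missing: fetch from a Nimbus that has it
            }
        }
    }
}
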
> 4. In KeySequenceNumber.getKeySequenceNumber(), in some situations the
> version number goes to 0 (initial version - 1). Is that meant to mark the
> blob as invalid? Then can we use 0 for the version number whenever any
> exception is caught in that method? Swallowing all exceptions in that method
> seems intended, but we hit a crash in that case, and even if we throw the
> exception and pass it to the caller, rolling back is not easy.
>
> The idea was to say that if your version is 0 then you have crashed or just
> come up, and you might want to sync with the latest blobs on the leader
> Nimbus. Since setup-blobstore already diffs the local blobs against the
> active blobs on ZooKeeper, deletes blobs not on ZooKeeper, and creates the
> locally active keys when a Nimbus crashes and comes back up, I think we can
> mark and return the initial sequence number as 1 instead of 0. The 0 would
> trigger the sync again for an already downloaded/updated blob. I presume
> returning the (initial version) instead of (initial version - 1) might fix
> the issue.
>
> Let's get back to the original issue.
>
> There were three Nimbuses N1, N2, N3, and N1 was the leader. We submitted
> topology T1 and after submission we restarted N1. N2 became leader and we
> killed T1. While N1 was initializing and syncing up its topology blobs, N2
> concurrently removed the ZK path and also the max sequence number path for
> the topology blob as part of killing the topology.
>
> The problem was that N1 called KeySequenceNumber.getKeySequenceNumber()
> while registering the local blob; checkExists on /blobstore/key succeeded,
> but the following getChildren on /blobstore/key failed with NoNodeException.
> (/blobstoremaxsequencenumber/key might or might not exist, since N2 was
> removing the paths.)
> In this case, what is the correct return value for that blob?
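
To make the "which return value is correct" question concrete, here is a simplified, hypothetical sketch of a version lookup that falls back to the initial sequence number when the key's nodes disappear between checkExists and getChildren, along the lines of Sanket's suggestion to return (initial version) rather than (initial version - 1). The constants, paths, and child-name format are assumptions, not the actual KeySequenceNumber code.

import java.util.List;
import java.util.TreeSet;

import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.KeeperException;

public class KeySequenceSketch {
    // Assumed semantics from the discussion: 1 = initial version, 0 = "stale, re-sync needed".
    static final int INITIAL_VERSION = 1;

    // Derive the next sequence number for a blob key from its ZooKeeper children.
    // If the key's node vanishes between checkExists and getChildren (another Nimbus
    // is deleting the topology's blob paths), fall back to INITIAL_VERSION instead of
    // letting NoNodeException / NoSuchElementException escape.
    static int nextSequenceNumber(CuratorFramework zk, String key) throws Exception {
        String path = "/blobstore/" + key;
        try {
            if (zk.checkExists().forPath(path) == null) {
                return INITIAL_VERSION;
            }
            List<String> children = zk.getChildren().forPath(path);
            TreeSet<Integer> sequenceNumbers = new TreeSet<>();
            for (String child : children) {
                // Assumed child format "<nimbusInfo>-<sequenceNumber>": parse the trailing number.
                sequenceNumbers.add(Integer.parseInt(child.substring(child.lastIndexOf('-') + 1)));
            }
            return sequenceNumbers.isEmpty() ? INITIAL_VERSION : sequenceNumbers.last() + 1;
        } catch (KeeperException.NoNodeException e) {
            // Key deleted concurrently: treat it as gone and start from the initial version.
            return INITIAL_VERSION;
        }
    }
}
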
> Thanks,
> Sanket Chintapalli
>
> On Tuesday, January 24, 2017, 10:35:50 AM CST, Sanket Chintapalli <[email protected]> wrote:
>
> Edit: * but not 2
>
> On Tuesday, January 24, 2017, 10:35:22 AM CST, Sanket Chintapalli <[email protected]> wrote:
>
> Yes, Bobby, I will have a look and reply. I think I can explain 1, 3 and 4,
> but not 2. I did not understand 2.
>
> Thanks,
> Sanket
>
> On Tuesday, January 24, 2017, 9:42:35 AM CST, Bobby Evans <[email protected]> wrote:
>
> If you have time to answer some of his questions that would really be great.
>
> ----- Forwarded Message -----
> From: Jungtaek Lim <[email protected]>
> To: Bobby Evans <[email protected]>; Dev <[email protected]>
> Sent: Tuesday, January 24, 2017, 9:35:36 AM CST
> Subject: Re: Local BlobStore and Nimbus H/A
>
> I just skimmed the blobstore code and have some questions about the
> implementation. I'd really appreciate it if someone who understands the
> BlobStore could answer the questions below.
> (Questions are based on the current 1.x branch, but I think the master
> branch is similar.)
>
> 1. In get-version-for-key, a new KeySequenceNumber instance is created every
> time, so the synchronized keyword on KeySequenceNumber.getKeySequenceNumber()
> has no effect. But it looks like the method shouldn't run concurrently on the
> same key, and furthermore every Nimbus can call the method concurrently. Then
> we might need some kind of global lock for obtaining the version of a key. Is
> my understanding right?
>
> 2. We rely on the blob-sync function, which is skipped on the leader Nimbus;
> then how does the leader Nimbus synchronize its blobs?
>
> 3. What's the purpose of the setup-blobstore function? It seems slightly
> different from the blob-sync function (it doesn't update the blobs) and it
> runs on both the leader and follower Nimbus.
>
> 4. In KeySequenceNumber.getKeySequenceNumber(), in some situations the
> version number goes to 0 (initial version - 1). Is that meant to mark the
> blob as invalid? Then can we use 0 for the version number whenever any
> exception is caught in that method? Swallowing all exceptions in that method
> seems intended, but we hit a crash in that case, and even if we throw the
> exception and pass it to the caller, rolling back is not easy.
>
> Thanks in advance,
> Jungtaek Lim (HeartSaVioR)
>
> On Tue, Jan 24, 2017 at 12:25 PM, Jungtaek Lim <[email protected]> wrote:
>
> > More details: there were three Nimbuses N1, N2, N3, and N1 was the leader.
> > We submitted topology T1 and after submission we restarted N1. N2 became
> > leader and we killed T1. While N1 was initializing and syncing up its
> > topology blobs, N2 concurrently removed the ZK path and also the max
> > sequence number path for the topology blob as part of killing the
> > topology. This race condition only occurs with the Local BlobStore, since
> > removing the ZK path is done only if Nimbus is using the Local BlobStore.
> >
> > So it's the former case, and stopping the current sync phase and
> > restarting the sync is an ideal way to handle it, since we only guarantee
> > eventual consistency. I'll take a look at the codebase to see how we can
> > apply this, but it would be a great help if someone familiar with the
> > BlobStore codebase were willing to handle it.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > On Mon, Jan 23, 2017 at 11:33 PM, Bobby Evans <[email protected]> wrote:
> >
> > HA for the blobstore was set up so that ZK would hold the source of truth
> > and the other Nimbus nodes would be eventually consistent with each
> > other. I'm not totally sure of the issue, because I don't understand
> > whether this is happening in the context of a follower trying to keep up
> > to date, or of a leader entering the data. If it is the latter, we need
> > better fencing to prevent multiple leaders trying to write to the DB at
> > the same time. If it is the former, we need better code so the follower
> > can read/update the replica it has without the possibility of trying to
> > auto-vivify a node that is being deleted. Generally in these cases we
> > would declare the race safe and then just start the sync process over
> > again.
> >
> > - Bobby
> >
> > On Monday, January 23, 2017, 2:12:18 AM CST, Jungtaek Lim <[email protected]> wrote:
> >
> > Hi devs,
> >
> > I've been struggling to resolve a specific scenario, and found that the
> > Local BlobStore cares about Nimbus failure scenarios, but not about
> > removing keys.
> >
> > For example, I heard that Nimbus crashed in a specific scenario, and the
> > error stack trace pointed to the code below:
> >
> > https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L138-L149
> >
> > checkExists (L138
> > <https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L138>)
> > succeeds but getChildren (L149
> > <https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L149>)
> > throws NoNodeException, and as a result sequenceNumbers.last() throws
> > NoSuchElementException.
> >
> > We could look into it deeply and make some workarounds, but given that ZK
> > is accessible from every Nimbus, we can't ensure every path is safe.
> >
> > I guess the BlobStore needs a global lock or a single controller to handle
> > all of this correctly. I'm also open to any workarounds or other ideas.
> >
> > What do you think?
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
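
Finally, a rough sketch of the "declare the race safe and start the sync process over again" idea Bobby mentions, applied to a NoNodeException like the one above: abandon the current sync pass and retry, since a follower only needs to become eventually consistent. The SyncPass interface and the retry bound are assumptions made for the example, not part of the Storm codebase.

import org.apache.zookeeper.KeeperException;

public class BlobSyncRetrySketch {

    // One pass over the keys a follower should mirror; may fail if ZK state shifts underneath it.
    interface SyncPass {
        void run() throws Exception;
    }

    // Run a sync pass; if ZooKeeper nodes disappear mid-pass (e.g. the leader is deleting a
    // killed topology's blob paths), drop the partial work and start over. Since the goal is
    // eventual consistency, restarting the pass is safe and the removed key simply won't be
    // seen on the next attempt.
    static void syncWithRestart(SyncPass pass, int maxAttempts) throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                pass.run();
                return;
            } catch (KeeperException.NoNodeException e) {
                if (attempt == maxAttempts) {
                    throw e; // give up after the configured number of restarts
                }
                // otherwise fall through and restart the pass
            }
        }
    }
}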
