Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/1574
@HeartSaVioR the local file system blobstore behaves exactly the same as
the nimbus HA code before it. It tries to replicate all of the blobs to all of the
nimbus nodes as quickly as possible, and it stores the metadata for each blob in
ZK, just like the original nimbus HA code. The big difference is that it can
store a general set of blobs, and they are all versioned, so you can upload a new
one. If you ask for a blob that is not currently local on that nimbus, it
will speed up the sync process and go grab it from another nimbus instance for
you.
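To make that read path concrete, here is a toy sketch of the behavior I am describing (illustrative names and a plain `Map` standing in for another nimbus, not the actual LocalFsBlobStore code): serve from local storage if we have the blob, otherwise pull it from a peer right away instead of waiting for the background sync.

```java
import java.util.*;

// Toy illustration of the read path described above (not Storm's actual code):
// serve the blob from local storage if present; otherwise pull it from a peer
// nimbus that the ZK metadata says has it, then serve it locally.
public class BlobReadSketch {
    static Map<String, byte[]> local = new HashMap<>();
    static Map<String, byte[]> peer = new HashMap<>();   // stands in for another nimbus

    static byte[] getBlob(String key) {
        byte[] blob = local.get(key);
        if (blob == null && peer.containsKey(key)) {
            // "Speed up the sync process": copy from the peer immediately
            // instead of waiting for the background replication thread.
            blob = peer.get(key);
            local.put(key, blob);
        }
        if (blob == null) throw new NoSuchElementException(key);
        return blob;
    }

    public static void main(String[] args) {
        peer.put("topo-1-jar", "bytes".getBytes());
        System.out.println(new String(getBlob("topo-1-jar"))); // fetched from peer
        System.out.println(local.containsKey("topo-1-jar"));   // now local too
    }
}
```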
There is still a race, like with nimbus HA, where a blob may not have fully
replicated yet, but if you set the replication count when the blob is uploaded,
the upload should wait for that replication to complete before declaring success.
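Roughly, the upload side then works like this (just an illustrative sketch with made-up names, not the real BlobStore API in this patch): upload, then poll the current replication count until the requested factor is met before returning success to the caller.

```java
import java.util.concurrent.TimeUnit;

// Toy sketch of "wait for replication before declaring success".
public class BlobUploadSketch {
    interface BlobStore {
        void upload(String key, byte[] data, int desiredReplication);
        int currentReplication(String key);
    }

    // Upload, then block until enough nimbus nodes report a copy of the blob,
    // so the caller only sees success once the requested replication is met.
    static void uploadAndWait(BlobStore store, String key, byte[] data,
                              int desiredReplication, long timeoutMs) throws InterruptedException {
        store.upload(key, data, desiredReplication);
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (store.currentReplication(key) < desiredReplication) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException("replication not met for " + key);
            }
            TimeUnit.MILLISECONDS.sleep(100);
        }
    }
}
```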
The one bug I saw while going through the code is that when we list keys,
we are doing it only from the local storage, not from ZK. If we listed them
from ZK, then when someone asks for all of the keys they would be guaranteed to
get all of the keys, but this patch would do no good, simply because there would
be no API to know what is local and what is not. In the short term
I think this is OK, but long term we need to discuss how we really want all of
this to work in the different failure cases.
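A tiny example of the gap I mean (hypothetical keys): the local directory only knows about blobs that have already been synced, while ZK metadata knows about every key.

```java
import java.util.*;

// Toy illustration of the listKeys issue described above.
public class ListKeysSketch {
    public static void main(String[] args) {
        Set<String> zkKeys    = new TreeSet<>(Arrays.asList("a.jar", "b.conf", "c.ser"));
        Set<String> localKeys = new TreeSet<>(Arrays.asList("a.jar"));  // not fully synced yet

        // Listing from local storage (current behavior): callers can miss keys.
        System.out.println("local listing: " + localKeys);

        // Listing from ZK would return everything, but then there is no way to
        // tell which of those keys are actually present on this nimbus yet.
        System.out.println("zk listing:    " + zkKeys);
    }
}
```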
When configured to use HDFS as the backing store, all of the "blobs" are stored
in HDFS, and we do a directory listing to get the key listing. With that, nimbus
truly becomes stateless, and you can stand up a nimbus on a different node and
not have to worry about it. This code would simply do the directory listing
and then become a no-op.
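With the Hadoop FileSystem API, that key listing is basically just this (the blob root path here is an assumption for illustration, not necessarily the layout the HDFS blobstore actually uses):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Rough sketch of "list keys is just a directory listing" with an HDFS backing.
public class HdfsListKeysSketch {
    static List<String> listKeys(Configuration conf, String blobRoot) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        List<String> keys = new ArrayList<>();
        for (FileStatus status : fs.listStatus(new Path(blobRoot))) {
            keys.add(status.getPath().getName());  // one entry per blob key
        }
        return keys;
    }
}
```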