Github user rfarivar commented on a diff in the pull request:
https://github.com/apache/storm/pull/945#discussion_r47546157
--- Diff: docs/documentation/distcache-blobstore.md ---
@@ -0,0 +1,732 @@
+# Storm Distributed Cache API
+
+The distributed cache feature in storm is used to efficiently distribute files
+(or blobs, which is the equivalent terminology for a file in the distributed
+cache and is used interchangeably in this document) that are large and can
+change during the lifetime of a topology, such as geo-location data,
+dictionaries, etc. Typical use cases include phrase recognition, entity
+extraction, document classification, URL re-writing, location/address detection
+and so forth. Such files may be several KB to several GB in size. For small
+datasets that don't need dynamic updates, including them in the topology jar
+could be fine. But for large files, the startup times could become very large.
+In these cases, the distributed cache feature can provide fast topology startup,
+especially if the files were previously downloaded for the same submitter and
+are still in the cache. This is useful with frequent deployments, sometimes few
+times a day with updated jars, because the large cached files will remain available
+without changes. The large cached blobs that do not change frequently will
+remain available in the distributed cache.
+
+At the starting time of a topology, the user specifies the set of files the
+topology needs. Once a topology is running, the user at any time can request for
+any file in the distributed cache to be updated with a newer version. The
+updating of blobs happens in an eventual consistency model. If the topology
+needs to know what version of a file it has access to, it is the responsibility
+of the user to find this information out. The files are stored in a cache with
+Least-Recently Used (LRU) eviction policy, where the supervisor decides which
+cached files are no longer needed and can delete them to free disk space. The
+blobs can be compressed, and the user can request the blobs to be uncompressed
+before it accesses them.
+
+## Motivation for Distributed Cache
+* Allows sharing blobs among topologies.
+* Allows updating the blobs from the command line.
+
+## Distributed Cache Implementations
+The current BlobStore interface has the following two implementations
+* LocalFsBlobStore
+* HdfsBlobStore
+
+Appendix A contains the interface for blob store implementation.
+
+## LocalFsBlobStore
+
+
+Local file system implementation of Blobstore can be depicted in the above timeline diagram.
--- End diff --
Throughout this document you use both "blob store" and "blobstore". Please
pick one spelling and use it consistently everywhere.