[
https://issues.apache.org/jira/browse/STORM-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055152#comment-15055152
]
ASF GitHub Bot commented on STORM-1372:
---------------------------------------
Github user zhuoliu commented on a diff in the pull request:
https://github.com/apache/storm/pull/945#discussion_r47450187
--- Diff: docs/documentation/distcache-blobstore.md ---
@@ -0,0 +1,734 @@
+# Storm Distributed Cache API
+
+The distributed cache feature in storm is used to efficiently distribute files
+(or blobs, which is the equivalent terminology for a file in the distributed
+cache and is used interchangeably in this document) that are large and can
+change during the lifetime of a topology, such as geo-location data,
+dictionaries, etc. Typical use cases include phrase recognition, entity
+extraction, document classification, URL re-writing, location/address detection
+and so forth. Such files may be several KB to several GB in size. For small
+datasets that don't need dynamic updates, including them in the topology jar
+could be fine. But for large files, the startup times could become very large.
+In these cases, the distributed cache feature can provide fast topology startup,
+especially if the files were previously downloaded for the same submitter and
+are still in the cache. This is useful with frequent deployments, sometime a few
+a day with updated jars, because the large cached files will remain available
--- End diff --
"sometime a few a day"?
> Update BlobStore Documentation - Follow up STORM-876
> ----------------------------------------------------
>
> Key: STORM-1372
> URL: https://issues.apache.org/jira/browse/STORM-1372
> Project: Apache Storm
> Issue Type: Story
> Reporter: Sanket Reddy
> Priority: Minor
>