[ 
https://issues.apache.org/jira/browse/STORM-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056170#comment-15056170
 ] 

ASF GitHub Bot commented on STORM-1372:
---------------------------------------

Github user unsleepy22 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/945#discussion_r47514765
  
    --- Diff: docs/documentation/distcache-blobstore.md ---
    @@ -0,0 +1,733 @@
    +# Storm Distributed Cache API
    +
    +The distributed cache feature in Storm is used to efficiently distribute files
    +(or blobs, which is the equivalent terminology for a file in the distributed
    +cache and is used interchangeably in this document) that are large and can
    +change during the lifetime of a topology, such as geo-location data,
    +dictionaries, etc. Typical use cases include phrase recognition, entity
    +extraction, document classification, URL re-writing, location/address detection
    +and so forth. Such files may be several KB to several GB in size. For small
    +datasets that don't need dynamic updates, including them in the topology jar
    +could be fine. But for large files, the startup times could become very long.
    +In these cases, the distributed cache feature can provide fast topology startup,
    +especially if the files were previously downloaded for the same submitter and
    +are still in the cache. This is useful with frequent deployments, sometimes a few
    +times a day with updated jars, because the large cached blobs that do not change
    +frequently remain available in the distributed cache without being re-downloaded.
    +
    +At the start time of a topology, the user specifies the set of files the
    +topology needs. Once a topology is running, the user can at any time request
    +that any file in the distributed cache be updated with a newer version. The
    +updating of blobs follows an eventual consistency model. If the topology
    +needs to know which version of a file it has access to, it is the responsibility
    +of the user to find this information out. The files are stored in a cache with a
    +Least-Recently-Used (LRU) eviction policy, where the supervisor decides which
    +cached files are no longer needed and can delete them to free disk space. The
    +blobs can be compressed, and the user can request that the blobs be uncompressed
    +before the topology accesses them.
    +
    +## Motivation for Distributed Cache
    +* Allows sharing blobs among topologies.
    +* Allows updating the blobs from the command line.
    +
    +## Distributed Cache Implementations
    +The current BlobStore interface has the following two implementations:
    +* LocalFsBlobStore
    +* HdfsBlobStore
    +
    +Appendix A contains the interface for blob store implementation.
    +
    +## LocalFsBlobStore
    +![LocalFsBlobStore](images/local_blobstore.png)
    +
    +The local file system implementation of the blob store is depicted in the timeline diagram above.
    +
    +There are several stages from blob creation to blob download and the corresponding execution of a topology.
    +The main stages are as follows.
    +
    +### Blob Creation Command
    +Blobs in the blobstore can be created through the command line using the following command:
    +
    +```
    +storm blobstore create --file README.txt --acl o::rwa --repl-fctr 4 key1
    +```
    +
    +The above command creates a blob with the key name “key1” corresponding to the file README.txt.
    +The access given to all users ("other") is read, write and admin, with a replication factor of 4.
    +
    +### Topology Submission and Blob Mapping
    +Users can submit their topology with the following command. The command includes the
    +topology blobstore map configuration. The configuration holds two keys, “key1” and “key2”:
    +the key “key1” is mapped to the local file name “blob_file” and is not to be uncompressed.
    +
    +```
    +storm jar 
/home/y/lib/storm-starter/current/storm-starter-jar-with-dependencies.jar 
    +storm.starter.clj.word_count test_topo -c 
topology.blobstore.map='{"key1":{"localname":"blob_file", 
"uncompress":"false"},"key2":{}}'
    +```
    +
    +### Blob Creation Process
    +The creation of the blob takes place through the interface “ClientBlobStore”. Appendix B contains the “ClientBlobStore” interface.
    +The concrete implementation of this interface is the “NimbusBlobStore”. In the case of the local file system, the client makes a
    +call to the nimbus to create the blobs within the local file system. The nimbus uses the local file system implementation to create these blobs.
    +When a user submits a topology, the jar, configuration and code files are uploaded as blobs with the help of the blob store.
    +Also, all the other blobs specified by the topology are mapped to it with the help of the topology.blobstore.map configuration.
    +
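    +The following is a minimal sketch, not taken verbatim from the Storm codebase, of how a client
    +might create the “key1” blob programmatically through the “ClientBlobStore” interface. The class
    +and method names (`Utils.getClientBlobStore`, `BlobStoreAclHandler.parseAccessControl`,
    +`SettableBlobMeta`, `AtomicOutputStream`) are assumed from the backtype.storm packages of this release:
    +
    +```java
    +// Read the cluster configuration and obtain a client-side blob store handle.
    +Config conf = new Config();
    +conf.putAll(Utils.readStormConfig());
    +ClientBlobStore clientBlobStore = Utils.getClientBlobStore(conf);
    +
    +// World read/write/admin access, matching the "--acl o::rwa" CLI example above.
    +List<AccessControl> acls = new LinkedList<AccessControl>();
    +acls.add(BlobStoreAclHandler.parseAccessControl("o::rwa"));
    +SettableBlobMeta meta = new SettableBlobMeta(acls);
    +meta.set_replication_factor(4);
    +
    +// Create the blob under "key1" and write the contents of README.txt into it.
    +AtomicOutputStream blobStream = clientBlobStore.createBlob("key1", meta);
    +blobStream.write(Files.readAllBytes(Paths.get("README.txt")));
    +blobStream.close();
    +```
    +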
    +### Blob Download by the Supervisor
    +Finally, the blobs corresponding to a topology are downloaded by the supervisor once it receives the assignments from the nimbus, through
    +the same “NimbusBlobStore” thrift client that uploaded the blobs. The supervisor downloads the code, jar and conf blobs by calling the
    +“NimbusBlobStore” client directly, while the blobs specified in topology.blobstore.map are downloaded and mapped locally with the help
    +of the Localizer. The Localizer talks to the “NimbusBlobStore” thrift client to download the blobs and adds the blob decompression and local
    +blob name mapping logic required by the topology. Once all the blobs have been downloaded, the workers are launched to run
    +the topologies.
    +
    +## HdfsBlobStore
    +![HdfsBlobStore](images/hdfs_blobstore.png)
    +
    +The HdfsBlobStore has a similar implementation and blob creation and download procedure, the difference being how replication
    +is handled in the two blob store implementations. Replication in the HDFS blob store is straightforward, as HDFS itself handles replication,
    +and it requires no state to be stored inside zookeeper. On the other hand, the local file system blobstore requires state to be
    +stored in zookeeper in order for it to work with nimbus HA. Nimbus HA allows the local filesystem to implement the replication feature
    +seamlessly by storing state in zookeeper about the running topologies and syncing the blobs on the various nimbodes. On the supervisor’s
    +end, the supervisor and localizer talk to the HdfsBlobStore through the “HdfsClientBlobStore” implementation.
    +
    +## Additional Features and Documentation
    +```
    +storm jar 
/home/y/lib/storm-starter/current/storm-starter-jar-with-dependencies.jar 
storm.starter.clj.word_count test_topo 
    +-c topology.blobstore.map='{"key1":{"localname":"blob_file", 
"uncompress":"false"},"key2":{}}'
    +```
    + 
    +### Compression
    +The blob store allows the user to set the “uncompress” configuration to true or false. This configuration can be specified
    +in the topology.blobstore.map mentioned in the above command. This allows the user to upload a compressed file like a tarball/zip.
    +In the local file system blob store, the compressed blobs are stored on the nimbus node. The localizer code takes the responsibility to
    +uncompress the blob and store it on the supervisor node. Symbolic links to the blobs on the supervisor node are created within the worker
    +before the execution starts.
    +
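    +As an illustration only (not part of the original design), a worker can then read the cached
    +blob through the local name it was mapped to. The bolt below and the relative-path access are
    +assumptions; "blob_file" follows the examples elsewhere in this document:
    +
    +```java
    +// Inside a bolt (e.g. one extending BaseRichBolt): by the time prepare() runs, the
    +// supervisor has already downloaded, optionally uncompressed, and symlinked the blob
    +// into the worker's working directory under its local name.
    +@Override
    +public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    +    File cached = new File("blob_file");   // local name from topology.blobstore.map
    +    // ... load dictionaries, geo data, etc. from `cached` ...
    +}
    +```
    +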
    +### Local File Name Mapping
    +Apart from compression, the blob store allows the blob to be given a local name that can be used by the workers. The localizer takes
    +the responsibility of mapping the blob to that local name on the supervisor node.
    +
    +## Additional Blob Store Implementation Details
    +The blob store uses a hashing function to place the blobs based on the key. The blobs are stored inside the directory specified by
    +the blobstore.dir configuration. By default, they are stored under “storm.local.dir/nimbus/blobs” for the local file system and under a
    +similar path on the HDFS file system.
    +
    +Once a file is submitted, the blob store reads the configs and creates metadata for the blob with all the access control details. The metadata
    +is used for authorization while accessing the blobs. The blob key and version contribute to the hash code and thereby to the directory
    +under “storm.local.dir/nimbus/blobs/data” where the data is placed. The blobs are placed in a directory named with a positive number, such as 193822.
    +
    +Once the topology is launched and the relevant blobs have been created, the supervisor first downloads the blobs related to storm.conf, storm.ser
    +and storm.code, and then separately downloads all the blobs uploaded through the command line, using the localizer to uncompress them and map
    +them to the local names specified in the topology.blobstore.map configuration. The supervisor periodically updates blobs by checking for a
    +change of version. This allows the blobs to be updated on the fly, which makes it a very useful feature.
    +
    +For a local file system, the distributed cache on the supervisor node is set to 10240 MB as a soft limit, and the cleanup code attempts
    +to clean anything over the soft limit every 600 seconds based on an LRU policy.
    +
    +The HDFS blob store implementation handles load better by removing the burden on the nimbus to store the blobs, which avoids it becoming a bottleneck.
    +Moreover, it provides seamless replication of blobs. On the other hand, the local file system blob store is not very efficient in
    +replicating the blobs and is limited by the number of nimbus hosts. Furthermore, the supervisor talks to the HDFS blob store directly, without the
    +involvement of the nimbus, thereby reducing the load on and dependency on the nimbus.
    +
    +## Highly Available Nimbus
    +### Problem Statement:
    +Currently the Storm master, a.k.a. nimbus, is a process that runs on a single machine under supervision. In most cases a
    +nimbus failure is transient and it is restarted by the supervisor. However, sometimes when disks fail and network
    +partitions occur, nimbus goes down. Under these circumstances the topologies run normally, but no new topologies can be
    +submitted, no existing topologies can be killed/deactivated/activated, and if a supervisor node fails then the
    +reassignments are not performed, resulting in performance degradation or topology failures. With this project we intend
    +to resolve this problem by running nimbus in a primary-backup mode to guarantee that even if a nimbus server fails, one
    +of the backups will take over.
    +
    +### Requirements for Highly Available Nimbus:
    +* Increase overall availability of nimbus.
    +* Allow nimbus hosts to leave and join the cluster at will at any time. A newly joined host should automatically
    +catch up and join the list of potential leaders.
    +* No topology resubmissions required in case of nimbus fail overs.
    +* No active topology should ever be lost. 
    +
    +#### Leader Election:
    +The nimbus server will use the following interface:
    +
    +```java
    +public interface ILeaderElector {
    +    /**
    +     * Queue up for the leadership lock. The call returns immediately and the caller
    +     * must check isLeader() to perform any leadership action.
    +     */
    +    void addToLeaderLockQueue();
    +
    +    /**
    +     * Removes the caller from the leader lock queue. If the caller is the leader,
    +     * it also releases the lock.
    +     */
    +    void removeFromLeaderLockQueue();
    +
    +    /**
    +     *
    +     * @return true if the caller currently has the leader lock.
    +     */
    +    boolean isLeader();
    +
    +    /**
    +     *
    +     * @return the current leader's address; throws an exception if no one holds the lock.
    +     */
    +    InetSocketAddress getLeaderAddress();
    +
    +    /**
    +     * 
    +     * @return list of current nimbus addresses, includes leader.
    +     */
    +    List<InetSocketAddress> getAllNimbusAddresses();
    +}
    +```
    +Once a nimbus comes up, it calls the addToLeaderLockQueue() function. The leader election code selects a leader from the queue.
    +If the topology code, jar or config blobs are missing, the newly elected leader downloads the blobs from any other nimbus that has them.
    +
    +The first implementation will be Zookeeper based. If the zookeeper connection is lost or reset, resulting in loss of the lock
    +or the spot in the queue, the implementation will take care of updating the state such that isLeader() reflects the
    +current status. The leader-like actions must finish in less than minimumOf(connectionTimeout, sessionTimeout) to ensure
    +the lock was held by nimbus for the entire duration of the action. (It is not yet decided whether we want to just state this expectation
    +and ensure that the zk configurations are set high enough, which results in higher failover time, or whether we actually want to
    +create some sort of rollback mechanism for all actions; the second option needs a lot of code.) If a nimbus that is not the
    +leader receives a request that only a leader can perform, it will throw a RuntimeException.
    +
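    +As an illustrative sketch only (the host-side wiring and names here are assumed, not part of the
    +design), a nimbus host would interact with the elector roughly as follows:
    +
    +```java
    +// Sketch: how a nimbus host might use the elector. Constructing the elector itself
    +// is implementation-specific (e.g. the zookeeper-based implementation described above).
    +void participate(ILeaderElector elector) {
    +    elector.addToLeaderLockQueue();      // returns immediately; we are now queued for the lock
    +    if (elector.isLeader()) {
    +        // perform leader-only work, e.g. creating assignments or activating topologies
    +    } else {
    +        // redirect leader-only requests to the current leader
    +        InetSocketAddress leader = elector.getLeaderAddress();
    +        System.out.println("Forwarding leader-only requests to " + leader);
    +    }
    +}
    +```
    +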
    +### Nimbus state store:
    +
    +To achieve failover from primary to backup servers, nimbus state/data needs to be replicated across all nimbus hosts or
    +needs to be stored in a distributed storage. Replicating the data correctly involves state management and consistency checks,
    +and it is hard to test for correctness. However, many Storm users do not want to take an extra dependency on another replicated
    +storage system like HDFS and still need high availability. The blob store implementation, along with the state storage, helps
    +to overcome the failover scenarios in case a leader nimbus goes down.
    +
    +To support replication we will allow the user to define a code replication factor, which reflects the number of nimbus
    +hosts to which the code must be replicated before starting the topology. With replication comes the issue of consistency.
    +The topology is launched once the code, jar and conf blob files are replicated based on the "topology.min.replication.count" config.
    +Maintaining state for failover scenarios is important for the local file system blob store. The current implementation makes sure one of the
    +available nimbodes is elected as leader in the case of a failure. If the topology-specific blobs are missing, the leader nimbus
    +tries to download them as and when they are needed. With this current architecture, we do not have to download all the blobs
    +required for a topology before a nimbus can accept leadership. This helps in case the blobs are very large and avoids causing any
    +inadvertent delays in electing a leader.
    +
    +The state for every blob is relevant for the local blob store implementation. For the HDFS blob store, the replication
    +is taken care of by HDFS. For handling the failover scenarios for a local blob store, we need to store the state of the leader and
    +non-leader nimbodes within zookeeper.
    +
    +The state is stored under /storm/blobstore/key/nimbusHostPort:SequenceNumber for the blob store to work and to make nimbus highly available.
    +This state is used in the local file system blobstore to support replication. The HDFS blobstore does not have to store the state inside
    +zookeeper.
    +
    +* NimbusHostPort: This piece of information generally contains the parsed 
string holding the hostname and port of the nimbus. 
    +  It uses the same class “NimbusHostPortInfo” used earlier by the 
code-distributor interface to store the state and parse the data.
    +
    +* SequenceNumber: This is the blob sequence number information. The SequenceNumber information is implemented by a KeySequenceNumber class.
    +The sequence numbers are generated for every key. For every update, the sequence numbers are assigned based on a global sequence number
    +stored under /storm/blobstoremaxsequencenumber/key. For more details about how the numbers are generated, you can look at the javadocs for
    +KeySequenceNumber.
    +
    +![Nimbus High Availability - BlobStore](images/nimbus_ha_blobstore.png)
    +
    +The sequence diagram above shows how the blob store works and how the state storage inside zookeeper makes nimbus highly available.
    +Currently, the thread that syncs the blobs on a non-leader is within the nimbus. In the future, it would be nice to move this thread
    +into the blob store, so that the blob store coordinates the state change and blob download as per the sequence diagram.
    +
    +## Thrift and Rest API 
    +In order to avoid workers/supervisors/ui talking to zookeeper to get the master nimbus address, we are going to modify the
    +`getClusterInfo` API so it can also return nimbus information. `getClusterInfo` currently returns a `ClusterSummary` instance
    +which has a list of `SupervisorSummary` and a list of `TopologySummary` instances. We will add a list of `NimbusSummary`
    +to the `ClusterSummary`. See the structures below:
    +
    +```thrift
    +struct ClusterSummary {
    +  1: required list<SupervisorSummary> supervisors;
    +  3: required list<TopologySummary> topologies;
    +  4: required list<NimbusSummary> nimbuses;
    +}
    +
    +struct NimbusSummary {
    +  1: required string host;
    +  2: required i32 port;
    +  3: required i32 uptime_secs;
    +  4: required bool isLeader;
    +  5: required string version;
    +}
    +```
    +
    +This will be used by StormSubmitter, Nimbus clients, supervisors and the ui to discover the current leader and participating
    +nimbus hosts. Any nimbus host will be able to respond to these requests. The nimbus hosts can read this information once
    +from zookeeper, cache it, and keep updating the cache when the watchers are fired to indicate any changes, which should
    +be rare in the general case.
    +
    +Note: All nimbus hosts have watchers on zookeeper so they are notified immediately as soon as a new blob is available for download; the callback may or may not download
    +the code. Therefore, a background thread is triggered to download the respective blobs to run the topologies. The replication is achieved when the blobs are downloaded
    +onto the non-leader nimbodes. So you should expect your topology submission time to be somewhere between 0 and (2 * nimbus.code.sync.freq.secs) for any
    +topology.min.replication.count > 1.
    +
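    +As a rough, hypothetical sketch of how a client could discover the leader through this API
    +(assuming `conf` holds the storm configuration map, e.g. from `Utils.readStormConfig()`, and
    +assuming thrift-generated accessor names such as `get_nimbuses`, `is_isLeader`, `get_host` and
    +`get_port`, which are not prescribed by this document):
    +
    +```java
    +// Ask any nimbus host for the cluster summary and pick out the current leader.
    +Nimbus.Client client = NimbusClient.getConfiguredClient(conf).getClient();
    +ClusterSummary summary = client.getClusterInfo();
    +for (NimbusSummary nimbus : summary.get_nimbuses()) {
    +    if (nimbus.is_isLeader()) {
    +        System.out.println("Leader nimbus: " + nimbus.get_host() + ":" + nimbus.get_port());
    +    }
    +}
    +```
    +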
    +## Configuration
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +blobstore.dir: The directory where all blobs are stored. For local file 
system it represents the directory on the nimbus 
    +node and for HDFS file system it represents the hdfs file system path.
    +
    +supervisor.blobstore.class: This configuration sets the client the supervisor uses in order to talk to the blob store.
    +For a local file system blob store it is set to “backtype.storm.blobstore.NimbusBlobStore” and for the HDFS blob store it is set
    +to “backtype.storm.blobstore.HdfsClientBlobStore”.
    +
    +supervisor.blobstore.download.thread.count: This configuration spawns multiple threads on the supervisor in order to download
    +blobs concurrently. The default is set to 5.
    +
    +supervisor.blobstore.download.max_retries: This configuration controls how many times the supervisor retries a blob download.
    +By default it is set to 3.
    +
    +supervisor.localizer.cache.target.size.mb: The distributed cache target size in MB. This is a soft limit to the size
    +of the distributed cache contents. It is set to 10240 MB.
    +
    +supervisor.localizer.cleanup.interval.ms: The distributed cache cleanup interval. Controls how often the supervisor scans to attempt to
    +clean up anything over the cache target size. By default it is set to 600000 milliseconds.
    +
    +nimbus.blobstore.class: Sets the blobstore implementation nimbus uses. It is set to "backtype.storm.blobstore.LocalFsBlobStore".
    +
    +nimbus.blobstore.expiration.secs: During operations with the blob store, 
via master, how long a connection is idle before nimbus 
    +considers it dead and drops the session and any associated connections. 
The default is set to 600.
    +
    +storm.blobstore.inputstream.buffer.size.bytes: The buffer size used for blob store uploads. It is set to 65536 bytes.
    +
    +client.blobstore.class: The blob store implementation the storm client 
uses. The current implementation uses the default 
    +config "backtype.storm.blobstore.NimbusBlobStore".
    +
    +blobstore.replication.factor: Sets the replication for each blob within the blob store. The “topology.min.replication.count”
    +ensures the minimum replication of the topology-specific blobs before launching the topology. You might want to set
    +“topology.min.replication.count” <= “blobstore.replication.factor”. The default is set to 3.
    +
    +topology.min.replication.count: Minimum number of nimbus hosts where the code must be replicated before the leader nimbus
    +can mark the topology as active and create assignments. The default is 1.
    +
    +topology.max.replication.wait.time.sec: Maximum wait time for the nimbus host replication to achieve topology.min.replication.count.
    +Once this time has elapsed, nimbus will go ahead and perform topology activation tasks even if the required topology.min.replication.count
    +is not achieved. The default is 60 seconds; a value of -1 indicates to wait forever.
    +
    +nimbus.code.sync.freq.secs: Frequency at which the background thread on nimbus syncs code for locally missing blobs. The default is 2 minutes.
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +
    +## Using the Distributed Cache API, Command Line Interface (CLI)
    +
    +### Creating blobs 
    +
    +To use the distributed cache feature, the user first has to "introduce" 
files
    +that need to be cached and bind them to key strings. To achieve this, the 
user
    +uses the "blobstore create" command of the storm executable, as follows:
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +storm blobstore create [-f|--file FILE] [-a|--acl ACL1,ACL2,...] 
[--repl-fctr NUMBER] [keyname]
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +The contents come from a FILE, if provided by the -f or --file option, otherwise
    +from STDIN.  
    +The ACLs, which can also be given as a comma-separated list of many ACLs, are of the
    +following format:
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +> [u|o]:[username]:[r-|w-|a-|_]
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +where:  
    +
    +* u = user  
    +* o = other  
    +* username = user for this particular ACL  
    +* r = read access  
    +* w = write access  
    +* a = admin access  
    +* _ = ignored  
    +
    +The replication factor can be set to a value greater than 1 using 
--repl-fctr.
    +
    +Note: The replication is currently configurable for an HDFS blobstore, but for a
    +local blobstore the replication always stays at 1. For an HDFS blobstore
    +the default replication is set to 3.
    +
    +###### Example:  
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +storm blobstore create --file README.txt --acl o::rwa --repl-fctr 4 key1
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +In the above example, the *README.txt* file is added to the distributed cache.
    +It can be accessed using the key string "*key1*" by any topology that needs it.
    +The file is set to have read/write/admin access for others (a.k.a. the world),
    +and the replication is set to 4.
    +
    +###### Example:  
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +storm blobstore create mytopo:data.tgz -f data.tgz -a 
u:alice:rwa,u:bob:rw,o::r  
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +The above example creates a mytopo:data.tgz key using the data stored in
    +data.tgz. User alice would have full access, bob would have read/write access,
    +and everyone else would have read access.
    +
    +### Making dist. cache files accessible to topologies
    +
    +Once a blob is created, we can use it in topologies. This is generally achieved
    +by including the key string among the configurations of a topology, with the
    +following format. A shortcut is to add the configuration item on the command
    +line when starting a topology by using the **-c** option:
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +-c topology.blobstore.map='{"[KEY]":{"localname":"[VALUE]", 
"uncompress":"[true|false]"}}'
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +Note: Please take care to preserve the quotes exactly as shown.
    +
    +The cache file would then be accessible to the topology as a local file with the
    +name [VALUE].  
    +The localname parameter is optional; if omitted, the local cached file will have
    +the same name as [KEY].  
    +The uncompress parameter is optional; if omitted, the local cached file will not
    +be uncompressed.  Note that the key string needs to have the appropriate
    +file-name-like format and extension, so it can be uncompressed correctly.
    +
    +###### Example:  
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +storm jar 
/home/y/lib/storm-starter/current/storm-starter-jar-with-dependencies.jar 
storm.starter.clj.word_count test_topo -c 
topology.blobstore.map='{"key1":{"localname":"blob_file", 
"uncompress":"false"},"key2":{}}'
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +Note: Please take care to preserve the quotes exactly as shown.
    +
    +In the above example, we start the *word_count* topology (stored in the
    +*storm-starter-jar-with-dependencies.jar* file), and ask it to have access
    +to the cached file stored with key string = *key1*. This file would then be
    +accessible to the topology as a local file called *blob_file*, and the
    +supervisor will not try to uncompress the file. Note that in our example, 
the
    +file's content originally came from *README.txt*. We also ask for the file
    +stored with the key string = *key2* to be accessible to the topology. Since
    +both the optional parameters are omitted, this file will get the local 
name =
    +*key2*, and will not be uncompressed.
    +
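    +For reference, here is a minimal sketch of setting the same mapping from Java when submitting a
    +topology. The map structure mirrors the JSON above; `builder` is a hypothetical TopologyBuilder
    +and is not part of this document:
    +
    +```java
    +Config conf = new Config();
    +
    +// Equivalent of '{"key1":{"localname":"blob_file","uncompress":"false"},"key2":{}}'
    +Map<String, Object> key1Settings = new HashMap<String, Object>();
    +key1Settings.put("localname", "blob_file");
    +key1Settings.put("uncompress", false);
    +
    +Map<String, Map<String, Object>> blobMap = new HashMap<String, Map<String, Object>>();
    +blobMap.put("key1", key1Settings);
    +blobMap.put("key2", new HashMap<String, Object>());
    +
    +conf.put("topology.blobstore.map", blobMap);
    +StormSubmitter.submitTopology("test_topo", conf, builder.createTopology());
    +```
    +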
    +### Updating a cached file
    +
    +It is possible for the cached files to be updated while topologies are 
running.
    +The update happens in an eventual consistency model, where the supervisors 
poll
    +Nimbus every 30 seconds, and update their local copies. In the current 
version,
    +it is the user's responsibility to check whether a new file is available.
    +
    +To update a cached file, use the following command. Contents come from a 
FILE or
    +STDIN. Write access is required to be able to update a cached file.
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +storm blobstore update [-f|--file NEW_FILE] [KEYSTRING]
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +###### Example:  
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +storm blobstore update -f updates.txt key1
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +In the above example, the topologies will be presented with the contents of the
    +file *updates.txt* instead of *README.txt* (from the previous example), even
    +though the topology still accesses it through a file called *blob_file*.
    +
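    +A programmatic equivalent, again only as a hedged sketch building on the earlier ClientBlobStore
    +snippet (the `clientBlobStore` handle and the file names are carried over from those examples):
    +
    +```java
    +// Overwrite the existing "key1" blob with the contents of updates.txt.
    +AtomicOutputStream updateStream = clientBlobStore.updateBlob("key1");
    +updateStream.write(Files.readAllBytes(Paths.get("updates.txt")));
    +updateStream.close();
    +```
    +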
    +### Removing a cached file
    +
    +To remove a file from the distributed cache, use the following command. 
Removing
    +a file requires write access.
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +storm blobstore delete [KEYSTRING]
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +### Listing Blobs currently in the distributed cache blob store
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +storm blobstore list [KEY...]
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +Lists the blobs currently in the blob store.
    +
    +### Reading the contents of a blob
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +storm blobstore cat [-f|--file FILE] KEY
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +Reads a blob and then writes it either to a file or to STDOUT. Reading a blob
    +requires read access.
    +
    +### Setting the access control for a blob
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +storm blobstore set-acl [-s ACL] KEY
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +The ACL is in the form [uo]:[username]:[r-][w-][a-] and can be given as a comma-separated
    +list (requires admin access).
    +
    +### Update the replication factor for a blob
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +storm blobstore replication --update --repl-fctr 5 key1
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +### Read the replication factor of a blob
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +storm blobstore replication --read key1
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +### Command line help
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +storm help blobstore
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +
    +## Using the Distributed Cache API from Java
    +
    +We start by getting a ClientBlobStore object by calling this function:
    +
    
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    +Config theconf = new Config();
    --- End diff --
    
    could you use ```  ``` to quote java code? same for all below.


> Update BlobStore Documentation - Follow up STORM-876
> ----------------------------------------------------
>
>                 Key: STORM-1372
>                 URL: https://issues.apache.org/jira/browse/STORM-1372
>             Project: Apache Storm
>          Issue Type: Story
>            Reporter: Sanket Reddy
>            Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
