Github user keith-turner commented on a diff in the pull request:

    
https://github.com/apache/incubator-fluo-recipes/pull/130#discussion_r114650888
  
    --- Diff: docs/combine-queue.md ---
    @@ -0,0 +1,224 @@
    +<!--
    +Licensed to the Apache Software Foundation (ASF) under one or more
    +contributor license agreements.  See the NOTICE file distributed with
    +this work for additional information regarding copyright ownership.
    +The ASF licenses this file to You under the Apache License, Version 2.0
    +(the "License"); you may not use this file except in compliance with
    +the License.  You may obtain a copy of the License at
    +
    +    http://www.apache.org/licenses/LICENSE-2.0
    +
    +Unless required by applicable law or agreed to in writing, software
    +distributed under the License is distributed on an "AS IS" BASIS,
    +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +See the License for the specific language governing permissions and
    +limitations under the License.
    +-->
    +# Combine Queue Recipe
    +
    +## Background
    +
    +When many transactions try to modify the same keys, collisions will occur. 
 Too many collisions
    +cause transactions to fail and throughput to nose dive.  For example, 
consider [phrasecount]
    +which has many transactions processing documents.  Each transaction counts 
the phrases in a document
    +and then updates global phrase counts.  Since transaction attempts to 
update many phrases
    +, the probability of collisions is high.
    +
    +## Solution
    +
    +The [combine queue recipe][CombineQueue] provides a reusable solution for 
updating many keys while
    +avoiding collisions.  The recipe also organizes updates into batches in 
order to improve throughput.
    +
    +This recipes queues updates to keys for other transactions to process. In 
the phrase count example
    +transactions processing documents queue updates, but do not actually 
update the counts.  Below is an
    +example of computing phrasecounts using this recipe.
    +
    + * TX1 queues `+1` update  for phrase `we want lambdas now`
    + * TX2 queues `+1` update  for phrase `we want lambdas now`
    + * TX3 reads the updates and current value for the phrase `we want lambdas 
now`.  There is no current value and the updates sum to 2, so a new value of 2 
is written.
    + * TX4 queues `+2` update  for phrase `we want lambdas now`
    + * TX5 queues `-1` update  for phrase `we want lambdas now`
    + * TX6 reads the updates and current value for the phrase `we want lambdas 
now`.  The current value is 2 and the updates sum to 1, so a new value of 3 is 
written.
    +
    +Transactions processing updates have the ability to make additional 
updates.
    +For example in addition to updating the current value for a phrase, the new
    +value could also be placed on an export queue to update an external 
database.
    +
    +### Buckets
    +
    +A simple implementation of this recipe would have an update queue for each 
key.  However the
    +implementation is slightly more complex.  Each update queue is in a bucket 
and transactions process
    +all of the updates in a bucket.  This allows more efficient processing of 
updates for the following
    +reasons :
    +
    + * When updates are queued, notifications are made per bucket(instead of 
per a key).
    + * The transaction doing the update can scan the entire bucket reading 
updates, this avoids a seek for each key being updated.
    + * Also the transaction can request a batch lookup to get the current 
value of all the keys being updated.
    + * Any additional actions taken on update (like adding something to an 
export queue) can also be batched.
    + * Data is organized to make reading exiting values for keys in a bucket 
more efficient.
    +
    +Which bucket a key goes to is decided using hash and modulus so that 
multiple updates for a key go
    +to the same bucket.
    +
    +The initial number of tablets to create when applying table optimizations 
can be controlled by
    +setting the buckets per tablet option when configuring a Combine Queue.  
For example if you
    +have 20 tablet servers and 1000 buckets and want 2 tablets per tserver 
initially then set buckets
    +per tablet to 1000/(2*20)=25.
    +
    +## Example Use
    +
    +The following code snippets show how to use this recipe for wordcount.  
The first step is to
    +configure it before initializing Fluo.  When initializing an ID is needed. 
 This ID is used in two
    +ways.  First, the ID is used as a row prefix in the table.  Therefore 
nothing else should use that
    +row range in the table.  Second, the ID is used in generating 
configuration keys.
    +
    +The following snippet shows how to configure a combine queue.
    +
    +```java
    +    FluoConfiguration fluoConfig = ...;
    +
    +    // Set application properties for the combine queue.  These properties 
are read later by
    +    // the observers running on each worker.
    +    CombineQueue.configure(WcObserverProvider.ID)
    +        
.keyType(String.class).valueType(Long.class).buckets(119).finish(fluoConfig);
    --- End diff --
    
    I like `store()` also thinking of `set()`
    
    ```java
    CombineQueue.configure(WcObserverProvider.ID)
            
.keyType(String.class).valueType(Long.class).buckets(119).store(fluoConfig);
    ```
    
    ```java
    CombineQueue.configure(WcObserverProvider.ID)
            
.keyType(String.class).valueType(Long.class).buckets(119).set(fluoConfig);
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to