The design looks reasonable, but "callback" is not an appropriate
term for async results returned from the Thrift broker, as they will
not be called back by anyone. The standard term for the result of an
async computation is "future" (as implemented in the
java.util.concurrent package). I'd suggest that you take a look at
the Java Future API design. IMO, Future cancellation needs to be
implemented for long scans.
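
To make the cancellation point concrete, here is a minimal C++ sketch of a
future-style handle, loosely modeled on the get/cancel/isCancelled shape of
java.util.concurrent.Future. Everything in it is hypothetical: ScanFuture and
start_async_scan are illustrative names, not Hypertable APIs, and the "scan"
is simulated by a sleep loop. std::future has no cancel(), so the sketch
pairs it with a shared flag that the worker polls.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical handle for an async scan (not a Hypertable class).
class ScanFuture {
public:
  ScanFuture(std::future<int> result, std::shared_ptr<std::atomic<bool>> flag)
    : m_result(std::move(result)), m_cancelled(std::move(flag)) {}

  // Ask the worker to stop at its next check; does not block.
  void cancel() { m_cancelled->store(true); }
  bool is_cancelled() const { return m_cancelled->load(); }

  // Block until the worker finishes; returns the number of blocks scanned.
  int get() { return m_result.get(); }

private:
  std::future<int> m_result;
  std::shared_ptr<std::atomic<bool>> m_cancelled;
};

// Simulated long scan over total_blocks blocks (stand-in for real I/O).
inline ScanFuture start_async_scan(int total_blocks) {
  auto flag = std::make_shared<std::atomic<bool>>(false);
  auto result = std::async(std::launch::async, [flag, total_blocks] {
    int scanned = 0;
    while (scanned < total_blocks && !flag->load()) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
      ++scanned;
    }
    return scanned;
  });
  return ScanFuture(std::move(result), flag);
}
```

Cancellation here is cooperative: cancel() only sets the flag, and the scan
stops the next time the worker checks it, so get() still has to be called to
learn how far the scan got.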

__Luke

On Tue, Jan 11, 2011 at 11:15 AM, Sanjit Jhala <[email protected]> wrote:
> Currently the Hypertable client exposes only a synchronous API. This means
> that if an application wants to read/write from multiple tables it has to
> issue the calls sequentially and block till each call completes. This is
> also true for application managed secondary index tables. Being able to
> issue asynchronous scans and updates should greatly reduce overall
> application latency for these cases.
> I'd like to propose the following design for such an asynchronous API which
> will cover both the C++ client as well as the Thrift interface.
> 1. C++ Client Library
> The C++ client library will provide an abstract callback interface.
> Applications will implement their own callbacks to deal with the results
> from async reads/writes.
> For scans (reads) the callback will get called by the Scanner whenever it
> receives a new ScanBlock from the RangeServers. For updates (writes) the
> callback will get called whenever an update operation completes.
> Also, auto flushing (when per-RangeServer mutator buffers fill up) will be
> disabled for asynchronous mutators.
> The interface will look like:
>
> class ResultCallbackInterface {
>   public:
>     // Scan callbacks: called on failure, or for each ScanBlock received.
>     virtual void scan_error(TableScannerPtr &scanner, int32 error,
>                             const String &error_msg) = 0;
>     virtual void scan_ok(TableScannerPtr &scanner, vector<Cells> &cells) = 0;
>     // Update callbacks: called when an update operation completes or fails.
>     virtual void update_ok(TableMutatorPtr &mutator, FailedMutations) = 0;
>     virtual void update_error(TableMutatorPtr &mutator, int32 error,
>                               String error_msg) = 0;
> };
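
For illustration, an application-side implementation of such an interface
might look like the sketch below. The types are placeholder stubs so the
example compiles on its own (TableScanner, TableMutator, Cell, Cells and
FailedMutations here are stand-ins, not the real Hypertable classes), void
return types are an assumption since the proposal does not state them, and
CountingCallback is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Placeholder stand-ins for the real Hypertable types (assumptions).
struct TableScanner {};
struct TableMutator {};
struct Cell { std::string row, value; };
typedef std::vector<Cell> Cells;
typedef std::vector<Cell> FailedMutations;
typedef std::shared_ptr<TableScanner> TableScannerPtr;
typedef std::shared_ptr<TableMutator> TableMutatorPtr;

class ResultCallbackInterface {
public:
  virtual ~ResultCallbackInterface() {}
  virtual void scan_error(TableScannerPtr &scanner, int32_t error,
                          const std::string &error_msg) = 0;
  virtual void scan_ok(TableScannerPtr &scanner, std::vector<Cells> &cells) = 0;
  virtual void update_ok(TableMutatorPtr &mutator, FailedMutations failed) = 0;
  virtual void update_error(TableMutatorPtr &mutator, int32_t error,
                            std::string error_msg) = 0;
};

// Hypothetical application callback: tallies cells, failures and errors.
// A mutex guards the counters because callbacks may arrive on other threads.
class CountingCallback : public ResultCallbackInterface {
public:
  void scan_error(TableScannerPtr &, int32_t, const std::string &msg) override {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_errors.push_back("scan: " + msg);
  }
  void scan_ok(TableScannerPtr &, std::vector<Cells> &cells) override {
    std::lock_guard<std::mutex> lock(m_mutex);
    for (const Cells &block : cells) m_cells_seen += block.size();
  }
  void update_ok(TableMutatorPtr &, FailedMutations failed) override {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_failed_mutations += failed.size();
  }
  void update_error(TableMutatorPtr &, int32_t, std::string msg) override {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_errors.push_back("update: " + msg);
  }

  std::size_t cells_seen() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_cells_seen;
  }
  std::size_t failed_mutations() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_failed_mutations;
  }
  std::size_t error_count() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_errors.size();
  }

private:
  mutable std::mutex m_mutex;
  std::size_t m_cells_seen = 0;
  std::size_t m_failed_mutations = 0;
  std::vector<std::string> m_errors;
};
```
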
> 2. Thrift interface
> The ThriftBroker will implement a Callback class which will use a queue to
> transform asynchronous API calls into synchronous ThriftBroker calls. There
> will also be a new Result object which will encapsulate the operation type
> (scan/update), results/acknowledgment and errors (if any).
> class ThriftResultCallback {
>   public:
>     // Synchronous method which returns results as they arrive, and false
>     // once all results have arrived.
>     bool get_result(Result &);
>     // Convenience method which blocks till all updates complete.
>     bool wait_for_updates_to_complete(FailedUpdates &);
>
>     // These methods enqueue results as they arrive; the results are later
>     // served to the application via get_result() calls.
>     void scan_error(TableScannerPtr &scanner, int32 error,
>                     const String &error_msg);
>     void scan_ok(TableScannerPtr &scanner, vector<Cells> &cells);
>     void update_ok(TableMutatorPtr &mutator, FailedMutations);
>     void update_error(TableMutatorPtr &mutator, int32 error,
>                       String error_msg);
>   private:
>     ResultQueue m_results;
> };
> Pseudocode for a sample Thrift application:
> rc = create_result_callback();
> // create some asynchronous scanners and mutators
> m1 = create_async_mutator(…, rc);
> m2 = create_async_mutator(…, rc);
> // kick off scans
> s1 = create_async_scanner(…, rc);
> …
> …
> // buffer updates locally
> m1.set_cells(…);
> …
> mn.set_cells(…);
> ...
> // issue updates
> m1.flush();
> m2.flush();
> …
> mn.flush();
> // deal with write acks and scan results as they appear
> while (get_results(rc, rr)) {
> switch(rr.type) {
> case (SCAN):
> …
> case (UPDATE):
> ...
> }
> }
> // issue a set of writes
> m1.set_cells(…);
> m2.set_cells(…);
> m1.flush();
> m2.flush();
> // wait for all writes to complete
> has_error = wait_for_updates_to_complete(rc);
> Implementation notes:
> The ThriftResultCallback object uses m_results to enqueue results for
> consumption by the application. Each synchronous call to get_result() will
> pop a result off the queue or return false if there are no outstanding
> scans/updates. For scans, it will also buffer results by scanner so that the
> application doesn't have to make too many Thrift calls for scans which
> result in a small set of results from a large set of ScanBlocks.
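
The queue semantics described in these notes could be sketched roughly as
follows. This is only one possible reading: Result is reduced to a type plus
a string payload, add_outstanding() and the `last` flag on enqueue() are
hypothetical bookkeeping for "outstanding scans/updates", and the
per-scanner buffering mentioned above is left out.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>

enum ResultType { SCAN, UPDATE };

// Simplified stand-in for the proposed Result object.
struct Result {
  ResultType type;
  std::string payload;  // placeholder for cells, acks or error text
};

class ResultQueue {
public:
  // Called when an async scan or update is issued.
  void add_outstanding() {
    std::lock_guard<std::mutex> lock(m_mutex);
    ++m_outstanding;
  }

  // Called from the callback side when a result arrives. `last` marks the
  // final result of its operation, which stops counting as outstanding.
  void enqueue(const Result &result, bool last) {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_results.push_back(result);
      if (last) --m_outstanding;
    }
    m_cond.notify_all();
  }

  // Blocks until a result is available. Returns false once the queue is
  // empty and no scans/updates are outstanding.
  bool get_result(Result &out) {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this] {
      return !m_results.empty() || m_outstanding == 0;
    });
    if (m_results.empty()) return false;
    out = m_results.front();
    m_results.pop_front();
    return true;
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  std::deque<Result> m_results;
  int m_outstanding = 0;
};
```

With this shape, the application loop is simply
`while (queue.get_result(r)) { ... }`, and it terminates on its own once
every issued operation has delivered its last result.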
> For the case where a slow application is reading a massive amount of data,
> the callback will have to have some way to pause the queue and scanners to
> avoid being overwhelmed while the application catches up.
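
One way to get that pausing behaviour, sketched under assumptions, is a
bounded queue whose producer side waits once a capacity limit is reached.
The proposal does not specify a mechanism, so BoundedResultQueue and its
methods are hypothetical, and results are reduced to strings to keep the
example short.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>

// Hypothetical bounded queue: a full queue stalls the producer, which in
// effect pauses the scanner feeding it until the application catches up.
class BoundedResultQueue {
public:
  explicit BoundedResultQueue(std::size_t capacity) : m_capacity(capacity) {}

  // Producer side: blocks while the queue is full.
  void push(const std::string &block) {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_not_full.wait(lock, [this] { return m_items.size() < m_capacity; });
    m_items.push_back(block);
    m_not_empty.notify_one();
  }

  // Non-blocking variant: returns false instead of waiting when full.
  bool try_push(const std::string &block) {
    std::lock_guard<std::mutex> lock(m_mutex);
    if (m_items.size() >= m_capacity) return false;
    m_items.push_back(block);
    m_not_empty.notify_one();
    return true;
  }

  // Consumer side: blocks until an item is available, then frees a slot.
  std::string pop() {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_not_empty.wait(lock, [this] { return !m_items.empty(); });
    std::string block = m_items.front();
    m_items.pop_front();
    m_not_full.notify_one();
    return block;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_items.size();
  }

private:
  const std::size_t m_capacity;
  mutable std::mutex m_mutex;
  std::condition_variable m_not_full;
  std::condition_variable m_not_empty;
  std::deque<std::string> m_items;
};
```

Blocking inside push() is the simplest form of backpressure, but it ties up
whichever thread delivers the result; try_push() lets the caller pause the
scanner some other way instead.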
> Any thoughts?
> -Sanjit
>
> --
> You received this message because you are subscribed to the Google Groups
> "Hypertable Development" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/hypertable-dev?hl=en.
>
