[
https://issues.apache.org/jira/browse/HBASE-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261316#comment-14261316
]
Carter commented on HBASE-12728:
--------------------------------
[~sduskis] and I have been discussing this offline for a couple of days and
have come up with the following proposal. It will require a few JIRA
sub-tasks, but it's probably only a week's worth of work, plus reviews. In a
nutshell:
# Deprecate autoFlush methods (i.e., Put buffering) in {{HTable}}
# Remove all autoFlush methods from {{Table}}
# Create {{BufferedTable}} (outlined below), which will buffer Puts using
{{HTableMultiplexer}}
# Create {{BufferedConnection}} as a new factory class
# Have {{HTableMultiplexer}} implement {{Closeable}} (just fixing bad behavior)
# _Behavior change_: {{HTableMultiplexer}} flushes Puts by having
{{FlushWorker}} threads...
#* _OLD_: implement low-level logic against {{AsyncProcess}}
#* _NEW_: call {{Table#put(List<Put>)}}, thus removing duplicate code and
improving encapsulation
# _Behavior change_: When its queue is full, {{HTableMultiplexer}}...
#* _OLD_: immediately rejects all Puts
#* _NEW_: blocks for a configurable time in ms (can be 0) before rejecting Puts
# _Behavior change_: When an async Put fails...
#* _OLD_: the exception is thrown during an unrelated future Put operation
(confusing)
#* _NEW_: the exception is sent to a listener provided by the client,
following the Observer pattern
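To make the proposed queue-full behavior concrete, here is a minimal sketch using only the JDK. {{BoundedPutQueue}} and {{maxBlockMs}} are hypothetical names for illustration, not existing {{HTableMultiplexer}} members, and the real queueing code may well be structured differently.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a bounded Put queue; not actual HBase code. */
final class BoundedPutQueue<T> {
    private final LinkedBlockingQueue<T> queue;
    private final long maxBlockMs; // assumed config value; 0 means reject immediately

    BoundedPutQueue(int capacity, long maxBlockMs) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.maxBlockMs = maxBlockMs;
    }

    /** Returns true if queued, false if the queue was still full after waiting up to maxBlockMs. */
    boolean offer(T put) {
        try {
            // A zero timeout returns at once when the queue is full (the OLD behavior).
            return queue.offer(put, maxBlockMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Removes and returns the oldest queued item, or null if the queue is empty. */
    T poll() {
        return queue.poll();
    }
}
```

With {{maxBlockMs}} set to 0 a full queue rejects immediately, as it does today; a positive value gives the flush workers a chance to drain the queue before the Put is rejected.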
These are the new classes:
{code:java}
public class BufferedConnection implements Connection {
  private Connection c;
  private HTableMultiplexer htm;

  /* If listener is null, async exceptions are logged but nobody is notified */
  public BufferedConnection(Connection c, ExceptionListener l) {
    this.c = c;
    this.htm = new HTableMultiplexer(..., c, l);
  }

  public BufferedTable getTable(TableName tn) {
    return new BufferedTable(c.getTable(tn), htm);
  }

  /* getAdmin() and getRegionLocator(...) methods delegate to Connection */
}

public class BufferedTable implements Table {
  private Table t;
  private HTableMultiplexer htm;

  public BufferedTable(Table t, HTableMultiplexer htm) { ... }

  /* Puts go to htm.doPut(...), all other methods delegate to t */
}

public interface ExceptionListener {
  public void onException(RetriesExhaustedWithDetailsException e);
}
{code}
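As a rough illustration of the listener contract, the sketch below shows a flush step that reports a failure to the listener instead of surfacing it on a later Put. It uses only the JDK: {{FailureListener}} and {{FlushWorkerSketch}} are hypothetical stand-ins, a plain {{RuntimeException}} replaces {{RetriesExhaustedWithDetailsException}}, and the {{Consumer}} stands in for the batched write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical listener mirroring the proposed ExceptionListener, with a plain exception type. */
interface FailureListener {
    void onException(RuntimeException e);
}

/** Hypothetical stand-in for a flush worker; not the HTableMultiplexer implementation. */
final class FlushWorkerSketch<T> {
    private final List<T> buffer = new ArrayList<>();
    private final Consumer<List<T>> sink;   // stands in for the batched write
    private final FailureListener listener; // may be null

    FlushWorkerSketch(Consumer<List<T>> sink, FailureListener listener) {
        this.sink = sink;
        this.listener = listener;
    }

    /** Buffers one item; nothing is written until flush() runs. */
    synchronized void add(T item) {
        buffer.add(item);
    }

    /** Writes the buffered batch; a failure goes to the listener, or is only logged if there is none. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(buffer);
        buffer.clear();
        try {
            sink.accept(batch);
        } catch (RuntimeException e) {
            if (listener != null) {
                listener.onException(e);
            } else {
                System.err.println("Async flush failed: " + e);
            }
        }
    }
}
```

The point of the design is that the client decides what to do with a failed batch (retry, alert, drop) at the moment it fails, rather than catching it on whatever Put happens to come next.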
From a user standpoint it looks like this:
*Before*
{code:java}
Connection conn = ConnectionFactory.createConnection();
Table t = conn.getTable(TableName.valueOf("mytable"));
t.setAutoFlushTo(false);
/* do stuff */
t.close();
conn.close();
{code}
*After*
{code:java}
/* null listener: async failures are logged only */
Connection conn = new BufferedConnection(ConnectionFactory.createConnection(), null);
Table t = conn.getTable(TableName.valueOf("mytable"));
/* do stuff */
t.close();
conn.close();
{code}
In essence, a few new classes, a moderate amount of work in HTableMultiplexer,
and a few deprecation annotations in HTable. Let us know if this looks
acceptable and we'll create some subtasks and make it so.
> buffered writes substantially less useful after removal of HTablePool
> ---------------------------------------------------------------------
>
> Key: HBASE-12728
> URL: https://issues.apache.org/jira/browse/HBASE-12728
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 0.98.0
> Reporter: Aaron Beppu
>
> In previous versions of HBase, when use of HTablePool was encouraged, HTable
> instances were long-lived in that pool, and for that reason, if autoFlush was
> set to false, the table instance could accumulate a full buffer of writes
> before a flush was triggered. Writes from the client to the cluster could
> then be substantially larger and less frequent than without buffering.
> However, when HTablePool was deprecated, the primary justification seems to
> have been that creating HTable instances is cheap, so long as the connection
> and executor service being passed to it are pre-provided. A use pattern was
> encouraged where users should create a new HTable instance for every
> operation, using an existing connection and executor service, and then close
> the table. In this pattern, buffered writes are substantially less useful;
> writes are as small and as frequent as they would have been with
> autoflush=true, except the synchronous write is moved from the operation
> itself to the table close call which immediately follows.
> More concretely :
> ```
> // Given these two helpers ...
> private HTableInterface getAutoFlushTable(String tableName) throws IOException {
>   // (autoflush is true by default)
>   return storedConnection.getTable(tableName, executorService);
> }
> private HTableInterface getBufferedTable(String tableName) throws IOException {
>   HTableInterface table = getAutoFlushTable(tableName);
>   table.setAutoFlush(false);
>   return table;
> }
> // it's my contention that these two methods would behave almost identically,
> // except the first will hit a synchronous flush during the put call,
> // and the second will flush during the (hidden) close call on table.
> private void writeAutoFlushed(Put somePut) throws IOException {
>   try (HTableInterface table = getAutoFlushTable(tableName)) {
>     table.put(somePut); // will do synchronous flush
>   }
> }
> private void writeBuffered(Put somePut) throws IOException {
>   try (HTableInterface table = getBufferedTable(tableName)) {
>     table.put(somePut);
>   } // auto-close will trigger synchronous flush
> }
> ```
> For buffered writes to actually provide a performance benefit to users, one
> of two things must happen:
> - The writeBuffer itself shouldn't live, flush and die with the lifecycle of
> its HTable instance. If the writeBuffer were managed elsewhere and had a long
> lifespan, this could cease to be an issue. However, if the same writeBuffer
> is appended to by multiple tables, then some additional concurrency control
> will be needed around it.
> - Alternatively, there should be some pattern for having long-lived HTable
> instances. However, since HTable is not thread-safe, we'd need multiple
> instances, and a mechanism for leasing them out safely -- which sure sounds a
> lot like the old HTablePool to me.
> See discussion on mailing list here :
> http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E