Aaron Beppu created HBASE-12728:
-----------------------------------

             Summary: buffered writes substantially less useful after removal 
of HTablePool
                 Key: HBASE-12728
                 URL: https://issues.apache.org/jira/browse/HBASE-12728
             Project: HBase
          Issue Type: Bug
          Components: hbase
    Affects Versions: 0.98.0
            Reporter: Aaron Beppu


In previous versions of HBase, when use of HTablePool was encouraged, HTable 
instances were long-lived in that pool, and for that reason, if autoFlush was 
set to false, the table instance could accumulate a full buffer of writes 
before a flush was triggered. Writes from the client to the cluster could then 
be substantially larger and less frequent than without buffering.

However, when HTablePool was deprecated, the primary justification seems to 
have been that creating HTable instances is cheap, so long as the connection 
and executor service being passed to it are pre-provided. A use pattern was 
encouraged where users should create a new HTable instance for every operation, 
using an existing connection and executor service, and then close the table. In 
this pattern, buffered writes are substantially less useful; writes are as 
small and as frequent as they would have been with autoflush=true, except the 
synchronous write is moved from the operation itself to the table close call 
which immediately follows.

More concretely :
```
// Given these two helpers ...

private HTableInterface getAutoFlushTable(String tableName) throws IOException {
  // (autoflush is true by default)
  return storedConnection.getTable(tableName, executorService);
}

private HTableInterface getBufferedTable(String tableName) throws IOException {
  HTableInterface table = getAutoFlushTable(tableName);
  table.setAutoFlush(false);
  return table;
}

// it's my contention that these two methods would behave almost identically,
// except the first will hit a synchronous flush during the put call,
and the second will
// flush during the (hidden) close call on table.

private void writeAutoFlushed(Put somePut) throws IOException {
  try (HTableInterface table = getAutoFlushTable(tableName)) {
    table.put(somePut); // will do synchronous flush
  }
}

private void writeBuffered(Put somePut) throws IOException {
  try (HTableInterface table = getBufferedTable(tableName)) {
    table.put(somePut);
  } // auto-close will trigger synchronous flush
}
```

For buffered writes to actually provide a performance benefit to users, one of 
two things must happen:

- The writeBuffer itself shouldn't live, flush and die with the lifecycle of 
it's HTableInstance. If the writeBuffer were managed elsewhere and had a long 
lifespan, this could cease to be an issue. However, if the same writeBuffer is 
appended to by multiple tables, then some additional concurrency control will 
be needed around it.
- Alternatively, there should be some pattern for having long-lived HTable 
instances. However, since HTable is not thread-safe, we'd need multiple 
instances, and a mechanism for leasing them out safely -- which sure sounds a 
lot like the old HTablePool to me.

See discussion on mailing list here : 
http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to