[
https://issues.apache.org/jira/browse/HBASE-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279373#comment-14279373
]
Nick Dimiduk commented on HBASE-12728:
--------------------------------------
I think it's better to handle the multi-threaded coordination on behalf of the
user than to expect them to do the synchronizing themselves. The train rolling
here is a good one -- it's nice cleanup, it's consistent with previous
behaviors, and it makes things more obvious for users. Accompany this with a
thoughtful javadoc review and a fat example that we can dump into the online
book, and this will be a fine resolution.
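To make "coordination on behalf of the user" concrete, here is a minimal sketch in plain Java of a long-lived write buffer that many threads can append to safely. It is an illustration under stated assumptions, not the patch: {{SharedWriteBuffer}}, its methods, and the {{Consumer}} sink standing in for the batched RPC are all hypothetical names, and none of this is HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical illustration only; not an HBase class. */
final class SharedWriteBuffer<T> {
    private final int flushThreshold;
    private final Consumer<List<T>> sink; // assumption: stands in for the batched write to the cluster
    private final List<T> pending = new ArrayList<>();

    SharedWriteBuffer(int flushThreshold, Consumer<List<T>> sink) {
        this.flushThreshold = flushThreshold;
        this.sink = sink;
    }

    /** Safe to call from many threads; sends a batch once the threshold is reached. */
    void add(T mutation) {
        List<T> batch = null;
        synchronized (pending) {
            pending.add(mutation);
            if (pending.size() >= flushThreshold) {
                batch = drain();
            }
        }
        if (batch != null) {
            sink.accept(batch); // send outside the lock so other writers are not blocked
        }
    }

    /** Sends whatever is currently buffered; does nothing when the buffer is empty. */
    void flush() {
        List<T> batch;
        synchronized (pending) {
            batch = drain();
        }
        if (!batch.isEmpty()) {
            sink.accept(batch);
        }
    }

    /** Copies and clears the pending list. Caller must hold the lock on {@code pending}. */
    private List<T> drain() {
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }
}
```

The point of the sketch is that callers only ever call {{add}} and {{flush}}; the locking lives in one place, and the buffer outlives any individual short-lived table handle.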
I still prefer a {{Table}}, {{BufferedTable}} pair over a
{{Table}}, {{BufferedMutator}} pair. I think a drop-in buffering option makes
the most sense for a usable API. I hear the argument that maybe buffering isn't
the place of our out-of-the-box client, but we have a solution to this today
that some folks depend on, so I think it's irresponsible to omit it for 1.0. If
[~sduskis] is truly fed up with us ( ::smile:: ) I'm happy to pick up the patch
in this direction.
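As a rough sketch of what "drop-in" could mean here: the buffering variant implements the same interface as the unbuffered one, so calling code does not change when one is swapped for the other. The names {{RowWriter}} and {{BufferedRowWriter}} are hypothetical stand-ins invented for this illustration, with rows reduced to strings; this is not the proposed {{BufferedTable}} and not HBase API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the write side of a table interface. */
interface RowWriter extends AutoCloseable {
    void put(List<String> rows);

    @Override
    void close();
}

/** Hypothetical decorator: same interface, but batches until a threshold or close(). */
final class BufferedRowWriter implements RowWriter {
    private final RowWriter delegate;
    private final int threshold;
    private final List<String> pending = new ArrayList<>();

    BufferedRowWriter(RowWriter delegate, int threshold) {
        this.delegate = delegate;
        this.threshold = threshold;
    }

    @Override
    public synchronized void put(List<String> rows) {
        pending.addAll(rows);
        if (pending.size() >= threshold) {
            flushPending();
        }
    }

    @Override
    public synchronized void close() {
        flushPending(); // push any remaining rows before releasing the delegate
        delegate.close();
    }

    /** Forwards the buffered rows as one batch; caller must hold this object's lock. */
    private void flushPending() {
        if (pending.isEmpty()) {
            return;
        }
        delegate.put(new ArrayList<>(pending));
        pending.clear();
    }
}
```

Code written against {{RowWriter}} works unchanged with either implementation, which is the usability argument for the pairing. The trade-off is that the buffered variant only pays off if it is kept alive across many writes rather than closed after each one.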
I also think splitting the {{Table}} concept into a reader and a writer is
something worth exploring, but not for 1.0. I'm hoping by 2.0 we'll have a
valid story for an async (or [reactive|http://www.reactivemanifesto.org]?)
client and maybe even something that operates on top of a C/native
implementation so we can close the gap for folks who aren't on the JVM. For
now, let's get the 1.0 release unblocked.
> buffered writes substantially less useful after removal of HTablePool
> ---------------------------------------------------------------------
>
> Key: HBASE-12728
> URL: https://issues.apache.org/jira/browse/HBASE-12728
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 0.98.0
> Reporter: Aaron Beppu
> Priority: Blocker
> Fix For: 1.0.0, 2.0.0, 1.1.0
>
> Attachments: 12728.connection-owns-buffers.example.branch-1.0.patch,
> HBASE-12728-2.patch, HBASE-12728.patch, bulk-mutator.patch
>
>
> In previous versions of HBase, when use of HTablePool was encouraged, HTable
> instances were long-lived in that pool, and for that reason, if autoFlush was
> set to false, the table instance could accumulate a full buffer of writes
> before a flush was triggered. Writes from the client to the cluster could
> then be substantially larger and less frequent than without buffering.
> However, when HTablePool was deprecated, the primary justification seems to
> have been that creating HTable instances is cheap, so long as the connection
> and executor service being passed to it are pre-provided. A use pattern was
> encouraged where users should create a new HTable instance for every
> operation, using an existing connection and executor service, and then close
> the table. In this pattern, buffered writes are substantially less useful;
> writes are as small and as frequent as they would have been with
> autoflush=true, except the synchronous write is moved from the operation
> itself to the table close call which immediately follows.
> More concretely:
> ```
> // Given these two helpers ...
> private HTableInterface getAutoFlushTable(String tableName) throws IOException {
>   // (autoflush is true by default)
>   return storedConnection.getTable(tableName, executorService);
> }
> private HTableInterface getBufferedTable(String tableName) throws IOException {
>   HTableInterface table = getAutoFlushTable(tableName);
>   table.setAutoFlush(false);
>   return table;
> }
> // it's my contention that these two methods would behave almost identically,
> // except the first will hit a synchronous flush during the put call,
> // and the second will flush during the (hidden) close call on table.
> private void writeAutoFlushed(Put somePut) throws IOException {
>   try (HTableInterface table = getAutoFlushTable(tableName)) {
>     table.put(somePut); // will do synchronous flush
>   }
> }
> private void writeBuffered(Put somePut) throws IOException {
>   try (HTableInterface table = getBufferedTable(tableName)) {
>     table.put(somePut);
>   } // auto-close will trigger synchronous flush
> }
> ```
> For buffered writes to actually provide a performance benefit to users, one
> of two things must happen:
> - The writeBuffer itself shouldn't live, flush and die with the lifecycle of
> its HTable instance. If the writeBuffer were managed elsewhere and had a long
> lifespan, this could cease to be an issue. However, if the same writeBuffer
> is appended to by multiple tables, then some additional concurrency control
> will be needed around it.
> - Alternatively, there should be some pattern for having long-lived HTable
> instances. However, since HTable is not thread-safe, we'd need multiple
> instances, and a mechanism for leasing them out safely -- which sure sounds a
> lot like the old HTablePool to me.
> See the discussion on the mailing list here:
> http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E