[jira] [Commented] (HBASE-12728) buffered writes substantially less useful after removal of HTablePool

Nick Dimiduk (JIRA) Thu, 08 Jan 2015 14:38:19 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270149#comment-14270149
 ]


Nick Dimiduk commented on HBASE-12728:
--------------------------------------

Thanks for having a look [~stack]

bq. In the multithreaded case, which thread calls the BufferedTable#flush? If 
all do, no buffering is going on. Is flush then called after a 'big' put? How's 
that going to work when many threads? Better if flush is done internally at 
size/time/shutdown thresholds. It doesn't seem like a function that belongs in 
the BufferedTable instance.

I thought about this a bit, but maybe [~abeppu] has other ideas. My thinking is 
that after a large batch of puts, before a thread exists, it wants to call 
BufferedTable#flush() to ensure all of it's stuff is done. In this case, it's a 
flush of all threads' pending edits. Similar in concept to writes to our own 
WAL, where we have thread "flush" and wait for their seqNum to come up before 
proceeding. I think this is no incompatible with flushes happening on their 
own, based on size/time. Shutdown is a separate matter (flush everything and 
exit).

bq. Ditto ExceptionListener in BufferedTable. Its awkward, right? If I register 
a listener on BufferedTable, internally I'll need to pass this interest to the 
exception handler and then remove interest when the BufferedTable goes away.

This is true. My patch doesn't do proper accounting for ExceptionListener OR 
the AsyncMutator. These need address in a "real" implementation. I think you'd 
want any EL's registered by a BatchTable instance to expire when that table is 
closed.

bq. If you buy the above two points, then need for a special BufferedTable 
Interface goes away....but then how to listen on exceptions and how to flush?

*nod*

bq. We need a BufferedConnection? Can't we just add the few methods to 
Connection? It is new in 1.0 so we'd be breaking no one (though there is the 
issue of being able to specify an executor, at least optionally, which I see 
you doing in BufferedConnection constructor in the factory).

Nope, we could just as easily add these methods to Connection, but that means 
putting their implementation down in 
ConnectionManager#HConnectionImplementation, or some other implementation that 
wraps CM#HCI in the same way that I've done. I have a separate interface 
because that's what was discussed earlier in the thread. The patch was to 
provoke API design discussion, so mission accomplished ::smile::

bq. The flush on the BufferedConnection is a bit odd, yeah. Flushes all tables? 
Or there'd be an override that allowed passing which table to flush?

Right, it would need to be flush all tables. See the comment I left in 
BufferedConnectionImpl#close():

{noformat}
  @Override
  public void close() throws IOException {
    assert aps.isEmpty() : "Leaking resources";
    /* TODO: instead of assert, something like
    for (TableName t : aps.keySet()) {
      try (BufferedTable table = getBufferedTable(t)) {
        table.flush();
      }
    }
     */
    delegate.close();
  }
{noformat}

Originally I implemented BC#flush() in this way, but decided to pull it and see 
if anyone brought it up in discussion. Given that we have this new object that 
is explicitly buffering, I think we owe it to the interface to provide a means 
for explicitly flushing the buffers. I can't think off hand of know why a user 
would want it, but I think it should be there.

> buffered writes substantially less useful after removal of HTablePool
> ---------------------------------------------------------------------
>
>                 Key: HBASE-12728
>                 URL: https://issues.apache.org/jira/browse/HBASE-12728
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.0
>            Reporter: Aaron Beppu
>            Assignee: Solomon Duskis
>            Priority: Blocker
>             Fix For: 1.0.0, 2.0.0, 1.1.0
>
>         Attachments: 12728.connection-owns-buffers.example.branch-1.0.patch, 
> bulk-mutator.patch
>
>
> In previous versions of HBase, when use of HTablePool was encouraged, HTable 
> instances were long-lived in that pool, and for that reason, if autoFlush was 
> set to false, the table instance could accumulate a full buffer of writes 
> before a flush was triggered. Writes from the client to the cluster could 
> then be substantially larger and less frequent than without buffering.
> However, when HTablePool was deprecated, the primary justification seems to 
> have been that creating HTable instances is cheap, so long as the connection 
> and executor service being passed to it are pre-provided. A use pattern was 
> encouraged where users should create a new HTable instance for every 
> operation, using an existing connection and executor service, and then close 
> the table. In this pattern, buffered writes are substantially less useful; 
> writes are as small and as frequent as they would have been with 
> autoflush=true, except the synchronous write is moved from the operation 
> itself to the table close call which immediately follows.
> More concretely :
> ```
> // Given these two helpers ...
> private HTableInterface getAutoFlushTable(String tableName) throws 
> IOException {
>   // (autoflush is true by default)
>   return storedConnection.getTable(tableName, executorService);
> }
> private HTableInterface getBufferedTable(String tableName) throws IOException 
> {
>   HTableInterface table = getAutoFlushTable(tableName);
>   table.setAutoFlush(false);
>   return table;
> }
> // it's my contention that these two methods would behave almost identically,
> // except the first will hit a synchronous flush during the put call,
> and the second will
> // flush during the (hidden) close call on table.
> private void writeAutoFlushed(Put somePut) throws IOException {
>   try (HTableInterface table = getAutoFlushTable(tableName)) {
>     table.put(somePut); // will do synchronous flush
>   }
> }
> private void writeBuffered(Put somePut) throws IOException {
>   try (HTableInterface table = getBufferedTable(tableName)) {
>     table.put(somePut);
>   } // auto-close will trigger synchronous flush
> }
> ```
> For buffered writes to actually provide a performance benefit to users, one 
> of two things must happen:
> - The writeBuffer itself shouldn't live, flush and die with the lifecycle of 
> it's HTableInstance. If the writeBuffer were managed elsewhere and had a long 
> lifespan, this could cease to be an issue. However, if the same writeBuffer 
> is appended to by multiple tables, then some additional concurrency control 
> will be needed around it.
> - Alternatively, there should be some pattern for having long-lived HTable 
> instances. However, since HTable is not thread-safe, we'd need multiple 
> instances, and a mechanism for leasing them out safely -- which sure sounds a 
> lot like the old HTablePool to me.
> See discussion on mailing list here : 
> http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-12728) buffered writes substantially less useful after removal of HTablePool

Reply via email to