[jira] [Commented] (HBASE-12728) buffered writes substantially less useful after removal of HTablePool

Carter (JIRA) Wed, 07 Jan 2015 02:17:09 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267482#comment-14267482
 ]


Carter commented on HBASE-12728:
--------------------------------

Lots of great comments/questions.  Solomon can dig into some of the grittier 
implementation trade-offs, as this is largely his design.  I can answer some 
initial questions to keep the ball rolling:

{quote}
{{ExceptionListener}} should also get the original {{Put}}
{quote}

We were planning to return {{RetriesExhaustedWithDetailsException}} whenever 
possible, which contains the Put.  That’s more consistent with calling Put 
synchronously.

{quote}
all these API's should accept Mutation as the argument
{quote}

Figuring that’s where we should head eventually, but doesn’t seem we need to be 
there immediately, right?  Probably makes sense to follow Stack’s idea and make 
a more general name, e.g. s/AsyncPutter/AsyncMutator/

{quote}
AsyncPutter looks a lot like HTable's internal AsyncProcess
{quote}

Yeah, worth making sure we’re not reinventing the wheel, but also don’t want to 
inherit unnecessary complexity.  I'll leave this one to Solomon, and maybe it 
can be debated further in a patch review.

{quote}
does BufferedTable.flush() block the calling thread?
{quote}

Absolutely.  It’s a way to guarantee that previous writes are persisted, when 
such confidence is needed.

{quote}
Isn't PutBuffer just an implementation detail…
{quote}

PutBuffer is an implementation detail, and it’s probably confusing that I 
surfaced it in the last straw man.  Just disregard it and focus on the 
AsyncPutter, which is the long-lived owner of the buffered mutation mechanism 
and all that entails.

{quote}
In the event of a BufferedTable that does not share it's PutBuffer with any 
other instances, can it just defer back to the current implementation in HTable?
{quote}

If it makes sense to have a different AsyncPutter for single-threaded 
situations than multi-threaded, then I’d suggest another implementation rather 
than trying to make it guess for itself.  But why do you think we should have 
multiple implementations?  Synchronization locks should be insignificant 
compared to even very fast wire latencies.

{quote}
On AsyncPutter, will we ever want to buffer Increments or Appends or Deletes?
{quote}

I think deletes make sense.  Appends make me nervous because we’re already 
breaking sequence guarantees.  I think if we do a solution which supports puts, 
and can be extended to any other mutation, then we can decide at another time.  
Using {[AsyncMutator}} as a name would hint at such extensibility.

{quote}
do we even need to expose BufferedTable?
{quote}

Yes, need it for flush(), which I described more above.

Re: stack’s concern on ExceptionListener, returning 
RetriesExhaustedWithDetailsException should provide enough info, no?

{quote}
BufferedTable shouldn't have a close
{quote}

I agree, but if it implements Table, then it has to have a nop close.

We’re generally still in alignment with Lars’ comments.  Table implementations 
become lightweight.  BufferedConnection might be a bit of a misnomer, since 
it’s more a factory than a “buffered connection”.  But it seems easier for a 
developer to understand rather than minting a new factory concept.

{quote}
Just add a Connection.getBufferedTable
{quote}

But Connection is an interface.  BufferedConnection is intended as an 
implementation of that interface, something like:

{code:java}
public class BufferedConnection implements Connection {
    public BufferedConnection(Connection conn, ExceptionListener l);
}
{code}

The idea is that given any implementation of conn, BufferedConnection will wrap 
the encapsulated tables that are created with buffering logic.  If we put it in 
Connection, then we make buffering first-class functionality again, rather than 
extended functionality.

{quote}
Because in the proposal, the BufferedTable is not the owner of the put buffer
{quote}

BufferedTable#flush would be a synchronous pass-through to AsyncPutter#flush.  
It doesn't need to own it to do so.

Some more comments...

Was not planning to have BufferedTable to be heavy or thread safe.  Doing that 
makes it less interchangeable with HTable — which is not thread safe, and would 
be lighter after this refactor.

Basically the lifecycle revolves around the {{AsyncMutator}}, which is the only 
heavy thing here.  BufferedConnection constructs and owns it, and injects the 
mutator into new BufferedTable objects, which will be cheap to construct.  
Calling BufferedConnection#close would close AsyncMutator, which in turn would 
flush its buffers and shutdown its worker threads.  (And throw 
IllegalStateException if any other operations come through, except for another 
close, which should be idempotent.)

There was also a question about the need for two Connection objects.  There 
only needs to be one Connection object, which BufferedConnection would wrap.  
All operations, synchronous and asynchronous would go through it.


> buffered writes substantially less useful after removal of HTablePool
> ---------------------------------------------------------------------
>
>                 Key: HBASE-12728
>                 URL: https://issues.apache.org/jira/browse/HBASE-12728
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.0
>            Reporter: Aaron Beppu
>            Assignee: Solomon Duskis
>            Priority: Blocker
>             Fix For: 1.0.0, 2.0.0, 1.1.0
>
>
> In previous versions of HBase, when use of HTablePool was encouraged, HTable 
> instances were long-lived in that pool, and for that reason, if autoFlush was 
> set to false, the table instance could accumulate a full buffer of writes 
> before a flush was triggered. Writes from the client to the cluster could 
> then be substantially larger and less frequent than without buffering.
> However, when HTablePool was deprecated, the primary justification seems to 
> have been that creating HTable instances is cheap, so long as the connection 
> and executor service being passed to it are pre-provided. A use pattern was 
> encouraged where users should create a new HTable instance for every 
> operation, using an existing connection and executor service, and then close 
> the table. In this pattern, buffered writes are substantially less useful; 
> writes are as small and as frequent as they would have been with 
> autoflush=true, except the synchronous write is moved from the operation 
> itself to the table close call which immediately follows.
> More concretely :
> ```
> // Given these two helpers ...
> private HTableInterface getAutoFlushTable(String tableName) throws 
> IOException {
>   // (autoflush is true by default)
>   return storedConnection.getTable(tableName, executorService);
> }
> private HTableInterface getBufferedTable(String tableName) throws IOException 
> {
>   HTableInterface table = getAutoFlushTable(tableName);
>   table.setAutoFlush(false);
>   return table;
> }
> // it's my contention that these two methods would behave almost identically,
> // except the first will hit a synchronous flush during the put call,
> and the second will
> // flush during the (hidden) close call on table.
> private void writeAutoFlushed(Put somePut) throws IOException {
>   try (HTableInterface table = getAutoFlushTable(tableName)) {
>     table.put(somePut); // will do synchronous flush
>   }
> }
> private void writeBuffered(Put somePut) throws IOException {
>   try (HTableInterface table = getBufferedTable(tableName)) {
>     table.put(somePut);
>   } // auto-close will trigger synchronous flush
> }
> ```
> For buffered writes to actually provide a performance benefit to users, one 
> of two things must happen:
> - The writeBuffer itself shouldn't live, flush and die with the lifecycle of 
> it's HTableInstance. If the writeBuffer were managed elsewhere and had a long 
> lifespan, this could cease to be an issue. However, if the same writeBuffer 
> is appended to by multiple tables, then some additional concurrency control 
> will be needed around it.
> - Alternatively, there should be some pattern for having long-lived HTable 
> instances. However, since HTable is not thread-safe, we'd need multiple 
> instances, and a mechanism for leasing them out safely -- which sure sounds a 
> lot like the old HTablePool to me.
> See discussion on mailing list here : 
> http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-12728) buffered writes substantially less useful after removal of HTablePool

Reply via email to