[
https://issues.apache.org/jira/browse/HBASE-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267482#comment-14267482
]
Carter commented on HBASE-12728:
--------------------------------
Lots of great comments/questions. Solomon can dig into some of the grittier
implementation trade-offs, as this is largely his design. I can answer some
initial questions to keep the ball rolling:
{quote}
{{ExceptionListener}} should also get the original {{Put}}
{quote}
We were planning to return {{RetriesExhaustedWithDetailsException}} whenever
possible, which contains the Put. That’s more consistent with calling Put
synchronously.
{quote}
all these API's should accept Mutation as the argument
{quote}
Figuring that’s where we should head eventually, but doesn’t seem we need to be
there immediately, right? Probably makes sense to follow Stack’s idea and make
a more general name, e.g. s/AsyncPutter/AsyncMutator/
{quote}
AsyncPutter looks a lot like HTable's internal AsyncProcess
{quote}
Yeah, worth making sure we’re not reinventing the wheel, but also don’t want to
inherit unnecessary complexity. I'll leave this one to Solomon, and maybe it
can be debated further in a patch review.
{quote}
does BufferedTable.flush() block the calling thread?
{quote}
Absolutely. It’s a way to guarantee that previous writes are persisted, when
such confidence is needed.
{quote}
Isn't PutBuffer just an implementation detail…
{quote}
PutBuffer is an implementation detail, and it’s probably confusing that I
surfaced it in the last straw man. Just disregard it and focus on the
AsyncPutter, which is the long-lived owner of the buffered mutation mechanism
and all that entails.
{quote}
In the event of a BufferedTable that does not share it's PutBuffer with any
other instances, can it just defer back to the current implementation in HTable?
{quote}
If it makes sense to have a different AsyncPutter for single-threaded
situations than multi-threaded, then I’d suggest another implementation rather
than trying to make it guess for itself. But why do you think we should have
multiple implementations? Synchronization locks should be insignificant
compared to even very fast wire latencies.
{quote}
On AsyncPutter, will we ever want to buffer Increments or Appends or Deletes?
{quote}
I think deletes make sense. Appends make me nervous because we’re already
breaking sequence guarantees. I think if we do a solution which supports puts,
and can be extended to any other mutation, then we can decide at another time.
Using {[AsyncMutator}} as a name would hint at such extensibility.
{quote}
do we even need to expose BufferedTable?
{quote}
Yes, need it for flush(), which I described more above.
Re: stack’s concern on ExceptionListener, returning
RetriesExhaustedWithDetailsException should provide enough info, no?
{quote}
BufferedTable shouldn't have a close
{quote}
I agree, but if it implements Table, then it has to have a nop close.
We’re generally still in alignment with Lars’ comments. Table implementations
become lightweight. BufferedConnection might be a bit of a misnomer, since
it’s more a factory than a “buffered connection”. But it seems easier for a
developer to understand rather than minting a new factory concept.
{quote}
Just add a Connection.getBufferedTable
{quote}
But Connection is an interface. BufferedConnection is intended as an
implementation of that interface, something like:
{code:java}
public class BufferedConnection implements Connection {
public BufferedConnection(Connection conn, ExceptionListener l);
}
{code}
The idea is that given any implementation of conn, BufferedConnection will wrap
the encapsulated tables that are created with buffering logic. If we put it in
Connection, then we make buffering first-class functionality again, rather than
extended functionality.
{quote}
Because in the proposal, the BufferedTable is not the owner of the put buffer
{quote}
BufferedTable#flush would be a synchronous pass-through to AsyncPutter#flush.
It doesn't need to own it to do so.
Some more comments...
Was not planning to have BufferedTable to be heavy or thread safe. Doing that
makes it less interchangeable with HTable — which is not thread safe, and would
be lighter after this refactor.
Basically the lifecycle revolves around the {{AsyncMutator}}, which is the only
heavy thing here. BufferedConnection constructs and owns it, and injects the
mutator into new BufferedTable objects, which will be cheap to construct.
Calling BufferedConnection#close would close AsyncMutator, which in turn would
flush its buffers and shutdown its worker threads. (And throw
IllegalStateException if any other operations come through, except for another
close, which should be idempotent.)
There was also a question about the need for two Connection objects. There
only needs to be one Connection object, which BufferedConnection would wrap.
All operations, synchronous and asynchronous would go through it.
> buffered writes substantially less useful after removal of HTablePool
> ---------------------------------------------------------------------
>
> Key: HBASE-12728
> URL: https://issues.apache.org/jira/browse/HBASE-12728
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 0.98.0
> Reporter: Aaron Beppu
> Assignee: Solomon Duskis
> Priority: Blocker
> Fix For: 1.0.0, 2.0.0, 1.1.0
>
>
> In previous versions of HBase, when use of HTablePool was encouraged, HTable
> instances were long-lived in that pool, and for that reason, if autoFlush was
> set to false, the table instance could accumulate a full buffer of writes
> before a flush was triggered. Writes from the client to the cluster could
> then be substantially larger and less frequent than without buffering.
> However, when HTablePool was deprecated, the primary justification seems to
> have been that creating HTable instances is cheap, so long as the connection
> and executor service being passed to it are pre-provided. A use pattern was
> encouraged where users should create a new HTable instance for every
> operation, using an existing connection and executor service, and then close
> the table. In this pattern, buffered writes are substantially less useful;
> writes are as small and as frequent as they would have been with
> autoflush=true, except the synchronous write is moved from the operation
> itself to the table close call which immediately follows.
> More concretely :
> ```
> // Given these two helpers ...
> private HTableInterface getAutoFlushTable(String tableName) throws
> IOException {
> // (autoflush is true by default)
> return storedConnection.getTable(tableName, executorService);
> }
> private HTableInterface getBufferedTable(String tableName) throws IOException
> {
> HTableInterface table = getAutoFlushTable(tableName);
> table.setAutoFlush(false);
> return table;
> }
> // it's my contention that these two methods would behave almost identically,
> // except the first will hit a synchronous flush during the put call,
> and the second will
> // flush during the (hidden) close call on table.
> private void writeAutoFlushed(Put somePut) throws IOException {
> try (HTableInterface table = getAutoFlushTable(tableName)) {
> table.put(somePut); // will do synchronous flush
> }
> }
> private void writeBuffered(Put somePut) throws IOException {
> try (HTableInterface table = getBufferedTable(tableName)) {
> table.put(somePut);
> } // auto-close will trigger synchronous flush
> }
> ```
> For buffered writes to actually provide a performance benefit to users, one
> of two things must happen:
> - The writeBuffer itself shouldn't live, flush and die with the lifecycle of
> it's HTableInstance. If the writeBuffer were managed elsewhere and had a long
> lifespan, this could cease to be an issue. However, if the same writeBuffer
> is appended to by multiple tables, then some additional concurrency control
> will be needed around it.
> - Alternatively, there should be some pattern for having long-lived HTable
> instances. However, since HTable is not thread-safe, we'd need multiple
> instances, and a mechanism for leasing them out safely -- which sure sounds a
> lot like the old HTablePool to me.
> See discussion on mailing list here :
> http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)