[ 
https://issues.apache.org/jira/browse/HBASE-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257274#comment-14257274
 ] 

stack commented on HBASE-12728:
-------------------------------

bq. FWIW, my thoughts here are more about thinking out loud about the nature of 
the problem than offering guidance.

Pardon us if our reaction suggested we had not understood that this was the 
case. Your 'outside' perspective has been refreshing up to this. Please do not 
suppress your 'thoughts' because our response on occasion is basic.

bq. Is m/r the main reason for autoFlush? Are there other good cases?

Stating the obvious, autoflush makes sense anytime the writes are small and 
many; it especially makes sense when there is no natural flush point. In the 
m/r case, deciding when to flush could be handled by the m/r job itself, 
external to the Table, rather than by the autoFlush intrinsic, but it gets a 
little more complicated in your servlet case. Presuming many small writes by 
many threads, it makes sense that the Table instance figures out when to flush 
(by size or period). You could create a (thread-safe) Table on servlet init 
(or a pool) and, on destroy, do a flush/close of the Table instance (or pool).
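
A minimal sketch of the "Table figures out when to flush" idea, in plain Java 
with no HBase dependency. Everything here is an assumption made for 
illustration: SharedWriteBuffer, its constructor parameters and the sink 
callback are hypothetical names, not HBase API. The sink stands in for the 
batched write to the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical stand-in for a thread-safe Table that owns its write buffer. */
final class SharedWriteBuffer<T> implements AutoCloseable {
  private final int maxBuffered;                 // flush once this many writes are pending
  private final Consumer<List<T>> sink;          // stand-in for the batched write to the cluster
  private final ScheduledExecutorService timer;  // null when no periodic flush was requested
  private List<T> pending = new ArrayList<>();   // guarded by the lock on 'this'

  SharedWriteBuffer(int maxBuffered, long flushPeriodMillis, Consumer<List<T>> sink) {
    this.maxBuffered = maxBuffered;
    this.sink = sink;
    if (flushPeriodMillis > 0) {
      ScheduledExecutorService t = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread thread = new Thread(r, "write-buffer-flusher");
        thread.setDaemon(true);
        return thread;
      });
      // Note: if the sink throws, the executor cancels later periodic runs.
      t.scheduleAtFixedRate(this::flush, flushPeriodMillis, flushPeriodMillis,
          TimeUnit.MILLISECONDS);
      this.timer = t;
    } else {
      this.timer = null;
    }
  }

  /** Safe to call from many threads; only the caller that fills the buffer flushes. */
  void put(T write) {
    List<T> batch = null;
    synchronized (this) {
      pending.add(write);
      if (pending.size() >= maxBuffered) {
        batch = drain();
      }
    }
    // The sink runs outside the lock, so it must itself tolerate concurrent calls.
    if (batch != null) {
      sink.accept(batch);
    }
  }

  /** Sends whatever is pending, if anything. */
  void flush() {
    List<T> batch;
    synchronized (this) {
      batch = drain();
    }
    if (!batch.isEmpty()) {
      sink.accept(batch);
    }
  }

  /** Swaps in a fresh buffer and returns the old one; caller must hold the lock. */
  private List<T> drain() {
    List<T> batch = pending;
    pending = new ArrayList<>();
    return batch;
  }

  /** Stops the periodic flush and sends any leftover writes. */
  @Override
  public void close() {
    if (timer != null) {
      timer.shutdown();
    }
    flush();
  }
}
```

In the servlet case, one such instance would be built in init(), shared by 
all request threads calling put(), and closed in destroy() so that the last 
partial batch is not lost.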

I don't think the ConnectionPool concept from sql-world maps well to ours, 
where many servers are involved rather than one. Or, in our world, a 
ClusterConnection sort of equates to a ConnectionPool, since the 
ClusterConnection will put up and cache a Connection per server in the cluster 
(true, we might get more throughput with more than one connection per server 
in the cluster, but that I think is an implementation detail rather than a 
modeling item).

To answer the problem raised by [~abeppu], I suggest that we add back a 
TablePool ([~lhofhansl] -- you think this a regression?) and a thread-safe 
version of Table, with the Table instance responsible for the write buffer (we 
could add flushing on a period to the pool, and as an option on the 
thread-safe Table, in another issue). I can work on this if there is agreement.
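
For the pool half of that suggestion, a rough sketch of the leasing mechanism 
in plain Java follows. LeasePool and its methods are hypothetical names, not 
the old HTablePool API; T stands in for a non-thread-safe, buffering table 
handle. The point it illustrates is that release() hands the instance back 
without closing it, so its write buffer keeps accumulating across leases.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** Hypothetical lease-based pool of long-lived, non-thread-safe instances. */
final class LeasePool<T extends AutoCloseable> implements AutoCloseable {
  private final BlockingQueue<T> idle;

  LeasePool(int size, Supplier<T> factory) {
    idle = new ArrayBlockingQueue<>(size);
    for (int i = 0; i < size; i++) {
      idle.add(factory.get());
    }
  }

  /** Blocks until an instance is free; the caller has exclusive use until release(). */
  T lease() throws InterruptedException {
    return idle.take();
  }

  /** Returns the instance WITHOUT closing it, so any buffered writes survive. */
  void release(T instance) {
    idle.add(instance);
  }

  /** Closes every idle instance; instances still out on lease are not tracked here. */
  @Override
  public void close() throws Exception {
    T instance;
    while ((instance = idle.poll()) != null) {
      instance.close();
    }
  }
}
```

A caller would wrap lease() and release() in try/finally, and the pool itself 
would be closed once at shutdown, which is when the remaining buffers get 
flushed.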



> buffered writes substantially less useful after removal of HTablePool
> ---------------------------------------------------------------------
>
>                 Key: HBASE-12728
>                 URL: https://issues.apache.org/jira/browse/HBASE-12728
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.0
>            Reporter: Aaron Beppu
>
> In previous versions of HBase, when use of HTablePool was encouraged, HTable 
> instances were long-lived in that pool, and for that reason, if autoFlush was 
> set to false, the table instance could accumulate a full buffer of writes 
> before a flush was triggered. Writes from the client to the cluster could 
> then be substantially larger and less frequent than without buffering.
> However, when HTablePool was deprecated, the primary justification seems to 
> have been that creating HTable instances is cheap, so long as the connection 
> and executor service being passed to it are pre-provided. A use pattern was 
> encouraged where users should create a new HTable instance for every 
> operation, using an existing connection and executor service, and then close 
> the table. In this pattern, buffered writes are substantially less useful; 
> writes are as small and as frequent as they would have been with 
> autoflush=true, except the synchronous write is moved from the operation 
> itself to the table close call which immediately follows.
> More concretely :
> ```
> // Given these two helpers ...
> private HTableInterface getAutoFlushTable(String tableName) throws 
> IOException {
>   // (autoflush is true by default)
>   return storedConnection.getTable(tableName, executorService);
> }
> private HTableInterface getBufferedTable(String tableName) throws IOException 
> {
>   HTableInterface table = getAutoFlushTable(tableName);
>   table.setAutoFlush(false);
>   return table;
> }
> // it's my contention that these two methods would behave almost identically,
> // except the first will hit a synchronous flush during the put call,
> // and the second will flush during the (hidden) close call on table.
> private void writeAutoFlushed(Put somePut) throws IOException {
>   try (HTableInterface table = getAutoFlushTable(tableName)) {
>     table.put(somePut); // will do synchronous flush
>   }
> }
> private void writeBuffered(Put somePut) throws IOException {
>   try (HTableInterface table = getBufferedTable(tableName)) {
>     table.put(somePut);
>   } // auto-close will trigger synchronous flush
> }
> ```
> For buffered writes to actually provide a performance benefit to users, one 
> of two things must happen:
> - The writeBuffer itself shouldn't live, flush and die with the lifecycle of 
> its HTable instance. If the writeBuffer were managed elsewhere and had a long 
> lifespan, this could cease to be an issue. However, if the same writeBuffer 
> is appended to by multiple tables, then some additional concurrency control 
> will be needed around it.
> - Alternatively, there should be some pattern for having long-lived HTable 
> instances. However, since HTable is not thread-safe, we'd need multiple 
> instances, and a mechanism for leasing them out safely -- which sure sounds a 
> lot like the old HTablePool to me.
> See discussion on mailing list here : 
> http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E



