[ 
https://issues.apache.org/jira/browse/HBASE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084503#comment-13084503
 ] 

[email protected] commented on HBASE-4192:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1463/#review1436
-----------------------------------------------------------


some new unit tests here would be nice - specifically if we can use mock 
objects and fire up a bunch of threads to pound on the various synchronization 
primitives. I haven't convinced myself that the synchronization is all correct 
yet, and some tests that "prove it" would help :)


src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
<https://reviews.apache.org/r/1463/#comment3304>

    can you write a test case for this bug fix, separately from this HLog 
optimization? seems like you should be able to trigger it by delaying response 
to a different thread which runs immediately, then sleep for a second in the 
main RPC thread, right?



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<https://reviews.apache.org/r/1463/#comment3305>

    I really don't like that we're threading the RPCServer through to the 
HRegion and HLog objects. There's something really messy about it: imagine, for 
example, if we had two separate RPC servers on each daemon, listening on 
different ports (this is done in the Hadoop namenode in order to achieve some 
easy QoS). Then we'd have to thread two RPCServers through.
    
    Instead, I think the individual operations should use ThreadLocals to grab 
the current Call object, perhaps? If the ThreadLocal lookups are too expensive, 
then perhaps a new interface like "CallContext" which we thread through the 
processing of individual operations?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3315>

    this variable is never reset, so it really acts like a sequence number. I'm 
not sure if it needs to exist independently from the other sequence number that 
we have



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3311>

    it's the last transaction for which someone has requested a sync, right? 
i.e not the last transaction that has actually _been_ synced?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3306>

    making this a volatile long doesn't make the += atomic. You should use 
AtomicLong most likely?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3307>

    this too. I guess you were following the (incorrect) pattern above... may 
as well fix these.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3308>

    rather than reading both of these as members of HLog, I think it would be 
cleaner to pass Configuration along to LogSyncer, since they're only used from 
within there, right? You could do the same with optionalFlushInterval while 
you're at it.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3309>

    style: add braces



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3310>

    worth adding: assert Thread.currentThread().holdsLock(updateLock) here?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3312>

    if we got interrupted, that typically means that we should shutdown, right? 
swallowing the exception smells wrong



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3313>

    don't you mean "that do have"? ie if the edits have been cacheflushed, then 
we don't need to replay them.
    
    This javadoc might make more sense to put up at the top of the class?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3317>

    this can be a static inner class, yea?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3318>

    submit seems like a strange name for this... 'complete' maybe?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3322>

    the condition we're waiting on is actually queueNotEmpty, isn't it? or 
somethingToSync?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3321>

    make these final



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3323>

    I don't understand this sentence.. what is "0 or less"?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3324>

    should pass "e" as second arg, not stringify it - that way you get a stack 
trace



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3325>

    do you need to really do new Long? it should auto-box



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3326>

    e as 2nd arg. Since this error will result in a hung RPC client, it would 
be great to give as much detail as possible - can we somehow funnel the client 
IP, the transaction ID, etc into this message?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3327>

    i'm skeptical about the 'synchronized' keyword here - we have an internal 
lock, but we also use the object's monitor lock?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3328>

    hrm... does this do anything, given logSyncerThread is now waiting on 
Conditions instead of on the monitor obj?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3329>

    typo: and



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3330>

    these three different paths scare me a bit... do we really need to support 
all three? ie are there any significant disadvantages to always doing the 
delayed responses, for example?



src/main/resources/hbase-default.xml
<https://reviews.apache.org/r/1463/#comment3331>

    I feel like this config should be based on size, not count... we've gotten 
in trouble in the past where we do things on count and people are doing very 
wide cells, etc, and we OOM


- Todd


On 2011-08-12 23:44:23, Vlad Dogaru wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1463/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-12 23:44:23)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Changes:
bq.  
bq.  1.  Add hbase.region.wal.batchentries configuration parameter.  If this is
bq.  enabled, batch entries to the HLog in a queue.
bq.  2.  Use delayed RPCs for sync requests when aggresive batching is enabled.
bq.  This frees up RPC handler threads for the duration of the sync.
bq.  3.  Pass the RPC server instance all the way to down to HLog.  This is 
needed
bq.  to find out the current remote call, mark it as delayed, and finally 
complete
bq.  it when the sync is done.
bq.  4.  Use the region read-write consistency control to avoid exposing to
bq.  RegionScanners edits which have not yet been synced.
bq.  5.  Change a few tests which directly create HRegions or HLogs.  The
bq.  rpcServers passed in are null, HLog falls back to classic RPCs when it has 
no
bq.  knowledge of the RPC server.
bq.  6.  Add TestBatchEntriesLogRolling, which is identical to TestLogRolling,
bq.  except that it uses aggressive batching.  I'm not sure how to add tests 
that
bq.  verify the same functionality but don't duplicate code, suggestion are
bq.  welcome.
bq.  
bq.  The new parameter is disabled by default.
bq.  
bq.  
bq.  This addresses bug HBASE-4192.
bq.      https://issues.apache.org/jira/browse/HBASE-4192
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java 7117bce 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java a00b93d 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 83ff7b2 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
7a917da 
bq.    
src/main/java/org/apache/hadoop/hbase/regionserver/ReadWriteConsistencyControl.java
 8ec53d3 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 887f736 
bq.    src/main/resources/hbase-default.xml 66548ca 
bq.    
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java
 dc43eb2 
bq.    
src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
 381ac90 
bq.  
bq.  Diff: https://reviews.apache.org/r/1463/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All unit tests run with aggressive batching turned on and off.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Vlad
bq.  
bq.



> Optimize HLog for Throughput Using Delayed RPCs
> -----------------------------------------------
>
>                 Key: HBASE-4192
>                 URL: https://issues.apache.org/jira/browse/HBASE-4192
>             Project: HBase
>          Issue Type: New Feature
>          Components: wal
>    Affects Versions: 0.92.0
>            Reporter: Vlad Dogaru
>            Priority: Minor
>
> Introduce a new HLog configuration parameter (batchEntries) for more 
> aggressive batching of appends.  If this is enabled, HLog appends are not 
> written to the HLog writer immediately, but batched and written either 
> periodically or when a sync is requested.  Because sync times become larger, 
> they use delayed RPCs to free up RPC handler threads.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to