[jira] [Commented] (HBASE-4192) Optimize HLog for Throughput Using Delayed RPCs

[email protected] (JIRA) Fri, 12 Aug 2011 18:03:54 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084503#comment-13084503
 ]

[email protected] commented on HBASE-4192:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1463/#review1436
-----------------------------------------------------------

some new unit tests here would be nice - specifically if we can use mock
objects and fire up a bunch of threads to pound on the various synchronization
primitives. I haven't convinced myself that the synchronization is all correct
yet, and some tests that "prove it" would help :)

src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
<https://reviews.apache.org/r/1463/#comment3304>

can you write a test case for this bug fix, separately from this HLog
optimization? seems like you should be able to trigger it by delaying response
to a different thread which runs immediately, then sleep for a second in the
main RPC thread, right?

src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<https://reviews.apache.org/r/1463/#comment3305>

I really don't like that we're threading the RPCServer through to the
HRegion and HLog objects. There's something really messy about it: imagine, for
example, if we had two separate RPC servers on each daemon, listening on
different ports (this is done in the Hadoop namenode in order to achieve some
easy QoS). Then we'd have to thread two RPCServers through.

Instead, I think the individual operations should use ThreadLocals to grab
the current Call object, perhaps? If the ThreadLocal lookups are too expensive,
then perhaps a new interface like "CallContext" which we thread through the
processing of individual operations?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3315>

this variable is never reset, so it really acts like a sequence number. I'm
not sure if it needs to exist independently from the other sequence number that
we have

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3311>

it's the last transaction for which someone has requested a sync, right?
i.e not the last transaction that has actually _been_ synced?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3306>

making this a volatile long doesn't make the += atomic. You should use
AtomicLong most likely?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3307>

this too. I guess you were following the (incorrect) pattern above... may
as well fix these.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3308>

rather than reading both of these as members of HLog, I think it would be
cleaner to pass Configuration along to LogSyncer, since they're only used from
within there, right? You could do the same with optionalFlushInterval while
you're at it.

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3309>

style: add braces

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3310>

worth adding: assert Thread.currentThread().holdsLock(updateLock) here?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3312>

if we got interrupted, that typically means that we should shutdown, right?
swallowing the exception smells wrong

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3313>

don't you mean "that do have"? ie if the edits have been cacheflushed, then
we don't need to replay them.

This javadoc might make more sense to put up at the top of the class?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3317>

this can be a static inner class, yea?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3318>

submit seems like a strange name for this... 'complete' maybe?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3322>

the condition we're waiting on is actually queueNotEmpty, isn't it? or
somethingToSync?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3321>

make these final

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3323>

I don't understand this sentence.. what is "0 or less"?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3324>

should pass "e" as second arg, not stringify it - that way you get a stack
trace

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3325>

do you need to really do new Long? it should auto-box

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3326>

e as 2nd arg. Since this error will result in a hung RPC client, it would
be great to give as much detail as possible - can we somehow funnel the client
IP, the transaction ID, etc into this message?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3327>

i'm skeptical about the 'synchronized' keyword here - we have an internal
lock, but we also use the object's monitor lock?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3328>

hrm... does this do anything, given logSyncerThread is now waiting on
Conditions instead of on the monitor obj?

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3329>

typo: and

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/1463/#comment3330>

these three different paths scare me a bit... do we really need to support
all three? ie are there any significant disadvantages to always doing the
delayed responses, for example?

src/main/resources/hbase-default.xml
<https://reviews.apache.org/r/1463/#comment3331>

I feel like this config should be based on size, not count... we've gotten
in trouble in the past where we do things on count and people are doing very
wide cells, etc, and we OOM

- Todd

On 2011-08-12 23:44:23, Vlad Dogaru wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1463/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-08-12 23:44:23)
bq.
bq.
bq. Review request for hbase.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Changes:
bq.
bq. 1. Add hbase.region.wal.batchentries configuration parameter. If this is
bq. enabled, batch entries to the HLog in a queue.
bq. 2. Use delayed RPCs for sync requests when aggresive batching is enabled.
bq. This frees up RPC handler threads for the duration of the sync.
bq. 3. Pass the RPC server instance all the way to down to HLog. This is
needed
bq. to find out the current remote call, mark it as delayed, and finally
complete
bq. it when the sync is done.
bq. 4. Use the region read-write consistency control to avoid exposing to
bq. RegionScanners edits which have not yet been synced.
bq. 5. Change a few tests which directly create HRegions or HLogs. The
bq. rpcServers passed in are null, HLog falls back to classic RPCs when it has
no
bq. knowledge of the RPC server.
bq. 6. Add TestBatchEntriesLogRolling, which is identical to TestLogRolling,
bq. except that it uses aggressive batching. I'm not sure how to add tests
that
bq. verify the same functionality but don't duplicate code, suggestion are
bq. welcome.
bq.
bq. The new parameter is disabled by default.
bq.
bq.
bq. This addresses bug HBASE-4192.
bq. https://issues.apache.org/jira/browse/HBASE-4192
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java 7117bce
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java a00b93d
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 83ff7b2
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
7a917da
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/ReadWriteConsistencyControl.java
8ec53d3
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 887f736
bq. src/main/resources/hbase-default.xml 66548ca
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java
dc43eb2
bq.
src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
381ac90
bq.
bq. Diff: https://reviews.apache.org/r/1463/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. All unit tests run with aggressive batching turned on and off.
bq.
bq.
bq. Thanks,
bq.
bq. Vlad
bq.
bq.

> Optimize HLog for Throughput Using Delayed RPCs
> -----------------------------------------------
>
> Key: HBASE-4192
> URL: https://issues.apache.org/jira/browse/HBASE-4192
> Project: HBase
> Issue Type: New Feature
> Components: wal
> Affects Versions: 0.92.0
> Reporter: Vlad Dogaru
> Priority: Minor
>
> Introduce a new HLog configuration parameter (batchEntries) for more
> aggressive batching of appends. If this is enabled, HLog appends are not
> written to the HLog writer immediately, but batched and written either
> periodically or when a sync is requested. Because sync times become larger,
> they use delayed RPCs to free up RPC handler threads.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4192) Optimize HLog for Throughput Using Delayed RPCs

Reply via email to