On Wed, Jul 25, 2012 at 5:58 PM, Himanshu Vashishtha <[email protected]> wrote:
> Hi,
> Replication works well when run over a short span, but its performance
> in a long-running setup seems to degrade on the slave cluster side. To
> an extent, it made one of our testing environments unresponsive. As per
> a jstack on one node, all its priority handlers were blocked in the
> replicateLogEntries method, which is blocked because the cluster is in
> bad shape (2/4 nodes died; root is unassigned; the node which had it
> previously became unresponsive; and the only other remaining node
> doesn't have any priority handler left to take care of the root region
> assignment).
See:
https://issues.apache.org/jira/browse/HBASE-4280
https://issues.apache.org/jira/browse/HBASE-5197
https://issues.apache.org/jira/browse/HBASE-6207
https://issues.apache.org/jira/browse/HBASE-6165

Currently the best way to fix this would be to have a completely
separate set of handlers (rough sketch at the end of this mail).

> The memory footprint of the app also increases (based on
> `top`; unfortunately, no gc logs at the moment).

You don't want to rely on top for that since it's a Java application.
Set your Xms as big as your Xmx and your application will always use
all the memory it's given.

> The replicateLogEntries is a high QOS method; ReplicationSink's
> overall behavior is to act as a native hbase client and replicate the
> mutations in its cluster. This may take some time if, for example, a
> region is splitting or there is a GC pause at the target region
> servers. It enters the retrying loop, and this blocks the priority
> handler serving that method.
> Meanwhile, other master cluster region servers are also shipping
> edits (to this, or other region servers). This makes the situation
> even worse.
>
> I wonder whether others have seen this before. Please share.

See my first answer.

> There is some scope for improvement on the Sink side:
>
> a) ReplicationSink#replicateLogEntries: Make it a normal operation
> (no high QOS annotation), and have ReplicationSink periodically check
> whether the client is still connected. If it's not, just throw an
> exception and bail out. The client will resend the shipment anyway.
> This frees the handlers from blocking, and the cluster's normal
> operation will not be impeded.

It wasn't working any better before HBASE-4280 :)

> b) Have a threadpool in ReplicationSink and process per-table
> requests in parallel. Should help in the case of multi-table
> replication.

Currently it applies the edits sequentially; going parallel naively
would apply them in the wrong order. Note that when a region server
fails we do continue to replicate the new edits while we also
replicate the backlog from the old server, so even today it's not 100%
perfect.

> c) Free the memory consumed by the shipped array as soon as the
> mutation list is populated. Currently, if the call to multi is
> blocked (for any reason), the region server enters the retrying
> logic... and since the entries of the WALEdits array have already
> been copied into Put/Delete objects, the array can be freed.

So free up the entries array at each position after the Put or Delete
was created? We could do that, although it's not a big saving
considering that entries will be at most 64MB big. In production here
we run with just 1 MB.

Rough sketches of what (a), (b) and (c) could look like are below.

J-D
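PS: The "separate set of handlers" idea, very roughly. This is not how
HBaseServer actually wires up its handler threads; Call, its
getMethodName(), isPriorityCall() and the pool sizes are all stand-ins
just to show the shape:

    // Hypothetical dispatch: replication RPCs get their own small pool
    // so a wedged ReplicationSink can't starve the priority handlers
    // that serve things like root/meta assignment.
    ExecutorService priorityHandlers    = Executors.newFixedThreadPool(10);
    ExecutorService replicationHandlers = Executors.newFixedThreadPool(3);
    ExecutorService normalHandlers      = Executors.newFixedThreadPool(30);

    void dispatch(Call call) {            // Call: stand-in, assume Runnable
      if ("replicateLogEntries".equals(call.getMethodName())) {
        replicationHandlers.execute(call);
      } else if (isPriorityCall(call)) {  // made-up priority check
        priorityHandlers.execute(call);
      } else {
        normalHandlers.execute(call);
      }
    }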
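For (a), the bail-out could look like the following;
isClientStillConnected() is made up and stands for whatever RPC
liveness check would have to be added:

    // Sketch for (a): stop retrying once the shipping server has gone
    // away, instead of holding a handler until the edits finally
    // apply. The source will re-ship the batch when it retries.
    private void applyWithRetries(HTable table, List<Put> puts)
        throws IOException {
      for (int attempt = 0; attempt < maxRetries; attempt++) {
        try {
          table.put(puts);  // apply the shipped mutations
          return;           // success, the handler is free again
        } catch (IOException ioe) {
          if (!isClientStillConnected()) {  // made-up liveness check
            throw new IOException("source disconnected, dropping batch",
                ioe);
          }
          // back off before the next attempt (policy elided)
        }
      }
      throw new IOException("retries exhausted");
    }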
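For (b), if per-table ordering is the ordering that matters, a
threadpool doesn't have to break it: one single-threaded executor per
table keeps each table's edits sequential while different tables apply
in parallel. applyBatch() is made up and stands for the sink's
existing apply path:

    // Sketch for (b): parallel across tables, sequential within one.
    private final ConcurrentMap<String, ExecutorService> perTablePools =
        new ConcurrentHashMap<String, ExecutorService>();

    private ExecutorService poolFor(String tableName) {
      ExecutorService pool = perTablePools.get(tableName);
      if (pool == null) {
        ExecutorService fresh = Executors.newSingleThreadExecutor();
        pool = perTablePools.putIfAbsent(tableName, fresh);
        if (pool == null) {
          pool = fresh;      // we won the race to create it
        } else {
          fresh.shutdown();  // someone beat us to it, reuse theirs
        }
      }
      return pool;
    }

    Future<?> replicateBatch(String tableName, final List<Put> batch) {
      return poolFor(tableName).submit(new Runnable() {
        public void run() {
          applyBatch(batch); // made-up: whatever the sink does today
        }
      });
    }

Note that replicateLogEntries would still have to join all the
returned Futures before returning, otherwise the source advances its
log position before the edits are actually applied.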
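And (c) is pretty much a one-liner in the loop that builds the
mutation list; entries is the shipped array, addToBatch() is made up:

    // Sketch for (c): drop each shipped entry once its edits have been
    // copied into Put/Delete objects, so a stuck retry loop doesn't
    // pin both copies (up to the 64MB shipment cap) in memory.
    for (int i = 0; i < entries.length; i++) {
      addToBatch(entries[i]);  // copies the WALEdit into Puts/Deletes
      entries[i] = null;       // let the original entry be collected
    }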
