Hi All,

HBase version: 0.90.3 + Patches
Hadoop version: CDH3u0
Relevant JIRAs: https://issues.apache.org/jira/browse/HBASE-2937, https://issues.apache.org/jira/browse/HBASE-4003

We have been using the 'hbase.client.operation.timeout' knob introduced in HBASE-2937 for quite some time now. It helps us enforce our SLA.

We have two HBase clusters and two HBase client clusters. One of them is much busier than the other. We have seen consistently reproducible behavior in the clients running in the busy cluster: their memory footprint increases steadily after they have been up for roughly 24 hours, and it almost doubles from its usual value (usual case == RPC timeout disabled).

After much investigation we found nothing concrete, so we had to put in a hack that keeps the heap size under control even when the RPC timeout is enabled. Please also note that the same behavior is not observed in the 'not so busy' cluster.

The patch is here: https://gist.github.com/1288023

Could someone who is also running with the RPC timeout enabled in production under fair load please share their experience?

-Shrijeet
