[ https://issues.apache.org/jira/browse/ACCUMULO-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Newton resolved ACCUMULO-294.
----------------------------------
Resolution: Not A Problem
> tablet servers are losing zookeeper locks due to garbage collection even when
> there is lots of free memory
> ----------------------------------------------------------------------------------------------------------
>
> Key: ACCUMULO-294
> URL: https://issues.apache.org/jira/browse/ACCUMULO-294
> Project: Accumulo
> Issue Type: Bug
> Components: tserver
> Affects Versions: 1.3.5
> Environment: tablet servers on a large cluster are losing their locks
> Reporter: Eric Newton
> Assignee: Eric Newton
> Priority: Minor
>
> Noticed that 5 tablet servers had stopped on a large cluster. Each server had
> lost its lock due to a ZooKeeper session timeout. The ZooKeeper timeout is
> set to 40 seconds. In every case, the lost lock was preceded by the ejection
> of blocks from the block cache and by a garbage collection that recovered
> more than 4G of memory. The tablet servers were running with 8G, and
> generally had 4G free. Very little time was attributed to garbage
> collection, at least as printed in the debug log. The in-memory map is small
> (256M) and uses the native version. Will experiment with more aggressive
> concurrent GC settings, changing:
> {noformat}
> -XX:CMSInitiatingOccupancyFraction=75
> {noformat}
> to
> {noformat}
> -XX:CMSInitiatingOccupancyFraction=60
> {noformat}
> ZooKeeper has already been configured with this:
> {noformat}
> globalOutstandingLimit=10000
> {noformat}
> This setting helped enormously. Each ZooKeeper server has between 500 and
> 1700 clients.
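
The description notes that little GC time appeared in the debug log even though sessions were timing out at 40 seconds. One way to check whether the process itself is stalling, whatever the cause, is to have a thread sleep for a fixed interval and measure how much longer than requested the sleep actually took. The sketch below is illustrative only: PauseDetector, its method names, the 1-second interval, and the "warn at a quarter of the session timeout" threshold are all assumptions, not part of Accumulo or ZooKeeper.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of Accumulo or ZooKeeper. */
public class PauseDetector implements Runnable {
    private final long intervalMs;       // how long each sleep should last
    private final long warnThresholdMs;  // overshoot that triggers a warning

    public PauseDetector(long intervalMs, long warnThresholdMs) {
        this.intervalMs = intervalMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    /** How much longer than requested the sleep took; never negative. */
    public static long overshootMillis(long requestedMs, long elapsedMs) {
        return Math.max(0L, elapsedMs - requestedMs);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop monitoring.
                Thread.currentThread().interrupt();
                return;
            }
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long overshoot = overshootMillis(intervalMs, elapsedMs);
            if (overshoot >= warnThresholdMs) {
                System.err.println("Process stalled for about " + overshoot + " ms");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // 40 s is the session timeout from the report; the 1/4 fraction is an assumption.
        long sessionTimeoutMs = 40_000L;
        Thread t = new Thread(new PauseDetector(1_000L, sessionTimeoutMs / 4), "pause-detector");
        t.setDaemon(true);
        t.start();
        Thread.sleep(3_000L); // stand-in for the real server workload
        t.interrupt();
    }
}
```

Because the detector measures wall-clock overshoot rather than reading GC counters, it would also flag stalls that the GC log does not attribute to collection time.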