[ https://issues.apache.org/jira/browse/HBASE-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922251#action_12922251 ]
Daniel Einspanjer commented on HBASE-2751:
------------------------------------------
I don't believe this should be a minor issue. It seems to be at the heart of
Mozilla's recent cluster instability: over time, each server in our 19-node
cluster opens more and more connections to all the others, and eventually, when
we reach around 10k connections, we have to restart the cluster due to client lag.
If there were configuration settings for the number of open connections that
triggers a cleanup and for how many to close on each cleanup, we could use an
LRU to close the least recently used connections. This would impose an access
penalty on those unopened regions (which we already have to pay the first time
they are accessed) but would prevent connection overload.
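To make the proposal concrete, here is a rough sketch of the kind of LRU cleanup I have in mind. It is not HBase code: the class name LruHandleCache and the two settings maxOpen (the open count that triggers cleanup) and closeBatch (how many to close per cleanup) are hypothetical names for illustration, and the "handle" is just any Closeable.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of an LRU cache of open handles; not part of HBase. */
class LruHandleCache<K, V extends Closeable> {
    private final int maxOpen;     // hypothetical setting: open count that triggers cleanup
    private final int closeBatch;  // hypothetical setting: handles closed per cleanup
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<K, V> open = new LinkedHashMap<>(16, 0.75f, true);

    LruHandleCache(int maxOpen, int closeBatch) {
        this.maxOpen = maxOpen;
        this.closeBatch = closeBatch;
    }

    /** Returns the open handle, or null if it was closed or never opened. */
    synchronized V get(K key) {
        return open.get(key); // a hit also marks the entry as most recently used
    }

    /** Registers a freshly opened handle and runs a cleanup if over the threshold. */
    synchronized void put(K key, V handle) throws IOException {
        V previous = open.put(key, handle);
        if (previous != null && previous != handle) {
            previous.close();
        }
        if (open.size() > maxOpen) {
            evict();
        }
    }

    /** Closes up to closeBatch of the least recently used handles. */
    private void evict() throws IOException {
        Iterator<Map.Entry<K, V>> it = open.entrySet().iterator();
        for (int i = 0; i < closeBatch && it.hasNext(); i++) {
            V victim = it.next().getValue();
            it.remove();
            victim.close();
        }
    }

    synchronized int size() {
        return open.size();
    }
}
```

A caller that gets null back from get() would reopen the file or connection and put() it again, which is the access penalty mentioned above. Closing a batch per cleanup, rather than one handle at a time, keeps the cleanup from firing on every single open once the threshold is reached.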
> Consider closing StoreFiles sometimes
> -------------------------------------
>
> Key: HBASE-2751
> URL: https://issues.apache.org/jira/browse/HBASE-2751
> Project: HBase
> Issue Type: Improvement
> Reporter: Jean-Daniel Cryans
> Priority: Minor
>
> Having a lot of regions per region server could be considered harmless if
> most of them aren't used, but that's not really true at the moment. We keep
> all files opened all the time (except for rolled HLogs). I'm thinking of 2
> solutions:
> # Lazily open the store files, or at least close them down after we read the
> file info. Or we could do this for every file except the most recent one.
> # Close files when they're not in use. We need some heuristic to determine
> when is the best moment to declare that a file can be closed.
> Both solutions go hand in hand, and I think it would be a huge gain in order
> to lower the ulimit and xceivers-related issues.