[ https://issues.apache.org/jira/browse/ACCUMULO-4615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373267#comment-16373267 ]
Jeff Schmidt commented on ACCUMULO-4615:
----------------------------------------
Sorry for the delay on this. I have an initial fix here:
[https://github.com/jschmidt10/accumulo/commit/ce3ffae0e85f0b314af2401fd0dd054b51a51277]
I will be testing it on a deployed system shortly but any early feedback is
appreciated too.
The general idea is to:
1) Use a timeout per status-gathering task (instead of a single timeout for the
entire pool)
2) Store the status-gathering results in a thread-safe data structure
(ConcurrentSkipListMap)
3) Add a separate property for the status timeout (per tserver)
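As a rough illustration of the three points above (this is not the code from the linked commit), a per-task timeout with a thread-safe result map might look like the sketch below. The class name StatusGatherSketch, the StatusFetcher interface, the gather method, and the perServerTimeoutMs parameter are all hypothetical names chosen for the example; they are not Accumulo APIs or property names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class StatusGatherSketch {

  /** Hypothetical stand-in for the per-tserver status call; not an Accumulo API. */
  interface StatusFetcher {
    String fetch(String server) throws Exception;
  }

  /** Gathers one status per server, giving each task its own time budget. */
  static SortedMap<String,String> gather(List<String> servers, StatusFetcher fetcher,
      long perServerTimeoutMs) throws InterruptedException {
    // Worker threads write concurrently, so the result map must be thread-safe.
    ConcurrentSkipListMap<String,String> results = new ConcurrentSkipListMap<>();
    // Assumption: one thread per server, so every task starts right after submission.
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, servers.size()));
    try {
      Map<String,Future<?>> futures = new LinkedHashMap<>();
      long start = System.nanoTime();
      for (String server : servers) {
        futures.put(server, pool.submit(() -> {
          results.put(server, fetcher.fetch(server));
          return null;
        }));
      }
      long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(perServerTimeoutMs);
      for (Map.Entry<String,Future<?>> e : futures.entrySet()) {
        // Each task is measured against its own deadline, not a pool-wide one.
        long remaining = Math.max(0L, timeoutNanos - (System.nanoTime() - start));
        try {
          e.getValue().get(remaining, TimeUnit.NANOSECONDS);
        } catch (TimeoutException | ExecutionException ex) {
          // Only the slow or failed server is dropped; the others are kept.
          // cancel(true) interrupts the task; a task that ignores interruption
          // could still write a late result.
          e.getValue().cancel(true);
          System.err.println("No status from " + e.getKey() + ": " + ex);
        }
      }
    } finally {
      pool.shutdownNow();
    }
    return results;
  }

  public static void main(String[] args) throws Exception {
    SortedMap<String,String> statuses = gather(Arrays.asList("ts1", "ts2", "ts3"), s -> {
      if (s.equals("ts2")) {
        Thread.sleep(3000); // simulate one slow tserver
      }
      return "ok";
    }, 300);
    System.out.println(statuses.keySet());
  }
}
```

In this sketch, one slow server only costs its own entry, and the caller can log exactly which server was skipped instead of silently continuing with a partial set.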
> ThreadPool timeout when checking tserver stats is confusing
> -----------------------------------------------------------
>
> Key: ACCUMULO-4615
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4615
> Project: Accumulo
> Issue Type: Bug
> Components: master
> Affects Versions: 1.8.1
> Reporter: Michael Wall
> Assignee: Jeff Schmidt
> Priority: Minor
> Fix For: 1.9.0, 2.0.0
>
>
> If it takes longer than the configured time to gather information from all
> the tablet servers, the thread pool stops and processing continues with
> whatever has been collected. The code is at
> https://github.com/apache/accumulo/blob/1.8/server/master/src/main/java/org/apache/accumulo/master/Master.java#L1120,
> and the default timeout is 6s. This does not appear to be an issue prior to 1.8.
> Best case, this was really confusing. The monitor page would show 30
> tservers, then 5 tservers. I didn't really see any other negative effects;
> neither migrations nor balancing appeared to be affected. Worst case, though, I
> missed something and the master is making decisions based on incomplete
> information.
> [[email protected]] please add more info if needed.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)