[jira] [Commented] (ACCUMULO-4615) ThreadPool timeout when checking tserver stats is confusing

Jeff Schmidt (JIRA) Thu, 22 Feb 2018 11:12:00 -0800

    [ 
https://issues.apache.org/jira/browse/ACCUMULO-4615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373267#comment-16373267
 ]


Jeff Schmidt commented on ACCUMULO-4615:
----------------------------------------

Sorry for the delay on this. I have an initial fix here: 
[https://github.com/jschmidt10/accumulo/commit/ce3ffae0e85f0b314af2401fd0dd054b51a51277]

I will be testing it on a deployed system shortly but any early feedback is 
appreciated too.

The general idea is to:

1) Use a timeout per status-gathering task (instead of a timeout for the entire 
pool)
2) Store the status-gathering results in a thread-safe data structure 
(ConcurrentSkipListMap)
3) Add a separate property for the status timeout (per tserver)
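
For anyone who wants the shape of the change without reading the commit, here is a minimal sketch of the three points above. It is not the code from the linked commit: the class name, the gather method, the fetchStatus function and the perServerTimeoutMs parameter are all hypothetical stand-ins, and a plain String stands in for the real tserver status object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Illustrative only: names and types here are assumptions, not Accumulo APIs.
class StatusGatherSketch {

  // perServerTimeoutMs plays the role of the separate per-tserver timeout
  // property (point 3); fetchStatus stands in for the call to one tserver.
  static SortedMap<String, String> gather(List<String> servers,
      Function<String, String> fetchStatus, long perServerTimeoutMs) {
    // Point 2: worker threads write concurrently, so the map must be thread-safe.
    ConcurrentSkipListMap<String, String> results = new ConcurrentSkipListMap<>();
    ExecutorService pool = Executors.newCachedThreadPool();
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (String server : servers) {
        pending.add(pool.submit(() -> {
          results.put(server, fetchStatus.apply(server));
        }));
      }
      // Point 1: each task gets its own bounded wait, so one slow tserver
      // does not cut off the others. The clock starts when get() is called,
      // so this bounds the wait per task rather than its exact run time.
      for (Future<?> task : pending) {
        try {
          task.get(perServerTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
          task.cancel(true); // give up on this tserver only
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
    } finally {
      pool.shutdownNow();
    }
    // Return a snapshot so a late finisher cannot change what the caller sees.
    return new TreeMap<>(results);
  }
}
```

Calling gather with three server names, where fetchStatus blocks for one of them, should return entries for the other two once the slow one hits its own timeout.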

> ThreadPool timeout when checking tserver stats is confusing
> -----------------------------------------------------------
>
>                 Key: ACCUMULO-4615
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4615
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.8.1
>            Reporter: Michael Wall
>            Assignee: Jeff Schmidt
>            Priority: Minor
>             Fix For: 1.9.0, 2.0.0
>
>
> If it takes longer than the configured time to gather information from all 
> the tablet servers, the thread pool stops and processing continues with 
> whatever has been collected.  Code is 
> https://github.com/apache/accumulo/blob/1.8/server/master/src/main/java/org/apache/accumulo/master/Master.java#L1120,
>  default timeout is 6s.  Does not appear to be an issue prior to 1.8.
> Best case, this was really confusing.  The monitor page would have 30 
> tservers, then 5 tservers.  Didn't really see any other negative effects, no 
> migrations and no balancing appeared to be affected.  Worst case though, I 
> missed something and the master is making decisions based on incomplete 
> information.
> [[email protected]] please add more info if needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
