Jeff Schmidt commented on ACCUMULO-4615:

Sorry for the delay on this. I have an initial fix here: 

I will be testing it on a deployed system shortly but any early feedback is 
appreciated too.

The general idea is to:

1) Use a timeout per status gathering task (instead of a timeout for the entire 
2) Store the status gather results in a thread-safe data structure
3) Add a separate property for the status timeout (per tserver)
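
To make the first two points concrete, here is a rough sketch of the pattern, not the actual patch. All names in it (StatusGatherSketch, StatusFetcher, gather, perServerTimeoutMs) are made up for illustration and are not Accumulo classes or properties. Each tserver gets its own task, the collecting thread waits on each task with its own timeout, and the workers write into a ConcurrentHashMap. One caveat: because the waits happen one after another, the total wait can still add up to roughly the number of slow servers times the timeout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only; none of these names come from Accumulo.
final class StatusGatherSketch {

  // Hypothetical stand-in for the per-tserver status call.
  interface StatusFetcher {
    String fetch(String server) throws Exception;
  }

  static Map<String,String> gather(List<String> servers, StatusFetcher fetcher,
      long perServerTimeoutMs) {
    // Thread-safe map, because worker threads write results concurrently.
    Map<String,String> results = new ConcurrentHashMap<>();
    // One thread per server so a slow server never delays the start of another.
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, servers.size()));
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (String server : servers) {
        pending.add(pool.submit(() -> {
          results.put(server, fetcher.fetch(server));
          return null;
        }));
      }
      for (int i = 0; i < pending.size(); i++) {
        Future<?> task = pending.get(i);
        try {
          // The timeout applies to this one task, not to the whole pool.
          task.get(perServerTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException ex) {
          task.cancel(true); // interrupt the straggler; its result is skipped
          System.err.println("Timed out gathering status from " + servers.get(i));
        } catch (ExecutionException ex) {
          System.err.println("Failed gathering status from " + servers.get(i)
              + ": " + ex.getCause());
        } catch (InterruptedException ex) {
          Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
          break;
        }
      }
    } finally {
      pool.shutdownNow();
    }
    return results;
  }

  public static void main(String[] args) {
    Map<String,String> status = gather(Arrays.asList("ts1", "ts2", "ts3"), server -> {
      if (server.equals("ts2")) {
        Thread.sleep(2000); // simulate a tserver that does not answer in time
      }
      return "ok";
    }, 200);
    System.out.println("Got status from: " + status.keySet());
  }
}
```

With this shape, a single slow tserver drops out of the results and gets its own log message, while the servers that answered in time are still reported.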

> ThreadPool timeout when checking tserver stats is confusing
> -----------------------------------------------------------
>                 Key: ACCUMULO-4615
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4615
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.8.1
>            Reporter: Michael Wall
>            Assignee: Jeff Schmidt
>            Priority: Minor
>             Fix For: 1.9.0, 2.0.0
> If it takes longer than the configured time to gather information from all 
> the tablet servers, the thread pool stops and processing continues with 
> whatever has been collected.  Code is 
> https://github.com/apache/accumulo/blob/1.8/server/master/src/main/java/org/apache/accumulo/master/Master.java#L1120,
>  default timeout is 6s.  Does not appear to be an issue prior to 1.8.
> Best case, this was really confusing.  The monitor page would show 30 
> tservers, then 5 tservers.  I didn't see any other negative effects; no 
> migrations and no balancing appeared to be affected.  Worst case, though, I 
> missed something and the master is making decisions based on incomplete 
> information.
> [~dlmar...@comcast.net] please add more info if needed.

This message was sent by Atlassian JIRA