[ 
https://issues.apache.org/jira/browse/HBASE-19527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329863#comment-16329863
 ] 

Appy commented on HBASE-19527:
------------------------------

We can't make the production use case worse (abrupt endings on shutdown and 
unnecessary recoveries on restart) for the sake of extra testing. The former is 
far more important than the latter.

bq. The Master or RegionServer threads determine whether we should go down or 
not. If they are stopped or aborted, then all else should go down. Lets not be 
having to do a decision-per-thread on when to go down (this gets really hard to 
do... sometimes its exit if process is stopped, other times it is if cluster is 
up or down, and other combos...). 
I agree it's hard to do for everything, but it is worth doing (in this case, 
keeping what is there) for a few critical systems. ProcExecutor is critical and 
will only become more important with time as more sub-systems (replication, 
backup, etc.) move to it.
I don't mind the change in ExecutorService (mostly because I don't know enough 
to make a case for it, nor do I have time to dig).
Among the other thread pools, the RPC executors for user requests are probably 
even less important and can go down randomly (not relevant here, but I'm trying 
to think holistically and bring good points to the table).

bq. If a worker thread is doing something that it can't give up, that we cannot 
recover from, thats a problem; lets find it sooner rather than later given 
threads can exit any which way at any time.
True, we could use more fault-tolerance testing. But the answer to that should 
be adding more fault-tolerance tests, rather than making the system 
nondeterministic.

bq. Finding all the combinations, the code paths that lead to an exit, and 
exits concurrent with various combinations of operations, would be  too much 
work; we'd never achieve complete coverage – I suggest.
Yeah, we can never do that. If we could, we wouldn't need the "guard".

bq. Suggest we try this and the watch the flakies a while...  Can revert if a 
bad idea.
The alternative is keeping the extra complexity in exchange for cleaner shutdown 
and restart. How would changes in the flaky tests argue for or against that?
Btw, all of our ProcFramework fault-tolerance tests do a join() on these 
threads, so making them daemon doesn't make those tests any better.
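
To illustrate the join() point, here is a minimal Java sketch. It is not HBase code: the class name DaemonJoinSketch and the helper runAndJoin are hypothetical, and the sleep is only a stand-in for a procedure step. It shows that a caller which joins a worker thread waits for it to finish whether or not the thread is marked daemon. The daemon flag only matters when the JVM exits while nobody is waiting on the thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class DaemonJoinSketch {

    // Hypothetical helper: starts a worker with the given daemon flag, joins it,
    // and reports whether the worker finished its work before join() returned.
    static boolean runAndJoin(boolean daemon) {
        AtomicBoolean finished = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(50); // stand-in for a procedure step
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            finished.set(true);
        }, "sketch-worker");
        worker.setDaemon(daemon); // must be called before start()
        worker.start();
        try {
            worker.join(); // blocks until the worker terminates, daemon or not
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println("non-daemon completed: " + runAndJoin(false));
        System.out.println("daemon completed: " + runAndJoin(true));
    }
}
```

Both calls report that the worker completed, so a test built around join() sees the same behaviour with either setting.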

> Make ExecutorService threads daemon=true.
> -----------------------------------------
>
>                 Key: HBASE-19527
>                 URL: https://issues.apache.org/jira/browse/HBASE-19527
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: stack
>            Assignee: stack
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>
>         Attachments: HBASE-19527.branch-2.001.patch, 
> HBASE-19527.branch-2.002.patch, HBASE-19527.master.001.patch, 
> HBASE-19527.master.001.patch, HBASE-19527.master.001.patch, 
> HBASE-19527.master.002.patch
>
>
> Let me try this. ExecutorService runs OPENs, CLOSE, etc. If Server is going 
> down, no point in these threads sticking around (I think). Let me try this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
