[ https://issues.apache.org/jira/browse/HBASE-19527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329863#comment-16329863 ]
Appy commented on HBASE-19527:
------------------------------

We can't make the production use case worse (abrupt endings on shutdown and unnecessary recoveries on restart) for the sake of extra testing. The former is far more important than the latter.

bq. The Master or RegionServer threads determine whether we should go down or not. If they are stopped or aborted, then all else should go down. Lets not be having to do a decision-per-thread on when to go down (this gets really hard to do... sometimes its exit if process is stopped, other times it is if cluster is up or down, and other combos...).

I agree it's hard to do for everything, but it's worth doing (in this case, keeping what we have) for a few critical systems. ProcExecutor is critical and will only become more important with time as more sub-systems (replication, backup, etc.) move to it.
I don't mind the change in ExecutorService (mostly because I don't know enough to make a case for it, nor do I have time to dig).
Among the other thread pools, the RPC executors for user requests are probably even less important and can go down randomly (not relevant here, but I'm trying to think holistically to bring good points to the table).

bq. If a worker thread is doing something that it can't give up, that we cannot recover from, thats a problem; lets find it sooner rather than later given threads can exit any which way at any time.

True, we could use more fault tolerance testing. But the answer to that should be adding more fault tolerance tests, rather than making the system nondeterministic.

bq. Finding all the combinations, the code paths that lead to an exit, and exits concurrent with various combinations of operations, would be too much work; we'd never achieve complete coverage – I suggest.

Yeah, we can never do that. If we could, we wouldn't need the "guard".

bq. Suggest we try this and the watch the flakies a while... Can revert if a bad idea.

The alternative is having extra complexity for cleaner shutdown and restart.
How would changes in the flakies argue for or against that?
Btw, all of our ProcFramework fault tolerance tests do join() on these threads, so making them daemon doesn't make those tests any better.

> Make ExecutorService threads daemon=true.
> -----------------------------------------
>
>                 Key: HBASE-19527
>                 URL: https://issues.apache.org/jira/browse/HBASE-19527
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: stack
>            Assignee: stack
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>
>         Attachments: HBASE-19527.branch-2.001.patch,
> HBASE-19527.branch-2.002.patch, HBASE-19527.master.001.patch,
> HBASE-19527.master.001.patch, HBASE-19527.master.001.patch,
> HBASE-19527.master.002.patch
>
>
> Let me try this. ExecutorService runs OPENs, CLOSE, etc. If Server is going
> down, no point in these threads sticking around (I think). Let me try this.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
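
For readers less familiar with the flag being debated, here is a minimal Java sketch of what daemon=true means for a worker thread and why a test that calls join() behaves the same either way. It is not HBase code: the class DaemonThreadSketch and the helpers namedFactory and runAndJoin are hypothetical names made up for illustration, and the 100 ms sleep is only a stand-in for a step that should not be cut short.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class DaemonThreadSketch {

  /** Hypothetical factory (not HBase's): names threads and sets the daemon flag. */
  public static ThreadFactory namedFactory(String prefix, boolean daemon) {
    AtomicInteger seq = new AtomicInteger();
    return runnable -> {
      Thread t = new Thread(runnable, prefix + "-" + seq.incrementAndGet());
      // A daemon thread does not keep the JVM alive; when the last
      // non-daemon thread ends, the JVM exits without waiting for it.
      t.setDaemon(daemon);
      return t;
    };
  }

  /** Starts one worker, joins it, and reports whether its work completed. */
  public static boolean runAndJoin(boolean daemon) {
    AtomicBoolean finished = new AtomicBoolean(false);
    Thread worker = namedFactory("worker", daemon).newThread(() -> {
      try {
        Thread.sleep(100); // stand-in for work that should not be interrupted
        finished.set(true);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    worker.start();
    try {
      // join() blocks until the worker ends, whether or not it is a daemon.
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return finished.get();
  }

  public static void main(String[] args) {
    System.out.println("daemon=true,  completed: " + runAndJoin(true));
    System.out.println("daemon=false, completed: " + runAndJoin(false));
  }
}
```

Both calls report that the work completed, because the caller waits explicitly. The daemon flag only matters when nothing joins the thread and the process exits: a non-daemon worker holds the JVM open until it finishes, while a daemon worker may be stopped partway through.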