> On July 6, 2015, 6:35 p.m., Sudheesh Katkam wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java, > > line 841 > > <https://reviews.apache.org/r/36168/diff/1/?file=998962#file998962line841> > > > > If there is a large number of fragments being sent and the last one > > failed in setupNonRootFragments, we do not _wait for all fragments to > > finish cleaning up_, correct? It is a limitation then? > > abdelhakim deneche wrote: > true and that is exactly the scenario described in the else-block comment. > > Unfortunately I couldn't find a straightforward way to fix that case > (that was the reason I discarded the previous review request in the first > place). > > One thing to keep in mind though, if we try to wait for all running > fragments to finish (like we do for cancelled queries) there is a risk for > the Foreman to wait forever if some of those fragments become unresponsive, > or in this case the fragment didn't even start running on the remote > fragment. That's exactly what I tried to avoid by using the > "nonRootFragmentsSent" flag
We should strive to get to the point where we wait. Our target is Foreman always knowing the correct state of its fragemnts and never being destroyed until everything else is already gone. If there are edge cases we don't currently manage to achieve this, we should file bugs for those and get them fixed. - Jacques ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36168/#review90514 ----------------------------------------------------------- On July 3, 2015, 4:13 p.m., abdelhakim deneche wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/36168/ > ----------------------------------------------------------- > > (Updated July 3, 2015, 4:13 p.m.) > > > Review request for drill, Jacques Nadeau and Sudheesh Katkam. > > > Bugs: DRILL-3167 > https://issues.apache.org/jira/browse/DRILL-3167 > > > Repository: drill-git > > > Description > ------- > > when a query fails Foreman moves to a FAILING state until all fragment return > a terminal state. The Web UI will still display FAILED instead of FAILING > > TestDrillbitResilience#failsWhenSendingFragments exposes a limitation to this > approach: if an error occurs when setting up remote fragments we can't assume > they will be able to return a terminal state. In this case the Foreman will > not wait for them to finish and return FAILED to the client immediately. > > > Diffs > ----- > > > exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java > 6656bf6 > > exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java > dd26a76 > > exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java > 716fb66 > > exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryManager.java > 9318233 > exec/java-exec/src/main/resources/rest/profile/list.ftl cf92ede > exec/java-exec/src/main/resources/rest/profile/profile.ftl 46cdc83 > > exec/java-exec/src/test/java/org/apache/drill/exec/server/TestDrillbitResilience.java > ce09f68 > protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java > e76d748 > protocol/src/main/java/org/apache/drill/exec/proto/beans/QueryResult.java > 474e330 > protocol/src/main/protobuf/UserBitShared.proto 0451fd2 > > Diff: https://reviews.apache.org/r/36168/diff/ > > > Testing > ------- > > unit tests are passing along with customer and tpch100 > > > Thanks, > > abdelhakim deneche > >