abstractdog commented on code in PR #257:
URL: https://github.com/apache/tez/pull/257#discussion_r1112713675
##########
tez-plugins/tez-aux-services/src/main/java/org/apache/tez/auxservices/ShuffleHandler.java:
##########
@@ -306,21 +305,24 @@ public ReduceMapFileCount(ReduceContext rc) {
@Override
public void operationComplete(ChannelFuture future) throws Exception {
+ Channel ch = future.channel();
if (!future.isSuccess()) {
- future.channel().close();
+ ch.close();
return;
}
int waitCount = this.reduceContext.getMapsToWait().decrementAndGet();
if (waitCount == 0) {
+ LOG.debug("Finished with all map outputs");
+ ch.writeAndFlush(LastHttpContent.EMPTY_LAST_CONTENT);
Review Comment:
Yes, absolutely: this issue was caused by incorrect usage of the Netty 4 APIs
(investigation details are on the Jira ticket).
Most interestingly, no existing unit test exposed the issue, so I have added
one now. It reproduces the problem by fetching more than one input in the same
request. Because of this issue, the new UT hung completely, and a real TPC-DS
query on the cluster became very slow, as composite fetch requests hung and
eventually timed out (this didn't cause a query failure, just an extremely
slow query).
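
To illustrate the pattern in the hunk above, here is a minimal sketch that models it without Netty. It is not the ShuffleHandler code: `FakeChannel`, `MapOutputListener`, `serve` and the `END_MARKER` string are hypothetical stand-ins (the marker plays the role of `LastHttpContent.EMPTY_LAST_CONTENT`), and the real handler is asynchronous. The only point shown is that the listener which decrements the shared counter to zero writes the end-of-response marker exactly once, after the last map output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class LastContentSketch {
    // Hypothetical stand-in for LastHttpContent.EMPTY_LAST_CONTENT.
    static final String END_MARKER = "<LAST>";

    // Hypothetical stand-in for a channel: it only records what was written.
    static final class FakeChannel {
        final List<String> written = new ArrayList<>();
        boolean closed = false;

        void write(String msg) {
            if (!closed) {
                written.add(msg);
            }
        }

        void close() {
            closed = true;
        }
    }

    // Mirrors the shape of operationComplete() in the diff: close on failure,
    // otherwise decrement the counter and terminate the response at zero.
    static final class MapOutputListener {
        private final AtomicInteger mapsToWait;
        private final FakeChannel ch;

        MapOutputListener(AtomicInteger mapsToWait, FakeChannel ch) {
            this.mapsToWait = mapsToWait;
            this.ch = ch;
        }

        void operationComplete(boolean success) {
            if (!success) {
                ch.close();
                return;
            }
            if (mapsToWait.decrementAndGet() == 0) {
                // Without this write, a client reading a multi-output
                // response has no signal that the response is complete.
                ch.write(END_MARKER);
            }
        }
    }

    // Serves mapCount outputs in one "request" and returns what was written.
    static List<String> serve(int mapCount) {
        FakeChannel ch = new FakeChannel();
        MapOutputListener listener =
            new MapOutputListener(new AtomicInteger(mapCount), ch);
        for (int i = 0; i < mapCount; i++) {
            ch.write("map-" + i);
            listener.operationComplete(true);
        }
        return ch.written;
    }

    public static void main(String[] args) {
        System.out.println(serve(3)); // prints [map-0, map-1, map-2, <LAST>]
    }
}
```

In this model, a request for three outputs ends with a single marker after `map-2`; dropping the marker write leaves the response unterminated, which corresponds to the hang described above.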