[ https://issues.apache.org/jira/browse/TEZ-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098937#comment-17098937 ]
László Bodor edited comment on TEZ-4157 at 5/5/20, 1:25 PM:
------------------------------------------------------------

[^TEZ-4157.02.patch] is roughly the first successful refactor to netty4. Most of the unit tests pass, except testKeepAlive, which has started to drive me crazy, but I'll give it another chance.

[~jeagles]: do you have any pointers regarding testKeepAlive? Maybe you're familiar with that test case. I'm 99% sure that my netty upgrade is correct in [^TEZ-4157.02.patch], and all of the test cases pass (except testKeepAlive). In testKeepAlive, there are 2 consecutive keepalive connections from the client, and the [second|https://github.com/apache/tez/blob/master/tez-plugins/tez-aux-services/src/test/java/org/apache/tez/auxservices/TestShuffleHandler.java#L474] fails with an invalid HTTP response after my patch. Could you please clarify the expected behavior of this test case, [regarding broken pipe|https://github.com/apache/tez/blob/master/tez-plugins/tez-aux-services/src/test/java/org/apache/tez/auxservices/TestShuffleHandler.java#L403]?

I've been playing with this test case for more than 8-10 hours, but I haven't been able to solve it. Basically:
1. If I insert a Thread.sleep(1000) before the second getInputStream, the connection is successful, but then it fails because the second socket address is not the same, so I think it's not a keepalive anymore.
2. Without the sleep, I get an invalid HTTP response no matter how I change the payload from the fake shuffle handler.

What exactly is the point of this very [long cycle and big payload|https://github.com/apache/tez/blob/master/tez-plugins/tez-aux-services/src/test/java/org/apache/tez/auxservices/TestShuffleHandler.java#L410]? Do we expect the buffer fill cycle itself to take longer than the keepalive timeout?
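
To make the "second socket address is not the same" check concrete, here is a minimal standalone sketch of how connection reuse can be observed between two consecutive HttpURLConnection requests. It is not the Tez test: the JDK's built-in com.sun.net.httpserver.HttpServer stands in for the ShuffleHandler, and the class name KeepAliveProbe, the method observeClientPorts and the /probe path are hypothetical names made up for this illustration. The idea is only that the server records the client port of each request; if both requests report the same port, the second one most likely travelled over the kept-alive socket.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical probe, not part of Tez: the JDK HttpServer stands in for ShuffleHandler.
class KeepAliveProbe {

  /** Sends sequential GET requests and returns the client port the server saw for each one. */
  static List<Integer> observeClientPorts(int requests) throws Exception {
    List<Integer> ports = new CopyOnWriteArrayList<>();
    HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
    server.createContext("/probe", exchange -> {
      // Record the remote (client) port before answering.
      ports.add(exchange.getRemoteAddress().getPort());
      byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
      // A fixed Content-Length lets the client know where the response ends.
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
      }
    });
    server.start();
    try {
      URL url = URI.create(
          "http://127.0.0.1:" + server.getAddress().getPort() + "/probe").toURL();
      for (int i = 0; i < requests; i++) {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Drain the body completely and close the stream; the JDK client
        // only considers a connection for reuse after that.
        try (InputStream in = conn.getInputStream()) {
          while (in.read() != -1) {
            // discard
          }
        }
      }
    } finally {
      server.stop(0);
    }
    return ports;
  }

  public static void main(String[] args) throws Exception {
    List<Integer> ports = observeClientPorts(2);
    System.out.println("client ports seen by server: " + ports);
    System.out.println("connection reused: " + ports.get(0).equals(ports.get(1)));
  }
}
```

Whether the two ports actually match depends on the client's keep-alive settings (for example the http.keepAlive system property) and on the server not closing the socket in between, so the sketch prints the result rather than asserting it.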
cc: [~rizhang]

was (Author: abstractdog):
[^TEZ-4157.02.patch] is about the first successful refactor to netty4, most of the unit tests pass, except testKeepAlive, which has started to drive me crazy, but I'll give another chance to it

> ShuffleHandler: upgrade to netty4
> ---------------------------------
>
>                 Key: TEZ-4157
>                 URL: https://issues.apache.org/jira/browse/TEZ-4157
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: TEZ-4157.01.patch, TEZ-4157.02.patch
>
>
> -In the dependency tree, there are 2 occurrences of compile scope direct
> netty dependencies, however, they're not used at all. I compiled locally
> successfully without them. E.g. when investigating blackduck alerts
> (complaining about netty deps for current 3.10.5.Final), it would be cleaner
> to start from a dependency tree where Tez doesn't depend on netty directly in
> order to eliminate its responsibility (and move the focus to underlying
> hadoop for instance).-
> Tez depends on netty3 almost only in ShuffleHandler and some related classes.
> We can eliminate netty3 by upgrading it, but this effort might involve some
> testing due to fundamental [changes from
> netty3->netty4|https://netty.io/wiki/new-and-noteworthy-in-4.0.html] + we
> don't have a reference yet, as [hadoop's
> ShuffleHandler|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java]
> is still on netty3.
> As per the netty documentation, we can also expect some performance
> improvement (e.g. Pooled buffers).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)