[ 
https://issues.apache.org/jira/browse/TEZ-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17100835#comment-17100835
 ] 

László Bodor edited comment on TEZ-4157 at 5/6/20, 2:11 PM:
------------------------------------------------------------

Thanks [~jeagles], these options saved my life. I started from that strange 
HTTP response header: {Content-type: unknown/unknown}

TL;DR: this looks like a Netty bug to me, where the HTTP response encoder is 
reused improperly. The workaround (included in [^TEZ-4157.03.patch]) is:
{code}
if (keepAliveParam || connectionKeepAliveEnabled) {
  // Swap in a fresh encoder so the next response starts from a clean internal state.
  pipeline.replace(pipeline.get("encoder"), "encoder", new HttpResponseEncoder());
}
{code}


Deep inside the pipeline, there is the encoder, which behaves according to its 
internal state:
https://github.com/netty/netty/blob/4.1/codec-http/src/main/java/io/netty/handler/codec/http/HttpObjectEncoder.java#L86

While [writing the second 
response|https://github.com/apache/tez/blob/master/tez-plugins/tez-aux-services/src/main/java/org/apache/tez/auxservices/ShuffleHandler.java#L1090],
 the same encoder instance is reused (I checked the object hashcode), but only 
if keepalive is enabled. Its internal state is not ST_INIT (0) on the second 
use, so it throws that IllegalStateException. The failure is somehow silent, 
and the result is what you also saw in your debug messages: an HTTP response 
with a totally messed-up header:
{code}
{Content-type: unknown/unknown}
{code}

With the workaround, it works properly (unfortunately, that encoder has no 
reset() method that could be called instead).
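
To illustrate the failure mode, here is a minimal, self-contained sketch. It is a toy model, not Netty's code: ToyEncoder, ToyPipeline and secondHeadFails are hypothetical names, and the rule that the state only returns to ST_INIT once the end of a message is signalled is an assumption based on a reading of the linked HttpObjectEncoder source. It only shows why a reused stateful encoder rejects a second response head, and why swapping in a fresh instance avoids that.

```java
public class EncoderStateSketch {

    /** Toy stand-in for a stateful HTTP response encoder (hypothetical, not Netty's class). */
    static final class ToyEncoder {
        static final int ST_INIT = 0;     // ready to encode a response head
        static final int ST_CONTENT = 1;  // head written, body content expected

        private int state = ST_INIT;

        /** Encodes a response head; only legal while the encoder is in ST_INIT. */
        String encodeHead(String statusLine) {
            if (state != ST_INIT) {
                throw new IllegalStateException("unexpected response head, state: " + state);
            }
            state = ST_CONTENT;
            return statusLine + "\r\n\r\n";
        }

        /** Assumed way back to ST_INIT: the end of the message is signalled explicitly. */
        void endOfMessage() {
            state = ST_INIT;
        }
    }

    /** Toy pipeline with a single encoder slot that can be swapped out. */
    static final class ToyPipeline {
        private ToyEncoder encoder = new ToyEncoder();

        ToyEncoder encoder() {
            return encoder;
        }

        void replaceEncoder(ToyEncoder fresh) {
            encoder = fresh;
        }
    }

    /**
     * Writes two response heads on the same (keep-alive) pipeline without ever
     * signalling the end of the first message. Returns true if the second head
     * is rejected with an IllegalStateException.
     */
    static boolean secondHeadFails(boolean replaceBetweenResponses) {
        ToyPipeline pipeline = new ToyPipeline();
        pipeline.encoder().encodeHead("HTTP/1.1 200 OK");

        if (replaceBetweenResponses) {
            // Mirrors the idea of the workaround: start the next response with a fresh encoder.
            pipeline.replaceEncoder(new ToyEncoder());
        }

        try {
            pipeline.encoder().encodeHead("HTTP/1.1 200 OK");
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("reused encoder fails: " + secondHeadFails(false));
        System.out.println("fresh encoder fails: " + secondHeadFails(true));
    }
}
```

In this toy model, calling endOfMessage() after the first response would also bring the encoder back to ST_INIT. Whether an equivalent step is missing in the real ShuffleHandler write path is something to verify against Netty itself, not something this sketch proves.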

I want to double-check my workaround and file a Netty bug if needed. In the 
meantime, could you please take a look at the patch? I'm about to test it on a 
cluster (with Hive); what else do we need in order to get this change merged?







> ShuffleHandler: upgrade to netty4
> ---------------------------------
>
>                 Key: TEZ-4157
>                 URL: https://issues.apache.org/jira/browse/TEZ-4157
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: TEZ-4157.01.patch, TEZ-4157.02.patch, TEZ-4157.03.patch
>
>
> -In the dependency tree, there are 2 occurrences of compile scope direct 
> netty dependencies, however, they're not used at all. I compiled locally 
> successfully without them. E.g. when investigating blackduck alerts 
> (complaining about netty deps for current 3.10.5.Final), it would be cleaner 
> to start from a dependency tree where Tez doesn't depend on netty directly in 
> order to eliminate its responsibility (and move the focus to underlying 
> hadoop for instance).-
> Tez depends on netty3 almost only in ShuffleHandler and some related classes. 
> We can eliminate netty3 by upgrading it, but this effort might involve some 
> testing due to fundamental [changes from 
> netty3->netty4|https://netty.io/wiki/new-and-noteworthy-in-4.0.html] + we 
> don't have a reference yet, as [hadoop's 
> ShuffleHandler|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java]
>  is still on netty3.
> As per the netty documentation, we can also expect some performance 
> improvement (e.g. Pooled buffers).



