[ https://issues.apache.org/jira/browse/MAPREDUCE-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678353#comment-17678353 ]

ASF GitHub Bot commented on MAPREDUCE-7431:
-------------------------------------------

hadoop-yetus commented on PR #5311:
URL: https://github.com/apache/hadoop/pull/5311#issuecomment-1387533352

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 53s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 6 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  47m 56s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 28s |  |  trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  compile  |   0m 25s |  |  trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 25s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   0m 21s |  |  trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   0m 52s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 51s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 22s |  |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javac  |   0m 22s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 18s |  |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 11s | [/results-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-shuffle.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5311/2/artifact/out/results-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-shuffle.txt) |  hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle: The patch generated 11 new + 8 unchanged - 29 fixed = 19 total (was 37)  |
   | +1 :green_heart: |  mvnsite  |   0m 22s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 17s | [/patch-javadoc-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-shuffle-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5311/2/artifact/out/patch-javadoc-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-shuffle-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) |  hadoop-mapreduce-client-shuffle in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.  |
   | -1 :x: |  javadoc  |   0m 16s | [/patch-javadoc-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-shuffle-jdkPrivateBuild-1.8.0_352-8u352-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5311/2/artifact/out/patch-javadoc-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-shuffle-jdkPrivateBuild-1.8.0_352-8u352-ga-1~20.04-b08.txt) |  hadoop-mapreduce-client-shuffle in the patch failed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08.  |
   | +1 :green_heart: |  spotbugs  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 22s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 36s |  |  hadoop-mapreduce-client-shuffle in the patch passed.  |
   | -1 :x: |  asflicense  |   0m 33s | [/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5311/2/artifact/out/results-asflicense.txt) |  The patch generated 1 ASF License warnings.  |
   |  |   | 110m 18s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5311/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5311 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle |
   | uname | Linux 4edd60006a54 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 8fefb83f680483ba0f8efa0094e22ab1d6f8bf8e |
   | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5311/2/testReport/ |
   | Max. process+thread count | 630 (vs. ulimit of 5500) |
   | modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5311/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> ShuffleHandler is not working correctly in SSL mode after the Netty 4 upgrade
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7431
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7431
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 3.4.0
>            Reporter: Tamas Domok
>            Assignee: Tamas Domok
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: chunked-fileregion.txt, chunked.txt, normal-fileregion.txt, normal.txt, sendMapPipeline.png
>
>
> HADOOP-15327 introduced some regressions in the ShuffleHandler.
> h3. 1. a memory leak
> {code:java}
> ERROR io.netty.util.ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
> {code}
>  
> The Shuffle handler's channelRead did not release the message properly; the fix would be:
> {code:java}
>       try {
>         // ....
>       } finally {
>         ReferenceCountUtil.release(msg);
>       }
> {code}
> Or even simpler:
> {code:java}
> extends SimpleChannelInboundHandler<FullHttpRequest>
> {code}
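> Put together, a minimal sketch of that simpler shape (assuming an HttpObjectAggregator earlier in the pipeline so the handler receives FullHttpRequest; the body is illustrative, not the actual patch):
> {code:java}
> import io.netty.channel.ChannelHandlerContext;
> import io.netty.channel.SimpleChannelInboundHandler;
> import io.netty.handler.codec.http.FullHttpRequest;
> 
> public class Shuffle extends SimpleChannelInboundHandler<FullHttpRequest> {
>   @Override
>   protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest request) {
>     // handle the shuffle request here; SimpleChannelInboundHandler releases
>     // the inbound FullHttpRequest automatically after this method returns,
>     // so no manual ReferenceCountUtil.release() is needed
>   }
> }
> {code}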
> h3. 2. A bug in SSL mode with more than one reducer
> It manifested in multiple errors:
> {code:java}
> ERROR org.apache.hadoop.mapred.ShuffleHandler: Future is unsuccessful. Cause:
> java.io.IOException: Broken pipe
> ERROR org.apache.hadoop.mapred.ShuffleHandler: Future is unsuccessful. Cause:
> java.nio.channels.ClosedChannelException
> // if the reducer memory was not enough, then even this:
> Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
>       at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:136)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:377)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>       at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:123)
>       at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:98)
>       at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
>       at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:210)
>       at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.doShuffle(InMemoryMapOutput.java:91)
> {code}
> *Configuration* - mapred-site.xml
> {code:java}
> mapreduce.shuffle.ssl.enabled=true
> {code}
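> (For reference, the same setting in the usual mapred-site.xml XML form:)
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.ssl.enabled</name>
>   <value>true</value>
> </property>
> {code}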
> An alternative is to build a custom jar where *FadvisedFileRegion* is replaced with *FadvisedChunkedFile* in {*}sendMapOutput{*}.
> *Reproduction*
> {code:java}
> hdfs dfs -rm -r -skipTrash /tmp/sort_input
> hdfs dfs -rm -r -skipTrash /tmp/sort_output
> yarn jar hadoop-3.4.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0-SNAPSHOT.jar randomwriter "-Dmapreduce.randomwriter.totalbytes=10000000000" /tmp/sort_input
> yarn jar hadoop-3.4.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0-SNAPSHOT.jar sort -Dmapreduce.job.reduce.slowstart.completedmaps=1 -r 40 /tmp/sort_input /tmp/sort_output | tee sort_app_output.txt
> {code}
> h3. ShuffleHandler's protocol
> {code:java}
> // HTTP Request
> GET /mapOutput?job=job_1672901779104_0001&reduce=0&map=attempt_1672901779104_0001_m_000003_0,attempt_1672901779104_0001_m_000002_0,attempt_1672901779104_0001_m_000001_0,attempt_1672901779104_0001_m_000000_0,attempt_1672901779104_0001_m_000005_0,attempt_1672901779104_0001_m_000012_0,attempt_1672901779104_0001_m_000009_0,attempt_1672901779104_0001_m_000010_0,attempt_1672901779104_0001_m_000007_0,attempt_1672901779104_0001_m_000011_0,attempt_1672901779104_0001_m_000008_0,attempt_1672901779104_0001_m_000013_0,attempt_1672901779104_0001_m_000014_0,attempt_1672901779104_0001_m_000015_0,attempt_1672901779104_0001_m_000019_0,attempt_1672901779104_0001_m_000018_0,attempt_1672901779104_0001_m_000016_0,attempt_1672901779104_0001_m_000017_0,attempt_1672901779104_0001_m_000020_0,attempt_1672901779104_0001_m_000023_0 HTTP/1.1
> + keep alive headers
> // HTTP Response Headers
> content-length=sum(serialised ShuffleHeader in bytes + MapOutput size)
> + keep alive headers
> // Response Data (transfer-encoding=chunked)
> serialised ShuffleHeader
> content of the MapOutput file (start offset - length)
> serialised ShuffleHeader
> content of the MapOutput file (start offset - length)
> serialised ShuffleHeader
> content of the MapOutput file (start offset - length)
> serialised ShuffleHeader
> content of the MapOutput file (start offset - length)
> ...
> LastHttpContent
> // close socket if no keep-alive
> {code}
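> Expressed as Netty writes, the sequence above roughly corresponds to this sketch (illustrative only: ctx, keepAlive, mapOutputs, shuffleHeaderBuf and the MapOutputInfo fields are assumed names, and a ChunkedWriteHandler is assumed in the pipeline):
> {code:java}
> Channel ch = ctx.channel();
> ch.write(response);                                 // headers: content-length + keep alive
> for (MapOutputInfo info : mapOutputs) {
>   ch.write(shuffleHeaderBuf(info));                 // serialised ShuffleHeader
>   ch.write(new ChunkedFile(info.file, info.startOffset, info.length, 8192)); // file slice
> }
> ChannelFuture last = ch.writeAndFlush(LastHttpContent.EMPTY_LAST_CONTENT);
> if (!keepAlive) {
>   last.addListener(ChannelFutureListener.CLOSE);    // close socket if no keep-alive
> }
> {code}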
> h3. Issues
>  - {*}setResponseHeaders{*}: did not always set the content-length header, and the transfer-encoding=chunked header was missing (see the sketch after this list).
>  - {*}ReduceMapFileCount.operationComplete{*}: mishandled the futures on the LastHttpContent.
>  - {*}ChannelGroup accepted{*}: is only used to close the channels, so there is no need for the magic value of 5. See the details [here|https://netty.io/4.0/api/io/netty/channel/group/ChannelGroup.html].
>  - {*}bossGroup{*}: should have only 1 thread for accepting connections.
>  - {*}Shuffle{*}: is unnecessarily Sharable; the 3 async sendMap calls per channel (see below) caused future errors when using FadvisedChunkedFile.
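> A sketch of the header behaviour the protocol above implies (Netty 4.1 API; the method signature is assumed, not the actual patch):
> {code:java}
> void setResponseHeaders(HttpResponse response, boolean keepAliveParam, long contentLength) {
>   response.headers().set(HttpHeaderNames.TRANSFER_ENCODING, HttpHeaderValues.CHUNKED);
>   // content-length = sum(serialised ShuffleHeaders + MapOutput sizes)
>   response.headers().set(HttpHeaderNames.CONTENT_LENGTH, contentLength);
>   if (keepAliveParam) {
>     response.headers().set(HttpHeaderNames.CONNECTION, HttpHeaderValues.KEEP_ALIVE);
>   }
> }
> {code}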
> h3. Max session open files is not an optimisation, it actually wastes resources
> {code:java}
>     // by default maxSessionOpenFiles = 3
>     for (int i = 0; i < Math.min(handlerCtx.maxSessionOpenFiles, mapIds.size()); i++) {
>       ChannelFuture nextMap = sendMap(reduceContext);
>       if(nextMap == null) {
>         return;
>       }
>     }
> {code}
> !sendMapPipeline.png!
> At the end of the day, we create an HTTP chunked stream; there is no need to run 3 sendMap calls asynchronously, because their futures finish one-by-one, sequentially. The osCache magic from the FAdvised classes won't happen either, because the first readChunk is only called later.
> So this can be simplified a lot:
> {code:java}
> sendMap(reduceContext);
> {code}
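> The sequential chaining then happens in the completion listener; an illustrative sketch (hasMoreMapOutputs and writeLastHttpContent are hypothetical helpers, not the actual code):
> {code:java}
> class ReduceMapFileCount implements ChannelFutureListener {
>   @Override
>   public void operationComplete(ChannelFuture future) throws Exception {
>     if (!future.isSuccess()) {
>       future.channel().close();               // e.g. Broken pipe, ClosedChannelException
>       return;
>     }
>     if (reduceContext.hasMoreMapOutputs()) {  // hypothetical helper
>       sendMap(reduceContext);                 // chain the next map output
>     } else {
>       writeLastHttpContent(future.channel()); // hypothetical: finish the chunked response
>     }
>   }
> }
> {code}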
> h3. My proposal
> Some refactoring: ShuffleHandler is split into multiple classes to make it possible to remove the Sharable annotation.
>  - ShuffleChannel
>  - ShuffleChannelInitializer
>  - ShuffleChannelHandlerContext
>  - ShuffleChannelHandler
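> A rough sketch of how the split could look (the pipeline handler names and constructor shapes here are assumptions, not the actual WIP code):
> {code:java}
> import io.netty.channel.ChannelInitializer;
> import io.netty.channel.ChannelPipeline;
> import io.netty.channel.socket.SocketChannel;
> import io.netty.handler.codec.http.HttpObjectAggregator;
> import io.netty.handler.codec.http.HttpServerCodec;
> import io.netty.handler.stream.ChunkedWriteHandler;
> 
> public class ShuffleChannelInitializer extends ChannelInitializer<SocketChannel> {
>   private final ShuffleChannelHandlerContext handlerContext;
> 
>   public ShuffleChannelInitializer(ShuffleChannelHandlerContext handlerContext) {
>     this.handlerContext = handlerContext;
>   }
> 
>   @Override
>   protected void initChannel(SocketChannel ch) {
>     ChannelPipeline pipeline = ch.pipeline();
>     // an SslHandler would be added first when mapreduce.shuffle.ssl.enabled=true
>     pipeline.addLast("codec", new HttpServerCodec());
>     pipeline.addLast("aggregator", new HttpObjectAggregator(1 << 16));
>     pipeline.addLast("chunking", new ChunkedWriteHandler());
>     // a fresh, non-Sharable handler instance per accepted connection
>     pipeline.addLast("shuffle", new ShuffleChannelHandler(handlerContext));
>   }
> }
> {code}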
> TODO:
>  - fix/drop/refactor the existing unit tests
>  - add a proper unit test that covers both SSL and non-SSL mode and properly verifies the response data
>  - documentation about the protocol
> WIP: [github.com/tomicooler/hadoop|https://github.com/tomicooler/hadoop/commit/3bc027598aea4a3b02a1997fe5d485b9a6e5c41e]
> h3. Netty useful docs
>  * [User guide for 4.x|https://netty.io/wiki/user-guide-for-4.x.html]
>  * [New and noteworthy in 4.0|https://netty.io/wiki/new-and-noteworthy-in-4.0.html]
>  * [Reference counted objects|https://netty.io/wiki/reference-counted-objects.html] (it will be changed in [Netty 5|https://netty.io/wiki/new-and-noteworthy-in-5.0.html])
>  * HttpStaticFileServer [example|https://github.com/netty/netty/blob/4.1/example/src/main/java/io/netty/example/http/file/HttpStaticFileServerHandler.java]


