[
https://issues.apache.org/jira/browse/MAPREDUCE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112643#comment-16112643
]
Robert Schmidtke edited comment on MAPREDUCE-6923 at 8/3/17 12:28 PM:
----------------------------------------------------------------------
FYI, I have benchmarked another version, which uses a cast instead of the
ternary operator, with JMH on my Mac:
{code:java}
package de.schmidtke.java.benchmark;

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

public class TernaryBenchmark {

    @State(Scope.Thread)
    public static class TBState {
        private final Random random = new Random(0);
        public long trans;

        // Pick a fresh random transfer count before every benchmark invocation.
        @Setup(Level.Invocation)
        public void setup() {
            trans = random.nextLong();
        }
    }

    // Variant used in the proposed patch: clamp trans to the int range with a
    // ternary, then take the minimum as ints.
    @Benchmark
    public int testTernary(TBState tbState) {
        long trans = tbState.trans;
        return Math.min(131072,
                trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans);
    }

    // Alternative variant: take the minimum in long arithmetic, then cast down.
    @Benchmark
    public int testCast(TBState tbState) {
        long trans = tbState.trans;
        return (int) Math.min((long) 131072, trans);
    }
}
{code}
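For reference, a benchmark like this can be launched via the JMH runner API. The sketch below is only illustrative; the fork and iteration counts are assumptions, not the exact options used for the numbers that follow:
{code:java}
// Illustrative JMH launcher (not part of the benchmark above); the fork and
// iteration counts are assumptions, not the settings used for the results below.
package de.schmidtke.java.benchmark;

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class TernaryBenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(TernaryBenchmark.class.getSimpleName())
                .forks(1)
                .warmupIterations(5)
                .measurementIterations(10)
                .build();
        new Runner(opts).run();
    }
}
{code}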
The results show roughly 1% higher throughput for the cast version; everything
else looks about the same. I'd still go with the ternary operator version for
better clarity:
{code:none}
Benchmark                                           Mode      Cnt         Score         Error  Units
TernaryBenchmark.testCast                          thrpt      200  25142779.388  ± 114863.918  ops/s
TernaryBenchmark.testTernary                       thrpt      200  24829083.072  ±  64009.480  ops/s
TernaryBenchmark.testCast                           avgt      200        ≈ 10⁻⁷                 s/op
TernaryBenchmark.testTernary                        avgt      200        ≈ 10⁻⁷                 s/op
TernaryBenchmark.testCast                          sample  7596374        ≈ 10⁻⁷                 s/op
TernaryBenchmark.testCast:testCast·p0.00           sample                 ≈ 10⁻⁹                 s/op
TernaryBenchmark.testCast:testCast·p0.50           sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testCast:testCast·p0.90           sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testCast:testCast·p0.95           sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testCast:testCast·p0.99           sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testCast:testCast·p0.999          sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testCast:testCast·p0.9999         sample                 ≈ 10⁻⁵                 s/op
TernaryBenchmark.testCast:testCast·p1.00           sample                  0.002                 s/op
TernaryBenchmark.testTernary                       sample  7469568        ≈ 10⁻⁷                 s/op
TernaryBenchmark.testTernary:testTernary·p0.00     sample                 ≈ 10⁻⁹                 s/op
TernaryBenchmark.testTernary:testTernary·p0.50     sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testTernary:testTernary·p0.90     sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testTernary:testTernary·p0.95     sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testTernary:testTernary·p0.99     sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testTernary:testTernary·p0.999    sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testTernary:testTernary·p0.9999   sample                 ≈ 10⁻⁵                 s/op
TernaryBenchmark.testTernary:testTernary·p1.00     sample                  0.002                 s/op
TernaryBenchmark.testCast                              ss       10        ≈ 10⁻⁵                 s/op
TernaryBenchmark.testTernary                           ss       10        ≈ 10⁻⁵                 s/op
{code}
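To make the comparison concrete in the context of the patch, the two variants correspond roughly to the following. This is only a sketch using a hypothetical helper; the parameter names mirror the fields used in FadvisedFileRegion as quoted in the issue description below, and it is not the patch itself:
{code:java}
import java.nio.ByteBuffer;

// Sketch only: hypothetical helper illustrating the two buffer-sizing variants
// compared above; shuffleBufferSize and trans mirror the names used in
// FadvisedFileRegion (see the issue description below).
public class ShuffleBufferSizing {

    // Ternary variant (as in the proposed patch): clamp trans to the int range
    // first, then take the minimum as ints.
    static ByteBuffer allocateTernary(int shuffleBufferSize, long trans) {
        return ByteBuffer.allocate(Math.min(shuffleBufferSize,
                trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans));
    }

    // Cast variant benchmarked above: take the minimum in long arithmetic and
    // cast down afterwards; the result is bounded by shuffleBufferSize, an int.
    static ByteBuffer allocateCast(int shuffleBufferSize, long trans) {
        return ByteBuffer.allocate((int) Math.min((long) shuffleBufferSize, trans));
    }
}
{code}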
> YARN Shuffle I/O for small partitions
> -------------------------------------
>
> Key: MAPREDUCE-6923
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6923
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Environment: Observed in Hadoop 2.7.3 and above (judging from the
> source code of future versions), and Ubuntu 16.04.
> Reporter: Robert Schmidtke
> Assignee: Robert Schmidtke
> Attachments: MAPREDUCE-6923.00.patch
>
>
> When a job configuration results in small partitions read by each reducer
> from each mapper (e.g. 65 kilobytes as in my setup: a
> [TeraSort|https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSort.java]
> of 256 gigabytes using 2048 mappers and reducers each), and setting
> {code:xml}
> <property>
> <name>mapreduce.shuffle.transferTo.allowed</name>
> <value>false</value>
> </property>
> {code}
> then the default setting of
> {code:xml}
> <property>
> <name>mapreduce.shuffle.transfer.buffer.size</name>
> <value>131072</value>
> </property>
> {code}
> results in almost 100% overhead in reads during shuffle in YARN, because for
> each 65K needed, 128K are read.
> I propose a fix in
> [FadvisedFileRegion.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L114]
> as follows:
> {code:java}
> ByteBuffer byteBuffer = ByteBuffer.allocate(Math.min(this.shuffleBufferSize,
> trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans));
> {code}
> e.g.
> [here|https://github.com/apache/hadoop/compare/branch-2.7.3...robert-schmidtke:adaptive-shuffle-buffer].
> This sets the shuffle buffer size to the minimum of the buffer size
> specified in the configuration (128K by default) and the actual
> partition size (65K on average in my setup). In my benchmarks this reduced
> the read overhead in YARN from about 100% (255 additional gigabytes as
> described above) down to about 18% (an additional 45 gigabytes). The runtime
> of the job remained the same in my setup.
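As a quick sanity check of the numbers in the description above, assuming evenly sized partitions:
{code:none}
Partition size:  256 GB / (2048 mappers x 2048 reducers) = 256 * 2^30 B / 2^22 ≈ 64 KiB  (~65 KB)
Default buffer:  128 KiB read per ~64 KiB partition  =>  ~2x bytes read  =>  ~100% read overhead
With the fix:    min(128 KiB, ~64 KiB) ≈ 64 KiB      =>  buffer matches the partition size
{code}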
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]