[
https://issues.apache.org/jira/browse/MAPREDUCE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112643#comment-16112643
]
Robert Schmidtke edited comment on MAPREDUCE-6923 at 8/3/17 12:28 PM:
----------------------------------------------------------------------
FYI, I have benchmarked another version, which uses a cast instead of the
ternary operator, with JMH on my Mac:
{code:java}
package de.schmidtke.java.benchmark;

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

public class TernaryBenchmark {

    @State(Scope.Thread)
    public static class TBState {
        private final Random random = new Random(0);
        public long trans;

        // Pick a fresh random transfer count before every benchmark invocation.
        @Setup(Level.Invocation)
        public void setup() {
            trans = random.nextLong();
        }
    }

    // Variant used in the proposed patch: clamp trans to the int range with a
    // ternary, then take the minimum as ints.
    @Benchmark
    public int testTernary(TBState tbState) {
        long trans = tbState.trans;
        return Math.min(131072,
                trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans);
    }

    // Alternative variant: take the minimum in long arithmetic, then cast down.
    @Benchmark
    public int testCast(TBState tbState) {
        long trans = tbState.trans;
        return (int) Math.min((long) 131072, trans);
    }
}
{code}
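For reference, a benchmark like this can be launched via the JMH runner API. The sketch below is only illustrative; the fork and iteration counts are assumptions, not the exact options used for the numbers that follow:
{code:java}
// Illustrative JMH launcher (not part of the benchmark above); the fork and
// iteration counts are assumptions, not the settings used for the results below.
package de.schmidtke.java.benchmark;

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class TernaryBenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(TernaryBenchmark.class.getSimpleName())
                .forks(1)
                .warmupIterations(5)
                .measurementIterations(10)
                .build();
        new Runner(opts).run();
    }
}
{code}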
The results show roughly 1% higher throughput for the cast version; everything
else looks about the same. I'd still go with the ternary operator version for
better clarity:
{code:none}
Benchmark                                           Mode      Cnt         Score         Error  Units
TernaryBenchmark.testCast                          thrpt      200  25142779.388  ± 114863.918  ops/s
TernaryBenchmark.testTernary                       thrpt      200  24829083.072  ±  64009.480  ops/s
TernaryBenchmark.testCast                           avgt      200        ≈ 10⁻⁷                 s/op
TernaryBenchmark.testTernary                        avgt      200        ≈ 10⁻⁷                 s/op
TernaryBenchmark.testCast                          sample  7596374        ≈ 10⁻⁷                 s/op
TernaryBenchmark.testCast:testCast·p0.00           sample                 ≈ 10⁻⁹                 s/op
TernaryBenchmark.testCast:testCast·p0.50           sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testCast:testCast·p0.90           sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testCast:testCast·p0.95           sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testCast:testCast·p0.99           sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testCast:testCast·p0.999          sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testCast:testCast·p0.9999         sample                 ≈ 10⁻⁵                 s/op
TernaryBenchmark.testCast:testCast·p1.00           sample                  0.002                 s/op
TernaryBenchmark.testTernary                       sample  7469568        ≈ 10⁻⁷                 s/op
TernaryBenchmark.testTernary:testTernary·p0.00     sample                 ≈ 10⁻⁹                 s/op
TernaryBenchmark.testTernary:testTernary·p0.50     sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testTernary:testTernary·p0.90     sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testTernary:testTernary·p0.95     sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testTernary:testTernary·p0.99     sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testTernary:testTernary·p0.999    sample                 ≈ 10⁻⁷                 s/op
TernaryBenchmark.testTernary:testTernary·p0.9999   sample                 ≈ 10⁻⁵                 s/op
TernaryBenchmark.testTernary:testTernary·p1.00     sample                  0.002                 s/op
TernaryBenchmark.testCast                              ss       10        ≈ 10⁻⁵                 s/op
TernaryBenchmark.testTernary                           ss       10        ≈ 10⁻⁵                 s/op
{code}
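To make the comparison concrete in the context of the patch, the two variants correspond roughly to the following. This is only a sketch using a hypothetical helper; the parameter names mirror the fields used in FadvisedFileRegion as quoted in the issue description below, and it is not the patch itself:
{code:java}
import java.nio.ByteBuffer;

// Sketch only: hypothetical helper illustrating the two buffer-sizing variants
// compared above; shuffleBufferSize and trans mirror the names used in
// FadvisedFileRegion (see the issue description below).
public class ShuffleBufferSizing {

    // Ternary variant (as in the proposed patch): clamp trans to the int range
    // first, then take the minimum as ints.
    static ByteBuffer allocateTernary(int shuffleBufferSize, long trans) {
        return ByteBuffer.allocate(Math.min(shuffleBufferSize,
                trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans));
    }

    // Cast variant benchmarked above: take the minimum in long arithmetic and
    // cast down afterwards; the result is bounded by shuffleBufferSize, an int.
    static ByteBuffer allocateCast(int shuffleBufferSize, long trans) {
        return ByteBuffer.allocate((int) Math.min((long) shuffleBufferSize, trans));
    }
}
{code}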
> YARN Shuffle I/O for small partitions
> -------------------------------------
>
> Key: MAPREDUCE-6923
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6923
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Environment: Observed in Hadoop 2.7.3 and above (judging from the
> source code of future versions), and Ubuntu 16.04.
> Reporter: Robert Schmidtke
> Assignee: Robert Schmidtke
> Attachments: MAPREDUCE-6923.00.patch
>
>
> When a job configuration results in small partitions read by each reducer
> from each mapper (e.g. 65 kilobytes as in my setup: a
> [TeraSort|https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSort.java]
> of 256 gigabytes using 2048 mappers and reducers each), and setting
> {code:xml}
> <property>
> <name>mapreduce.shuffle.transferTo.allowed</name>
> <value>false</value>
> </property>
> {code}
> then the default setting of
> {code:xml}
> <property>
> <name>mapreduce.shuffle.transfer.buffer.size</name>
> <value>131072</value>
> </property>
> {code}
> results in almost 100% overhead in reads during shuffle in YARN, because for
> each 65K needed, 128K are read.
> I propose a fix in
> [FadvisedFileRegion.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L114]
> as follows:
> {code:java}
> ByteBuffer byteBuffer = ByteBuffer.allocate(Math.min(this.shuffleBufferSize,
> trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans));
> {code}
> e.g.
> [here|https://github.com/apache/hadoop/compare/branch-2.7.3...robert-schmidtke:adaptive-shuffle-buffer].
> This sets the shuffle buffer size to the minimum of the buffer size
> specified in the configuration (128K by default) and the actual
> partition size (65K on average in my setup). In my benchmarks this reduced
> the read overhead in YARN from about 100% (255 additional gigabytes as
> described above) down to about 18% (an additional 45 gigabytes). The runtime
> of the job remained the same in my setup.
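As a quick sanity check of the numbers in the description above, assuming evenly sized partitions:
{code:none}
Partition size:  256 GB / (2048 mappers x 2048 reducers) = 256 * 2^30 B / 2^22 ≈ 64 KiB  (~65 KB)
Default buffer:  128 KiB read per ~64 KiB partition  =>  ~2x bytes read  =>  ~100% read overhead
With the fix:    min(128 KiB, ~64 KiB) ≈ 64 KiB      =>  buffer matches the partition size
{code}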
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]