[
https://issues.apache.org/jira/browse/HADOOP-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon updated HADOOP-6166:
--------------------------------
Attachment: Rplots-nehalem64.pdf
Rplots-laptop.pdf
Rplots-nehalem32.pdf
Looks like the benchmark has run long enough to get good data. Here are the
benchmarks from TestPureJavaCrc32 run on three different test systems.
nehalem32 is the same nehalem box (3MB L2 cache) running a 32-bit JVM.
nehalem64 is that box with a 64-bit JVM. "laptop" is my MacBook Pro (Core 2
duo) running a 64-bit JVM.
Each PDF has several pages:
- The first graph shows performance over the whole byte range tested. You'll
definitely have to zoom in to be able to see anything here, and even then it's
not that useful.
- The remaining graphs break out each algorithm's performance at the individual
buffer sizes (the same data as the tables people have been pasting into JIRA).
I ran the whole benchmark suite 50+ times to generate the error bars. Hopefully
they'll serve as a good visual indicator of where the differences are actually
statistically significant.
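For reference, the error bars are just the spread across those repeated runs. Below is a minimal sketch of the kind of timing loop and mean/standard-error calculation involved; this is not the actual TestPureJavaCrc32 harness, and the buffer size, trial count, and use of java.util.zip.CRC32 are illustrative assumptions only:

{code}
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class CrcTimingSketch {
  // Time repeated passes over the buffer and return throughput in MB/s.
  static double timeOnce(Checksum crc, byte[] buf, int trials) {
    long start = System.nanoTime();
    for (int t = 0; t < trials; t++) {
      crc.reset();
      crc.update(buf, 0, buf.length);
    }
    long elapsedNanos = System.nanoTime() - start;
    double bytes = (double) buf.length * trials;
    return bytes / (elapsedNanos / 1e9) / (1 << 20);
  }

  public static void main(String[] args) {
    byte[] buf = new byte[512];        // illustrative buffer size
    new Random(42).nextBytes(buf);

    int runs = 50;                     // repeated runs, as in the attached plots
    double[] mbps = new double[runs];
    for (int r = 0; r < runs; r++) {
      mbps[r] = timeOnce(new CRC32(), buf, 100000);
    }

    // Mean and standard error across runs; the error bars in the PDFs are the
    // same idea (mean +/- some multiple of the spread across runs).
    double mean = 0;
    for (double v : mbps) mean += v;
    mean /= runs;
    double var = 0;
    for (double v : mbps) var += (v - mean) * (v - mean);
    double stderr = Math.sqrt(var / (runs - 1)) / Math.sqrt(runs);

    System.out.printf("%.1f MB/s +/- %.1f%n", mean, stderr);
  }
}
{code}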
In summary, here's how I interpret the data:
- For the 4-byte case, PureJavaCrc32 wins by a strong margin on my laptop and
on the 32-bit JVM. On the 64-bit JVM it's within 5-10% of the rest, so there's
very little difference.
- The 8-byte case is interesting: all of the 16_16* CRCs perform worse than the
_8_8 CRCs. On the 32-bit JDK it's especially obvious (nearly a factor of two);
see the sketch after this list.
- The 512-byte case (probably the most common for DFS): everyone is pretty much
neck and neck. The 8_8d implementation wins significantly on nehalem64, and
8_8b wins significantly on nehalem32. On my laptop they're all within the error
bars except for 16_16, which is significantly worse.
- The 16MB case shows the same pattern as the 512-byte case, just more
pronounced: 8_8d wins on 64-bit, 8_8b on 32-bit, both by about 10%.
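For anyone skimming the numbers without the patch handy, here's a rough standalone sketch of the slicing-by-8 idea behind the _8_8 variants (reflected CRC-32, polynomial 0xEDB88320). This is not the patch code; the _16_16 variants follow the same pattern with sixteen tables. It also suggests one plausible reason the 16-byte-slicing variants lose on 8-byte inputs: a buffer smaller than the slice width never hits the wide main loop and falls through to the byte-at-a-time tail.

{code}
/** Rough sketch of slicing-by-8 CRC-32 (not the actual patch code). */
public class SlicingBy8Sketch {
  private static final int[][] T = new int[8][256];
  static {
    // T[0] is the classic single-byte table for the reflected polynomial.
    for (int n = 0; n < 256; n++) {
      int c = n;
      for (int k = 0; k < 8; k++) {
        c = (c >>> 1) ^ (((c & 1) != 0) ? 0xEDB88320 : 0);
      }
      T[0][n] = c;
    }
    // T[k][n] = CRC of byte n followed by k zero bytes.
    for (int k = 1; k < 8; k++) {
      for (int n = 0; n < 256; n++) {
        int c = T[k - 1][n];
        T[k][n] = (c >>> 8) ^ T[0][c & 0xFF];
      }
    }
  }

  public static int crc32(byte[] b, int off, int len) {
    int crc = 0xFFFFFFFF;
    // Main loop: consume 8 bytes per iteration with 8 independent table lookups.
    while (len >= 8) {
      int c = crc
          ^ ((b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8)
             | ((b[off + 2] & 0xFF) << 16) | ((b[off + 3] & 0xFF) << 24));
      crc = T[7][c & 0xFF] ^ T[6][(c >>> 8) & 0xFF]
          ^ T[5][(c >>> 16) & 0xFF] ^ T[4][(c >>> 24) & 0xFF]
          ^ T[3][b[off + 4] & 0xFF] ^ T[2][b[off + 5] & 0xFF]
          ^ T[1][b[off + 6] & 0xFF] ^ T[0][b[off + 7] & 0xFF];
      off += 8;
      len -= 8;
    }
    // Tail: plain byte-at-a-time. A 16-byte-slicing variant would take this
    // path for the entire 8-byte benchmark case, which is consistent with the
    // _16_16 numbers above.
    while (len-- > 0) {
      crc = (crc >>> 8) ^ T[0][(crc ^ b[off++]) & 0xFF];
    }
    return ~crc;
  }
}
{code}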
So, I think the next step here is to profile a couple of MR applications to see
what sizes are most common.
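If it helps with that profiling, one low-effort way to get the size distribution would be a wrapper Checksum that histograms update() lengths. This is purely a sketch; the class name and power-of-two buckets below are made up for illustration:

{code}
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.zip.Checksum;

/** Hypothetical wrapper that counts update() lengths by power-of-two bucket. */
public class SizeHistogramChecksum implements Checksum {
  private static final AtomicLongArray BUCKETS = new AtomicLongArray(32);
  private final Checksum delegate;

  public SizeHistogramChecksum(Checksum delegate) {
    this.delegate = delegate;
  }

  public void update(int b) {
    BUCKETS.incrementAndGet(0);           // bucket 0: single-byte updates
    delegate.update(b);
  }

  public void update(byte[] b, int off, int len) {
    // bucket i covers lengths in [2^(i-1), 2^i)
    BUCKETS.incrementAndGet(32 - Integer.numberOfLeadingZeros(Math.max(len, 1)));
    delegate.update(b, off, len);
  }

  public long getValue() { return delegate.getValue(); }

  public void reset() { delegate.reset(); }

  /** Dump the histogram, e.g. from a shutdown hook in the task JVM. */
  public static void dump() {
    for (int i = 0; i < BUCKETS.length(); i++) {
      long n = BUCKETS.get(i);
      if (n > 0) {
        System.err.println("bucket " + i + ": " + n + " updates");
      }
    }
  }
}
{code}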
My personal opinion is that we should target the 64-bit Nehalem architecture
and the 128-byte size range. This would point to the 8_8d implementation as the
winner.
> Improve PureJavaCrc32
> ---------------------
>
> Key: HADOOP-6166
> URL: https://issues.apache.org/jira/browse/HADOOP-6166
> Project: Hadoop Common
> Issue Type: Improvement
> Components: util
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: Tsz Wo (Nicholas), SZE
> Attachments: c6166_20090722.patch, c6166_20090722_benchmark_32VM.txt,
> c6166_20090722_benchmark_64VM.txt, c6166_20090727.patch,
> c6166_20090728.patch, c6166_20090810.patch, c6166_20090811.patch, graph.r,
> graph.r, Rplots-laptop.pdf, Rplots-nehalem32.pdf, Rplots-nehalem64.pdf,
> Rplots.pdf, Rplots.pdf, Rplots.pdf
>
>
> Got some ideas to improve CRC32 calculation.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.