[
https://issues.apache.org/jira/browse/MAPREDUCE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050369#comment-15050369
]
Alan Burlison commented on MAPREDUCE-6417:
------------------------------------------
Yes I have. Except on very small sizes (less than around 8 bytes) primitives.h
is slower. And it really doesn't matter even if it is faster if it is
incorrect, which it is.
> MapReduceClient's primitives.h is toxic and should be extirpated
> ----------------------------------------------------------------
>
> Key: MAPREDUCE-6417
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6417
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: client
> Affects Versions: 3.0.0
> Reporter: Alan Burlison
> Assignee: Alan Burlison
> Priority: Blocker
> Attachments: MAPREDUCE-6417.001.patch
>
>
> MapReduceClient's primitives.h attempts to provide optimised versions of
> standard library memory copy and comparison functions. It has been the
> subject of several portability-related bugs:
> * HADOOP-11505 hadoop-mapreduce-client-nativetask uses bswap where be32toh is
> needed, doesn't work on non-x86
> * HADOOP-11665 Provide and unify cross platform byteorder support in native
> code
> * MAPREDUCE-6397 MAPREDUCE makes many endian-dependent assumptions
> * HADOOP-11484 hadoop-mapreduce-client-nativetask fails to build on ARM
> AARCH64 due to x86 asm statements
> At present it only works on x86 and ARM64 as it lacks definitions for bswap
> and bswap64 for any platforms other than those.
> However it has even more serious problems on non-x86 architectures, for
> example on SPARC simple_memcpy simply doesn't work at all:
> {code}
> $ cat bang.cc
> #include <string.h>
> #define SIMPLE_MEMCPY
> #include "primitives.h"
> int main(int argc, char **argv)
> {
> char b1[9];
> char b2[9];
> simple_memcpy(b2, b1, sizeof(b1));
> }
> $ gcc -o bang bang.cc && ./bang
> Bus Error (core dumped)
> {code}
> That's because simple_memcpy does pointer fiddling that results in misaligned
> accesses, which are illegal on SPARC.
> fmemcmp is also broken. Even if a definition of bswap is provided, on
> big-endian architectures the result is simply wrong because of its
> unconditional use of bswap:
> {code}
> $ cat thud.cc
> #include <stdio.h>
> #include <strings.h>
> #include "primitives.h"
> int main(int argc, char **argv)
> {
> char a[] = { 0,1,2,0 };
> char b[] = { 0,2,1,0 };
> printf("%lld %d\n", fmemcmp(a, b, sizeof(a), memcmp(a, b, sizeof(a))));
> }
> $ g++ -o thud thud.cc && ./thud
> 65280 -1
> {code}
> And in addition fmemcmp suffers from the same misalignment issues as
> simple_memcpy and coredumps on SPARC when asked to compare odd-sized buffers.
> primitives.h provides the following functions:
> * bswap - used in 12 files in MRC but as HADOOP-11505 points out, mostly
> incorrectly as it takes no account of platform endianness
> * bswap64 - used in 4 files in MRC, same comments as per bswap apply
> * simple_memcpy - used in 3 files in MRC, should be replaced with the
> standard memcpy
> * fmemcmp - used in 1 file, should be replaced with the standard memcmp
> * fmemeq - used in 1 file, should be replaced with the standard memcmp
> * frmemeq - not used at all, should just be removed
> *Summary*: primitives.h should simply be deleted and replaced with the
> standard memory copy & compare functions, or with thin wrappers around them
> where the APIs are different.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)