Re: Fixes for GCC 4.8 and above with -O3

Jeffrey Walton Mon, 13 Jul 2015 05:03:42 -0700

>     template <class T, bool STRICT>
>        inline unsigned int GetAlignmentOf(T *dummy=NULL)


It might even be better to have two sets of functions:

    // Retrieve the un-doctored requirement
    unsigned int GetStrictAlignmentOf(...)

    // Retrieve doctored requirement, influenced by 
CRYPTOPP_ALLOW_UNALIGNED_ACCESS
    unsigned int GetEffectiveAlignmentOf(...)

*****

I also think we should add CRYPTOPP_NO_UNALIGNED_DATA_ACCESS to config.h, 
enable it by default, and guard it with 
CRYPTOPP_MAINTAIN_BACKWARDS_COMPATIBILITY. That is, if 
CRYPTOPP_MAINTAIN_BACKWARDS_COMPATIBILITY is defined, then don't use 
CRYPTOPP_NO_UNALIGNED_DATA_ACCESS.

Otherwise, we are enabling undefined behavior by default. I think that's a 
bad choice.

Jeff

On Monday, July 13, 2015 at 3:29:30 AM UTC-4, Jeffrey Walton wrote:
>
> OK, so I had more time to look at the issue mentioned by Uri.
>
> Below and attached is the patch I am proposing that fixes the issue by 
> removing some of the undefined behavior that caused the problem.
>
> Comments and objections, please.
>
> **********
>
> It appears there's a bad interaction between 
> CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS, IsAligned(), IsAlignedOn(), 
> GetAlignmentOf(), OptimalDataAlignment() and the ability to use unaligned 
> data on i386 and x86_64. Since i386 and x86_64 allow unaligned data, we 
> effectively have this goodness:
>
> template <class T>
> inline unsigned int GetAlignmentOf(T *dummy=NULL)    // VC60 workaround
> {
> #ifdef CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS
>     if (sizeof(T) < 16)
>         return 1;
> #endif
>
> #if (_MSC_VER >= 1300)
>     return __alignof(T);
> #elif defined(__GNUC__)
>     return __alignof__(T);
> #elif CRYPTOPP_BOOL_SLOW_WORD64
>     return UnsignedMin(4U, sizeof(T));
> #else
>     return sizeof(T);
> #endif
> }
>
> However, the logic does not account for the compiler generating SSE 
> instructions. And we can only guess when that's going to happen....
>
> So, to fix this issue for now, we have the patch below. First, it makes 
> available CRYPTOPP_NO_UNALIGNED_DATA_ACCESS. This unconditionally stops the 
> problem. It also helps on other platforms, like early ARM and embedded 
> devices.
>
> Second, it changes GetAlignmentOf to the follow. We make strict=true when 
> we know vectorization is occurring.
>
>     template <class T, bool STRICT>
>         inline unsigned int GetAlignmentOf(T *dummy=NULL)
>
> With GetAlignmentOf fixed, IsAligned and IsAlignedOn falls into place:
>
>     template <class T, bool STRICT>
>         inline bool IsAligned(const void *p, T *dummy=NULL) {
>             return IsAlignedOn(p, GetAlignmentOf<T, STRICT>());
>         }
>
> Third, we just need to fix the problems. We know SHA-3 is a problem due to 
> misc.cpp and xorbuf. So we make strict=true in them. Otherwise, we preserve 
> existing behavior and set strict=false.
>
> diff --git a/GNUmakefile b/GNUmakefile
> index ca36725..045203b 100644
> --- a/GNUmakefile
> +++ b/GNUmakefile
> @@ -1,6 +1,6 @@
>  CXXFLAGS ?= -DNDEBUG
>  SYMBOLS ?= -g2
> -OPTIMIZE ?= -O2
> +OPTIMIZE ?= -O3
>  # -fPIC is supported, and enabled by default for x86_64.
>  # the following options reduce code size, but breaks link or makes link 
> very slow on some systems
>  # CXXFLAGS += -ffunction-sections -fdata-sections
> @@ -44,6 +44,11 @@ endif
>  # Merge symbols and optimizations
>  CXXFLAGS += $(SYMBOLS) $(OPTIMIZE)
>  
> +# GCC 4.8 and above vectorizes some operations, which creates alignment 
> problems
> +# ifeq ($(filter $(CXXFLAGS),-O3),-O3)
> +# CXXFLAGS += -DCRYPTOPP_NO_UNALIGNED_DATA_ACCESS
> +# endif
> +
>  ifeq ($(IS_X86),1)
>  
>  GCC42_OR_LATER = $(shell $(CXX) -v 2>&1 | $(EGREP) -c "^gcc version 
> (4.[2-9]|[5-9])")
> diff --git a/config.h b/config.h
> index 7472ca0..21f6d4f 100644
> --- a/config.h
> +++ b/config.h
> @@ -342,7 +342,7 @@ NAMESPACE_END
>      #define CRYPTOPP_BOOL_X86 0
>  #endif
>  
> -#if CRYPTOPP_BOOL_X64 || CRYPTOPP_BOOL_X86 || defined(__powerpc__)
> +#if !defined(CRYPTOPP_NO_UNALIGNED_DATA_ACCESS) && (CRYPTOPP_BOOL_X64 || 
> CRYPTOPP_BOOL_X86 || defined(__powerpc__))
>      #define CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS
>  #endif
>  
> diff --git a/crc.cpp b/crc.cpp
> index 10c25c2..3b73c53 100644
> --- a/crc.cpp
> +++ b/crc.cpp
> @@ -126,7 +126,7 @@ void CRC32::Update(const byte *s, size_t n)
>  {
>      word32 crc = m_crc;
>  
> -    for(; !IsAligned<word32>(s) && n > 0; n--)
> +    for(; !IsAligned<word32, false>(s) && n > 0; n--)
>          crc = m_tab[CRC32_INDEX(crc) ^ *s++] ^ CRC32_SHIFTED(crc);
>  
>      while (n >= 4)
> diff --git a/cryptlib.cpp b/cryptlib.cpp
> index a9ed290..6683129 100644
> --- a/cryptlib.cpp
> +++ b/cryptlib.cpp
> @@ -183,17 +183,17 @@ size_t 
> BlockTransformation::AdvancedProcessBlocks(const byte *inBlocks, const by
>  
>  unsigned int BlockTransformation::OptimalDataAlignment() const
>  {
> -    return GetAlignmentOf<word32>();
> +    return GetAlignmentOf<word32, false>();
>  }
>  
>  unsigned int StreamTransformation::OptimalDataAlignment() const
>  {
> -    return GetAlignmentOf<word32>();
> +    return GetAlignmentOf<word32, false>();
>  }
>  
>  unsigned int HashTransformation::OptimalDataAlignment() const
>  {
> -    return GetAlignmentOf<word32>();
> +    return GetAlignmentOf<word32, false>();
>  }
>  
>  void StreamTransformation::ProcessLastBlock(byte *outString, const byte 
> *inString, size_t length)
> diff --git a/iterhash.cpp b/iterhash.cpp
> index 1e31e9f..e9a812f 100644
> --- a/iterhash.cpp
> +++ b/iterhash.cpp
> @@ -50,7 +50,7 @@ template <class T, class BASE> void IteratedHashBase<T, 
> BASE>::Update(const byte
>              HashBlock(dataBuf);
>              return;
>          }
> -        else if (IsAligned<T>(input))
> +        else if (IsAligned<T, false>(input))
>          {
>              size_t leftOver = HashMultipleBlocks((T *)input, len);
>              input += (len - leftOver);
> @@ -138,7 +138,7 @@ template <class T, class BASE> void 
> IteratedHashBase<T, BASE>::TruncatedFinal(by
>  
>      HashBlock(dataBuf);
>  
> -    if (IsAligned<HashWordType>(digest) && size%sizeof(HashWordType)==0)
> +    if (IsAligned<HashWordType, false>(digest) && 
> size%sizeof(HashWordType)==0)
>          ConditionalByteReverse<HashWordType>(order, (HashWordType 
> *)digest, stateBuf, size);
>      else
>      {
> diff --git a/iterhash.h b/iterhash.h
> index cce9e82..e83afec 100644
> --- a/iterhash.h
> +++ b/iterhash.h
> @@ -25,7 +25,7 @@ public:
>  
>      IteratedHashBase() : m_countLo(0), m_countHi(0) {}
>      unsigned int OptimalBlockSize() const {return this->BlockSize();}
> -    unsigned int OptimalDataAlignment() const {return 
> GetAlignmentOf<T>();}
> +    unsigned int OptimalDataAlignment() const {return GetAlignmentOf<T, 
> false>();}
>      void Update(const byte *input, size_t length);
>      byte * CreateUpdateSpace(size_t &size);
>      void Restart();
> diff --git a/misc.cpp b/misc.cpp
> index 3c2c2a5..908578b 100644
> --- a/misc.cpp
> +++ b/misc.cpp
> @@ -14,14 +14,22 @@
>  
>  NAMESPACE_BEGIN(CryptoPP)
>  
> +// GCC 4.7, 4.9 and 5.1 generates code using vmovdqa instructions under 
> -O3, which cause a segfault due to 128-bit word alignment requirements.
> +// Refer to http://stackoverflow.com/q/31373765 and 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=66852 for more details.
> +// #pragma GCC push_options
> +// #pragma GCC optimize ("-O2")
> +
>  void xorbuf(byte *buf, const byte *mask, size_t count)
>  {
>      size_t i;
>  
> -    if (IsAligned<word32>(buf) && IsAligned<word32>(mask))
> +    if (IsAligned<word32, false>(buf) && IsAligned<word32, false>(mask))
>      {
> -        if (!CRYPTOPP_BOOL_SLOW_WORD64 && IsAligned<word64>(buf) && 
> IsAligned<word64>(mask))
> +        // GCC 4.8 and above vectorizes the XOR at -O3, so the data must 
> be aligned.
> +        if (!CRYPTOPP_BOOL_SLOW_WORD64 && IsAligned<word64, true>(buf) && 
> IsAligned<word64, true>(mask))
>          {
> +            assert(IsAlignedOn(buf, 8)); assert(IsAlignedOn(mask, 8));
> +
>              for (i=0; i<count/8; i++)
>                  ((word64*)buf)[i] ^= ((word64*)mask)[i];
>              count -= 8*i;
> @@ -44,14 +52,18 @@ void xorbuf(byte *buf, const byte *mask, size_t count)
>          buf[i] ^= mask[i];
>  }
>  
> +// #pragma GCC pop_options
> +
>  void xorbuf(byte *output, const byte *input, const byte *mask, size_t 
> count)
>  {
>      size_t i;
>  
> -    if (IsAligned<word32>(output) && IsAligned<word32>(input) && 
> IsAligned<word32>(mask))
> +    if (IsAligned<word32, false>(output) && IsAligned<word32, 
> false>(input) && IsAligned<word32, false>(mask))
>      {
> -        if (!CRYPTOPP_BOOL_SLOW_WORD64 && IsAligned<word64>(output) && 
> IsAligned<word64>(input) && IsAligned<word64>(mask))
> +        // GCC 4.8 and above vectorizes the XOR at -O3, so the data must 
> be aligned.
> +        if (!CRYPTOPP_BOOL_SLOW_WORD64 && IsAligned<word64, true>(output) 
> && IsAligned<word64, true>(input) && IsAligned<word64, true>(mask))
>          {
> +            assert(IsAlignedOn(output, 8)); assert(IsAlignedOn(input, 
> 8)); assert(IsAlignedOn(input, 8));
>              for (i=0; i<count/8; i++)
>                  ((word64*)output)[i] = ((word64*)input)[i] ^ 
> ((word64*)mask)[i];
>              count -= 8*i;
> @@ -81,11 +93,14 @@ bool VerifyBufsEqual(const byte *buf, const byte 
> *mask, size_t count)
>      size_t i;
>      byte acc8 = 0;
>  
> -    if (IsAligned<word32>(buf) && IsAligned<word32>(mask))
> +    if (IsAligned<word32, false>(buf) && IsAligned<word32, false>(mask))
>      {
>          word32 acc32 = 0;
> -        if (!CRYPTOPP_BOOL_SLOW_WORD64 && IsAligned<word64>(buf) && 
> IsAligned<word64>(mask))
> +        // GCC 4.8 and above could vectorize the XOR at -O3, so the data 
> must be aligned.
> +        if (!CRYPTOPP_BOOL_SLOW_WORD64 && IsAligned<word64, true>(buf) && 
> IsAligned<word64, true>(mask))
>          {
> +            assert(IsAlignedOn(buf, 8)); assert(IsAlignedOn(mask, 8));
> +
>              word64 acc64 = 0;
>              for (i=0; i<count/8; i++)
>                  acc64 |= ((word64*)buf)[i] ^ ((word64*)mask)[i];
> diff --git a/misc.h b/misc.h
> index 9b25ee5..9293490 100644
> --- a/misc.h
> +++ b/misc.h
> @@ -46,7 +46,7 @@
>  #define GCC_DIAGNOSTIC_AWARE ((__GNUC__ > 4 || (__GNUC__ == 4 && 
> __GNUC_MINOR__ >= 7)) || defined(__clang__))
>  
>  // Used to manage function-level optimizations when working around 
> compiler issues.
> -//   At -O3, GCC vectorizes and uses SSE instructions, even if alignment 
> does not meet instruction requirements.
> +//   At -O3, GCC vectorizes uses SSE instructions, even if alignment does 
> not meet instruction requirements.
>  #define GCC_OPTIMIZE_AWARE ((__GNUC__ > 4 || (__GNUC__ == 4 && 
> __GNUC_MINOR__ >= 7)) || defined(__clang__))
>  
>  #if GCC_DIAGNOSTIC_AWARE
> @@ -387,11 +387,14 @@ inline T1 RoundUpToMultipleOf(const T1 &n, const T2 
> &m)
>      return RoundDownToMultipleOf(n+m-1, m);
>  }
>  
> -template <class T>
> +// Set STRICT=true when the compiler vectorizes the data. This is hit or 
> miss,
> +//   but we know it happens with GCC 4.8 and above with -O3 in misc.cpp, 
> xorbuf().
> +//   We can't provide STRICT=false by default because its a C++11 
> extension.
> +template <class T, bool STRICT>
>  inline unsigned int GetAlignmentOf(T *dummy=NULL)    // VC60 workaround
>  {
>  #ifdef CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS
> -    if (sizeof(T) < 16)
> +    if (!STRICT && sizeof(T) < 16)
>          return 1;
>  #endif
>  
> @@ -411,10 +414,13 @@ inline bool IsAlignedOn(const void *p, unsigned int 
> alignment)
>      return alignment==1 || (IsPowerOf2(alignment) ? 
> ModPowerOf2((size_t)p, alignment) == 0 : (size_t)p % alignment == 0);
>  }
>  
> -template <class T>
> +// Set STRICT=true when the compiler vectorizes the data. This is hit or 
> miss,
> +//   but we know it happens with GCC 4.8 and above with -O3 in misc.cpp, 
> xorbuf().
> +//   We can't provide STRICT=false by default because its a C++11 
> extension.
> +template <class T, bool STRICT>
>  inline bool IsAligned(const void *p, T *dummy=NULL)    // VC60 workaround
>  {
> -    return IsAlignedOn(p, GetAlignmentOf<T>());
> +    return IsAlignedOn(p, GetAlignmentOf<T, STRICT>());
>  }
>  
>  #ifdef IS_LITTLE_ENDIAN
> @@ -562,11 +568,11 @@ template<> inline void SecureWipeBuffer(word64 *buf, 
> size_t n)
>  template <class T>
>  inline void SecureWipeArray(T *buf, size_t n)
>  {
> -    if (sizeof(T) % 8 == 0 && GetAlignmentOf<T>() % 
> GetAlignmentOf<word64>() == 0)
> +    if (sizeof(T) % 8 == 0 && GetAlignmentOf<T, false>() % 
> GetAlignmentOf<word64, false>() == 0)
>          SecureWipeBuffer((word64 *)buf, n * (sizeof(T)/8));
> -    else if (sizeof(T) % 4 == 0 && GetAlignmentOf<T>() % 
> GetAlignmentOf<word32>() == 0)
> +    else if (sizeof(T) % 4 == 0 && GetAlignmentOf<T, false>() % 
> GetAlignmentOf<word32, false>() == 0)
>          SecureWipeBuffer((word32 *)buf, n * (sizeof(T)/4));
> -    else if (sizeof(T) % 2 == 0 && GetAlignmentOf<T>() % 
> GetAlignmentOf<word16>() == 0)
> +    else if (sizeof(T) % 2 == 0 && GetAlignmentOf<T, false>() % 
> GetAlignmentOf<word16, false>() == 0)
>          SecureWipeBuffer((word16 *)buf, n * (sizeof(T)/2));
>      else
>          SecureWipeBuffer((byte *)buf, n * sizeof(T));
> diff --git a/panama.cpp b/panama.cpp
> index 40c619f..14cc4bf 100644
> --- a/panama.cpp
> +++ b/panama.cpp
> @@ -451,7 +451,7 @@ void PanamaCipherPolicy<B>::CipherResynchronize(byte 
> *keystreamBuffer, const byt
>      assert(length==32);
>      this->Reset();
>      this->Iterate(1, m_key);
> -    if (iv && IsAligned<word32>(iv))
> +    if (iv && IsAligned<word32, false>(iv))
>          this->Iterate(1, (const word32 *)iv);
>      else
>      {
> diff --git a/rc2.h b/rc2.h
> index ebff798..dd78ef3 100644
> --- a/rc2.h
> +++ b/rc2.h
> @@ -25,7 +25,7 @@ class RC2 : public RC2_Info, public 
> BlockCipherDocumentation
>      {
>      public:
>          void UncheckedSetKey(const byte *userKey, unsigned int length, 
> const NameValuePairs &params);
> -        unsigned int OptimalDataAlignment() const {return 
> GetAlignmentOf<word16>();}
> +        unsigned int OptimalDataAlignment() const {return 
> GetAlignmentOf<word16, false>();}
>  
>      protected:
>          FixedSizeSecBlock<word16, 64> K;  // expanded key table
> diff --git a/salsa.cpp b/salsa.cpp
> index aefe74c..b6dab21 100644
> --- a/salsa.cpp
> +++ b/salsa.cpp
> @@ -60,7 +60,7 @@ unsigned int Salsa20_Policy::GetAlignment() const
>          return 16;
>      else
>  #endif
> -        return GetAlignmentOf<word32>();
> +        return GetAlignmentOf<word32, false>();
>  }
>  
>  unsigned int Salsa20_Policy::GetOptimalBlockSize() const
> diff --git a/sha3.h b/sha3.h
> index 232bae5..b1dc2e2 100644
> --- a/sha3.h
> +++ b/sha3.h
> @@ -15,7 +15,7 @@ public:
>      SHA3(unsigned int digestSize) : m_digestSize(digestSize) {Restart();}
>      unsigned int DigestSize() const {return m_digestSize;}
>      std::string AlgorithmName() const {return "SHA-3-" + 
> IntToString(m_digestSize*8);}
> -    unsigned int OptimalDataAlignment() const {return 
> GetAlignmentOf<word64>();}
> +    unsigned int OptimalDataAlignment() const {return 
> GetAlignmentOf<word64, true>();}
>  
>      void Update(const byte *input, size_t length);
>      void Restart();
> diff --git a/skipjack.h b/skipjack.h
> index 6b12647..a533b79 100644
> --- a/skipjack.h
> +++ b/skipjack.h
> @@ -22,7 +22,7 @@ class SKIPJACK : public SKIPJACK_Info, public 
> BlockCipherDocumentation
>      {
>      public:
>          void UncheckedSetKey(const byte *userKey, unsigned int length, 
> const NameValuePairs &params);
> -        unsigned int OptimalDataAlignment() const {return 
> GetAlignmentOf<word16>();}
> +        unsigned int OptimalDataAlignment() const {return 
> GetAlignmentOf<word16, false>();}
>  
>      protected:
>          static const byte fTable[256];
> diff --git a/sosemanuk.cpp b/sosemanuk.cpp
> index 0cc798f..6f13d0b 100644
> --- a/sosemanuk.cpp
> +++ b/sosemanuk.cpp
> @@ -293,7 +293,7 @@ unsigned int SosemanukPolicy::GetAlignment() const
>          return 16;
>      else
>  #endif
> -        return GetAlignmentOf<word32>();
> +        return GetAlignmentOf<word32, false>();
>  }
>  
>  unsigned int SosemanukPolicy::GetOptimalBlockSize() const
>

 

-- 
-- 
You received this message because you are subscribed to the "Crypto++ Users" 
Google Group.
To unsubscribe, send an email to [email protected].
More information about Crypto++ and this group is available at 
http://www.cryptopp.com.
--- 
You received this message because you are subscribed to the Google Groups 
"Crypto++ Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: Fixes for GCC 4.8 and above with -O3

Reply via email to