Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform

2016-12-14 Thread Jason A. Donenfeld
On Wed, Dec 14, 2016 at 9:12 PM, Tom Herbert  wrote:
> If you pad the data structure to 64 bits then we can call the version
> of siphash that only deals in 64 bit words. Writing a zero in the
> padding will be cheaper than dealing with odd lengths in siphash24.
On Wed, Dec 14, 2016 at 9:27 PM, Hannes Frederic Sowa
 wrote:
> What I don't really understand is that the addition of this complexity
> actually reduces the performance, as you have to take the "if (left)"
> branch during hashing and causes you to make a load_unaligned_zeropad.

Oh, duh, you guys are right. Fixed in my repo [1]. I'll submit the
next version in a day or so to let some other comments come in.

Thanks again for your reviews.

Jason

[1] https://git.zx2c4.com/linux-dev/log/?h=siphash
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform

2016-12-14 Thread Hannes Frederic Sowa
Hey Jason,

On 14.12.2016 20:38, Jason A. Donenfeld wrote:
> On Wed, Dec 14, 2016 at 8:22 PM, Hannes Frederic Sowa
>  wrote:
>> I don't think this helps. Did you test it? I don't see reason why
>> padding could be left out between `d' and `end' because of the flexible
>> array member?
> 
> Because the type u8 doesn't require any alignment requirements, it can
> nestle right up there cozy with the u16:
> 
> zx2c4@thinkpad ~ $ cat a.c
> #include 
> #include 
> #include 
> int main()
> {
>struct {
>uint64_t a;
>uint32_t b;
>uint32_t c;
>uint16_t d;
>char x[];
>} a;
>printf("%zu\n", sizeof(a));
>printf("%zu\n", offsetof(typeof(a), x));
>return 0;
> }
> zx2c4@thinkpad ~ $ gcc a.c
> zx2c4@thinkpad ~ $ ./a.out
> 24
> 18

Sorry, I misread the patch. You are using offsetof. In this case remove
the char x[] and just use offsetofend because it is misleading
otherwise. Should work like that though.

What I don't really understand is that the addition of this complexity
actually reduces the performance, as you have to take the "if (left)"
branch during hashing and causes you to make a load_unaligned_zeropad.

Bye,
Hannes

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform

2016-12-14 Thread Tom Herbert
On Wed, Dec 14, 2016 at 4:53 AM, Jason A. Donenfeld  wrote:
> Hi David,
>
> On Wed, Dec 14, 2016 at 10:51 AM, David Laight  
> wrote:
>> From: Jason A. Donenfeld
>>> Sent: 14 December 2016 00:17
>>> This gives a clear speed and security improvement. Rather than manually
>>> filling MD5 buffers, we simply create a layout by a simple anonymous
>>> struct, for which gcc generates rather efficient code.
>> ...
>>> + const struct {
>>> + struct in6_addr saddr;
>>> + struct in6_addr daddr;
>>> + __be16 sport;
>>> + __be16 dport;
>>> + } __packed combined = {
>>> + .saddr = *(struct in6_addr *)saddr,
>>> + .daddr = *(struct in6_addr *)daddr,
>>> + .sport = sport,
>>> + .dport = dport
>>> + };
>>
>> You need to look at the effect of marking this (and the other)
>> structures 'packed' on architectures like sparc64.
>
> In all current uses of __packed in the code, I think the impact is
> precisely zero, because all structures have members in descending
> order of size, with each member being a perfect multiple of the one
> below it. The __packed is therefore just there for safety, in case
> somebody comes in and screws everything up by sticking a u8 in
> between. In that case, it wouldn't be desirable to hash the structure
> padding bits. In the worst case, I don't believe the impact would be
> worse than a byte-by-byte memcpy, which is what the old code did. But
> anyway, these structures are already naturally packed anyway, so the
> present impact is nil.
>
If you pad the data structure to 64 bits then we can call the version
of siphash that only deals in 64 bit words. Writing a zero in the
padding will be cheaper than dealing with odd lengths in siphash24.

Tom

> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform

2016-12-14 Thread Jason A. Donenfeld
Hi Hannes,

On Wed, Dec 14, 2016 at 8:22 PM, Hannes Frederic Sowa
 wrote:
> I don't think this helps. Did you test it? I don't see reason why
> padding could be left out between `d' and `end' because of the flexible
> array member?

Because the type u8 doesn't require any alignment requirements, it can
nestle right up there cozy with the u16:

zx2c4@thinkpad ~ $ cat a.c
#include 
#include 
#include 
int main()
{
   struct {
   uint64_t a;
   uint32_t b;
   uint32_t c;
   uint16_t d;
   char x[];
   } a;
   printf("%zu\n", sizeof(a));
   printf("%zu\n", offsetof(typeof(a), x));
   return 0;
}
zx2c4@thinkpad ~ $ gcc a.c
zx2c4@thinkpad ~ $ ./a.out
24
18

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform

2016-12-14 Thread Hannes Frederic Sowa
On 14.12.2016 19:06, Jason A. Donenfeld wrote:
> Hi David,
> 
> On Wed, Dec 14, 2016 at 6:56 PM, David Miller  wrote:
>> Just marking the structure __packed, whether necessary or not, makes
>> the compiler assume that the members are not aligned and causes
>> byte-by-byte accesses to be performed for words.
>> Never, _ever_, use __packed unless absolutely necessary, it pessimizes
>> the code on cpus that require proper alignment of types.
> 
> Oh, jimminy cricket, I did not realize that it made assignments
> byte-by-byte *always*. So what options am I left with? What
> immediately comes to mind are:
> 
> 1)
> 
> struct {
> u64 a;
> u32 b;
> u32 c;
> u16 d;
> u8 end[];

I don't think this helps. Did you test it? I don't see reason why
padding could be left out between `d' and `end' because of the flexible
array member?

Bye,
Hannes


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform

2016-12-14 Thread Jason A. Donenfeld
Hi David,

On Wed, Dec 14, 2016 at 6:56 PM, David Miller  wrote:
> Just marking the structure __packed, whether necessary or not, makes
> the compiler assume that the members are not aligned and causes
> byte-by-byte accesses to be performed for words.
> Never, _ever_, use __packed unless absolutely necessary, it pessimizes
> the code on cpus that require proper alignment of types.

Oh, jimminy cricket, I did not realize that it made assignments
byte-by-byte *always*. So what options am I left with? What
immediately comes to mind are:

1)

struct {
u64 a;
u32 b;
u32 c;
u16 d;
u8 end[];
} a = {
.a = a,
.b = b,
.c = c,
.d = d
};
siphash24(, offsetof(typeof(a), end), key);

2)

u8 bytes[sizeof(u64) + sizeof(u32) * 2 + sizeof(u16)];
*(u64 *)[0] = a;
*(u32 *)[sizeof(u64)] = b;
*(u32 *)[sizeof(u64) + sizeof(u32)] = c;
*(u16 *)[sizeof(u64) + sizeof(u32) * 2] = d;
siphash24(bytes, sizeof(bytes), key);


Personally I find (1) a bit neater than (2). What's your opinion?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform

2016-12-14 Thread David Miller
From: "Jason A. Donenfeld" 
Date: Wed, 14 Dec 2016 13:53:10 +0100

> In all current uses of __packed in the code, I think the impact is
> precisely zero, because all structures have members in descending
> order of size, with each member being a perfect multiple of the one
> below it. The __packed is therefore just there for safety, in case
> somebody comes in and screws everything up by sticking a u8 in
> between.

Just marking the structure __packed, whether necessary or not, makes
the compiler assume that the members are not aligned and causes
byte-by-byte accesses to be performed for words.

Never, _ever_, use __packed unless absolutely necessary, it pessimizes
the code on cpus that require proper alignment of types.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform

2016-12-14 Thread Jason A. Donenfeld
On Wed, Dec 14, 2016 at 3:47 PM, David Laight  wrote:
> Just remove the __packed and ensure that the structure is 'nice'.
> This includes ensuring there is no 'tail padding'.
> In some cases you'll need to put the port number into a 32bit field.

I'd rather not. There's no point in wasting extra cycles on hashing
useless data, just for some particular syntactic improvement. __packed
__aligned(8) will do what we want perfectly, I think.

> I'd also require that the key be aligned.

Yep, I'll certainly do this for the siphash24_aligned version in the v3.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform

2016-12-14 Thread Hannes Frederic Sowa
On 14.12.2016 13:53, Jason A. Donenfeld wrote:
> Hi David,
> 
> On Wed, Dec 14, 2016 at 10:51 AM, David Laight  
> wrote:
>> From: Jason A. Donenfeld
>>> Sent: 14 December 2016 00:17
>>> This gives a clear speed and security improvement. Rather than manually
>>> filling MD5 buffers, we simply create a layout by a simple anonymous
>>> struct, for which gcc generates rather efficient code.
>> ...
>>> + const struct {
>>> + struct in6_addr saddr;
>>> + struct in6_addr daddr;
>>> + __be16 sport;
>>> + __be16 dport;
>>> + } __packed combined = {
>>> + .saddr = *(struct in6_addr *)saddr,
>>> + .daddr = *(struct in6_addr *)daddr,
>>> + .sport = sport,
>>> + .dport = dport
>>> + };
>>
>> You need to look at the effect of marking this (and the other)
>> structures 'packed' on architectures like sparc64.
> 
> In all current uses of __packed in the code, I think the impact is
> precisely zero, because all structures have members in descending
> order of size, with each member being a perfect multiple of the one
> below it. The __packed is therefore just there for safety, in case
> somebody comes in and screws everything up by sticking a u8 in
> between. In that case, it wouldn't be desirable to hash the structure
> padding bits. In the worst case, I don't believe the impact would be
> worse than a byte-by-byte memcpy, which is what the old code did. But
> anyway, these structures are already naturally packed anyway, so the
> present impact is nil.

__packed not only removes all padding of the struct but also changes the
alignment assumptions for the whole struct itself. The rule, the struct
is aligned by its maximum alignment of a member is no longer true. That
said, the code accessing this struct will change (not on archs that can
deal efficiently with unaligned access, but on others).

A proper test for not introducing padding is to use something with
BUILD_BUG_ON. Also gcc also clears the padding of the struct, so padding
shouldn't be that bad in there (it is better than byte access on mips).

Btw. I think gcc7 gets support for store merging optimization.

Bye,
Hannes

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform

2016-12-14 Thread Jason A. Donenfeld
Hi David,

On Wed, Dec 14, 2016 at 10:51 AM, David Laight  wrote:
> From: Jason A. Donenfeld
>> Sent: 14 December 2016 00:17
>> This gives a clear speed and security improvement. Rather than manually
>> filling MD5 buffers, we simply create a layout by a simple anonymous
>> struct, for which gcc generates rather efficient code.
> ...
>> + const struct {
>> + struct in6_addr saddr;
>> + struct in6_addr daddr;
>> + __be16 sport;
>> + __be16 dport;
>> + } __packed combined = {
>> + .saddr = *(struct in6_addr *)saddr,
>> + .daddr = *(struct in6_addr *)daddr,
>> + .sport = sport,
>> + .dport = dport
>> + };
>
> You need to look at the effect of marking this (and the other)
> structures 'packed' on architectures like sparc64.

In all current uses of __packed in the code, I think the impact is
precisely zero, because all structures have members in descending
order of size, with each member being a perfect multiple of the one
below it. The __packed is therefore just there for safety, in case
somebody comes in and screws everything up by sticking a u8 in
between. In that case, it wouldn't be desirable to hash the structure
padding bits. In the worst case, I don't believe the impact would be
worse than a byte-by-byte memcpy, which is what the old code did. But
anyway, these structures are already naturally packed anyway, so the
present impact is nil.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform

2016-12-13 Thread Jason A. Donenfeld
This gives a clear speed and security improvement. Siphash is both
faster and is more solid crypto than the aging MD5.

Rather than manually filling MD5 buffers, we simply create
a layout by a simple anonymous struct, for which gcc generates
rather efficient code.

Signed-off-by: Jason A. Donenfeld 
Cc: Andi Kleen 
---
Changes from v1->v2:

  - Rebased on the latest 4.10, and now uses top 32-bits of siphash
for the optional ts value.

 net/core/secure_seq.c | 160 +-
 1 file changed, 79 insertions(+), 81 deletions(-)

diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 88a8e429fc3e..abadc79cd5d3 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -1,3 +1,5 @@
+/* Copyright (C) 2016 Jason A. Donenfeld . All Rights 
Reserved. */
+
 #include 
 #include 
 #include 
@@ -8,14 +10,14 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 
 #if IS_ENABLED(CONFIG_IPV6) || IS_ENABLED(CONFIG_INET)
+#include 
 #include 
-#define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4)
 
-static u32 net_secret[NET_SECRET_SIZE] cacheline_aligned;
+static u8 net_secret[SIPHASH24_KEY_LEN];
 
 static __always_inline void net_secret_init(void)
 {
@@ -44,44 +46,39 @@ static u32 seq_scale(u32 seq)
 u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 __be16 sport, __be16 dport, u32 *tsoff)
 {
-   u32 secret[MD5_MESSAGE_BYTES / 4];
-   u32 hash[MD5_DIGEST_WORDS];
-   u32 i;
-
+   const struct {
+   struct in6_addr saddr;
+   struct in6_addr daddr;
+   __be16 sport;
+   __be16 dport;
+   } __packed combined = {
+   .saddr = *(struct in6_addr *)saddr,
+   .daddr = *(struct in6_addr *)daddr,
+   .sport = sport,
+   .dport = dport
+   };
+   u64 hash;
net_secret_init();
-   memcpy(hash, saddr, 16);
-   for (i = 0; i < 4; i++)
-   secret[i] = net_secret[i] + (__force u32)daddr[i];
-   secret[4] = net_secret[4] +
-   (((__force u16)sport << 16) + (__force u16)dport);
-   for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-   secret[i] = net_secret[i];
-
-   md5_transform(hash, secret);
-
-   *tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-   return seq_scale(hash[0]);
+   hash = siphash24((const u8 *), sizeof(combined), net_secret);
+   *tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+   return seq_scale(hash);
 }
 EXPORT_SYMBOL(secure_tcpv6_sequence_number);
 
 u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
   __be16 dport)
 {
-   u32 secret[MD5_MESSAGE_BYTES / 4];
-   u32 hash[MD5_DIGEST_WORDS];
-   u32 i;
-
+   const struct {
+   struct in6_addr saddr;
+   struct in6_addr daddr;
+   __be16 dport;
+   } __packed combined = {
+   .saddr = *(struct in6_addr *)saddr,
+   .daddr = *(struct in6_addr *)daddr,
+   .dport = dport
+   };
net_secret_init();
-   memcpy(hash, saddr, 16);
-   for (i = 0; i < 4; i++)
-   secret[i] = net_secret[i] + (__force u32) daddr[i];
-   secret[4] = net_secret[4] + (__force u32)dport;
-   for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-   secret[i] = net_secret[i];
-
-   md5_transform(hash, secret);
-
-   return hash[0];
+   return siphash24((const u8 *), sizeof(combined), net_secret);
 }
 EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 #endif
@@ -91,33 +88,37 @@ EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
   __be16 sport, __be16 dport, u32 *tsoff)
 {
-   u32 hash[MD5_DIGEST_WORDS];
-
+   const struct {
+   __be32 saddr;
+   __be32 daddr;
+   __be16 sport;
+   __be16 dport;
+   } __packed combined = {
+   .saddr = saddr,
+   .daddr = daddr,
+   .sport = sport,
+   .dport = dport
+   };
+   u64 hash;
net_secret_init();
-   hash[0] = (__force u32)saddr;
-   hash[1] = (__force u32)daddr;
-   hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-   hash[3] = net_secret[15];
-
-   md5_transform(hash, net_secret);
-
-   *tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-   return seq_scale(hash[0]);
+   hash = siphash24((const u8 *), sizeof(combined), net_secret);
+   *tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+   return seq_scale(hash);
 }
 
 u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
 {
-   u32 hash[MD5_DIGEST_WORDS];
-
+   const struct {
+   __be32 saddr;
+   __be32 daddr;
+   __be16