Re: [PATCH v2 2/3] hwrng: exynos - add Samsung Exynos True RNG driver

2017-12-05 Thread PrasannaKumar Muralidharan
Hi Lukasz,

On 27 November 2017 at 15:28, Łukasz Stelmach  wrote:
> Add support for True Random Number Generator found in Samsung Exynos
> 5250+ SoCs.
>
> Signed-off-by: Łukasz Stelmach 
> ---
>  MAINTAINERS  |   7 +
>  drivers/char/hw_random/Kconfig   |  12 ++
>  drivers/char/hw_random/Makefile  |   1 +
>  drivers/char/hw_random/exynos-trng.c | 245 +++
>  4 files changed, 265 insertions(+)
>  create mode 100644 drivers/char/hw_random/exynos-trng.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2811a211632c..992074cca612 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11780,6 +11780,13 @@ S: Maintained
>  F: drivers/crypto/exynos-rng.c
>  F: Documentation/devicetree/bindings/rng/samsung,exynos-rng4.txt
>
> +SAMSUNG EXYNOS TRUE RANDOM NUMBER GENERATOR (TRNG) DRIVER
> +M: Łukasz Stelmach 
> +L: linux-samsung-...@vger.kernel.org
> +S: Maintained
> +F: drivers/char/hw_random/exynos-trng.c
> +F: Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
> +
>  SAMSUNG FRAMEBUFFER DRIVER
>  M: Jingoo Han 
>  L: linux-fb...@vger.kernel.org
> diff --git a/drivers/char/hw_random/Kconfig b/drivers/char/hw_random/Kconfig
> index 95a031e9eced..292e6b36d493 100644
> --- a/drivers/char/hw_random/Kconfig
> +++ b/drivers/char/hw_random/Kconfig
> @@ -449,6 +449,18 @@ config HW_RANDOM_S390
>
>   If unsure, say Y.
>
> +config HW_RANDOM_EXYNOS
> +   tristate "Samsung Exynos True Random Number Generator support"
> +   depends on ARCH_EXYNOS || COMPILE_TEST
> +   default HW_RANDOM
> +   ---help---
> + This driver provides support for the True Random Number
> + Generator available in Exynos SoCs.
> +
> + To compile this driver as a module, choose M here: the module
> + will be called exynos-trng.
> +
> + If unsure, say Y.
>  endif # HW_RANDOM
>
>  config UML_RANDOM
> diff --git a/drivers/char/hw_random/Makefile b/drivers/char/hw_random/Makefile
> index f3728d008fff..5595df97da7a 100644
> --- a/drivers/char/hw_random/Makefile
> +++ b/drivers/char/hw_random/Makefile
> @@ -14,6 +14,7 @@ obj-$(CONFIG_HW_RANDOM_GEODE) += geode-rng.o
>  obj-$(CONFIG_HW_RANDOM_N2RNG) += n2-rng.o
>  n2-rng-y := n2-drv.o n2-asm.o
>  obj-$(CONFIG_HW_RANDOM_VIA) += via-rng.o
> +obj-$(CONFIG_HW_RANDOM_EXYNOS) += exynos-trng.o
>  obj-$(CONFIG_HW_RANDOM_IXP4XX) += ixp4xx-rng.o
>  obj-$(CONFIG_HW_RANDOM_OMAP) += omap-rng.o
>  obj-$(CONFIG_HW_RANDOM_OMAP3_ROM) += omap3-rom-rng.o
> diff --git a/drivers/char/hw_random/exynos-trng.c b/drivers/char/hw_random/exynos-trng.c
> new file mode 100644
> index ..91b2ddb249fa
> --- /dev/null
> +++ b/drivers/char/hw_random/exynos-trng.c
> @@ -0,0 +1,245 @@
> +/*
> + * RNG driver for Exynos TRNGs
> + *
> + * Author: Łukasz Stelmach 
> + *
> + * Copyright 2017 (c) Samsung Electronics Software, Inc.
> + *
> + * Based on the Exynos PRNG driver drivers/crypto/exynos-rng by
> + * Krzysztof Kozłowski 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation;
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define EXYNOS_TRNG_CLKDIV (0x0)
> +#define EXYNOS_TRNG_CTRL   (0x20)
> +#define EXYNOS_TRNG_POST_CTRL  (0x30)
> +#define EXYNOS_TRNG_ONLINE_CTRL    (0x40)
> +#define EXYNOS_TRNG_ONLINE_STAT    (0x44)
> +#define EXYNOS_TRNG_ONLINE_MAXCHI2 (0x48)
> +#define EXYNOS_TRNG_FIFO_CTRL  (0x50)
> +#define EXYNOS_TRNG_FIFO_0 (0x80)
> +#define EXYNOS_TRNG_FIFO_1 (0x84)
> +#define EXYNOS_TRNG_FIFO_2 (0x88)
> +#define EXYNOS_TRNG_FIFO_3 (0x8c)
> +#define EXYNOS_TRNG_FIFO_4 (0x90)
> +#define EXYNOS_TRNG_FIFO_5 (0x94)
> +#define EXYNOS_TRNG_FIFO_6 (0x98)
> +#define EXYNOS_TRNG_FIFO_7 (0x9c)
> +#define EXYNOS_TRNG_FIFO_LEN   (8)
> +#define EXYNOS_TRNG_CLOCK_RATE (50)
> +
> +#define TRNG_CTRL_RGNEN    BIT(31)
> +
> +struct exynos_trng_dev {
> +   struct device*dev;
> +   void __iomem *mem;
> +   struct clk   *clk;
> +   struct hwrng rng;
> +};
> +
> +static int exynos_trng_do_read(struct hwrng *rng, void *data, size_t max,
> +  bool wait)
> +{
> +   struct exynos_trng_dev *trng;
> +   u32 val;
> +
> +   max = min_t(size_t, max, 

[PATCH] lib/mpi: Fix umul_ppmm() for MIPS64r6

2017-12-05 Thread James Hogan
From: James Hogan 

Current MIPS64r6 toolchains aren't able to generate efficient
DMULU/DMUHU based code for the C implementation of umul_ppmm(), which
performs an unsigned 64 x 64 bit multiply and returns the upper and
lower 64-bit halves of the 128-bit result. Instead it widens the 64-bit
inputs to 128-bits and emits a __multi3 intrinsic call to perform a 128
x 128 multiply. This is both inefficient, and it results in a link error
since we don't include __multi3 in MIPS linux.

For example commit 90a53e4432b1 ("cfg80211: implement regdb signature
checking") merged in v4.15-rc1 recently broke the 64r6_defconfig and
64r6el_defconfig builds by indirectly selecting MPILIB. The same build
errors can be reproduced on older kernels by enabling e.g. CRYPTO_RSA:

lib/mpi/generic_mpih-mul1.o: In function `mpihelp_mul_1':
lib/mpi/generic_mpih-mul1.c:50: undefined reference to `__multi3'
lib/mpi/generic_mpih-mul2.o: In function `mpihelp_addmul_1':
lib/mpi/generic_mpih-mul2.c:49: undefined reference to `__multi3'
lib/mpi/generic_mpih-mul3.o: In function `mpihelp_submul_1':
lib/mpi/generic_mpih-mul3.c:49: undefined reference to `__multi3'
lib/mpi/mpih-div.o In function `mpihelp_divrem':
lib/mpi/mpih-div.c:205: undefined reference to `__multi3'
lib/mpi/mpih-div.c:142: undefined reference to `__multi3'

Therefore add an efficient MIPS64r6 implementation of umul_ppmm() using
inline assembly and the DMULU/DMUHU instructions, to prevent __multi3
calls being emitted.
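
For reference, the contract of umul_ppmm() can be written as the following
user-space sketch (illustrative only, using GCC's unsigned __int128; the
function name is made up and this is not part of the patch):

#include <stdint.h>

/* Reference semantics of umul_ppmm(w1, w0, u, v): w1 receives the upper
 * and w0 the lower 64 bits of the 128-bit product u * v. On MIPS64r6,
 * DMUHU and DMULU compute exactly these two halves. */
static void umul_ppmm_ref(uint64_t *w1, uint64_t *w0, uint64_t u, uint64_t v)
{
	unsigned __int128 p = (unsigned __int128)u * v;

	*w1 = (uint64_t)(p >> 64);
	*w0 = (uint64_t)p;
}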

Fixes: 7fd08ca58ae6 ("MIPS: Add build support for the MIPS R6 ISA")
Signed-off-by: James Hogan 
Cc: Ralf Baechle 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: linux-m...@linux-mips.org
Cc: linux-crypto@vger.kernel.org
---
Please can somebody apply this fix for v4.15, as the MIPS 64r6_defconfig
and 64r6el_defconfig builds are broken without it.
---
 lib/mpi/longlong.h | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/lib/mpi/longlong.h b/lib/mpi/longlong.h
index 57fd45ab7af1..08c60d10747f 100644
--- a/lib/mpi/longlong.h
+++ b/lib/mpi/longlong.h
@@ -671,7 +671,23 @@ do {   \
**  MIPS/64  **
***/
 #if (defined(__mips) && __mips >= 3) && W_TYPE_SIZE == 64
-#if (__GNUC__ >= 5) || (__GNUC__ >= 4 && __GNUC_MINOR__ >= 4)
+#if defined(__mips_isa_rev) && __mips_isa_rev >= 6
+/*
+ * GCC ends up emitting a __multi3 intrinsic call for MIPS64r6 with the plain C
+ * code below, so we special case MIPS64r6 until the compiler can do better.
+ */
+#define umul_ppmm(w1, w0, u, v)
\
+do {   \
+   __asm__ ("dmulu %0,%1,%2"   \
+: "=d" ((UDItype)(w0)) \
+: "d" ((UDItype)(u)),  \
+  "d" ((UDItype)(v))); \
+   __asm__ ("dmuhu %0,%1,%2"   \
+: "=d" ((UDItype)(w1)) \
+: "d" ((UDItype)(u)),  \
+  "d" ((UDItype)(v))); \
+} while (0)
+#elif (__GNUC__ >= 5) || (__GNUC__ >= 4 && __GNUC_MINOR__ >= 4)
 #define umul_ppmm(w1, w0, u, v) \
 do {   \
typedef unsigned int __ll_UTItype __attribute__((mode(TI)));\
-- 
2.14.1



[PATCH v9 8/8] ntb: ntb_hw_switchtec: Cleanup 64bit IO defines to use the common header

2017-12-05 Thread Logan Gunthorpe
Clean up the ifdefs which conditionally defined the io{read|write}64
functions in favour of the new common io-64-nonatomic-lo-hi header.

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
---
 drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 30 +-
 1 file changed, 1 insertion(+), 29 deletions(-)

diff --git a/drivers/ntb/hw/mscc/ntb_hw_switchtec.c b/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
index afe8ed6f3b23..53d3a34cddf3 100644
--- a/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
+++ b/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 MODULE_DESCRIPTION("Microsemi Switchtec(tm) NTB Driver");
 MODULE_VERSION("0.1");
@@ -35,35 +36,6 @@ module_param(use_lut_mws, bool, 0644);
 MODULE_PARM_DESC(use_lut_mws,
 "Enable the use of the LUT based memory windows");
 
-#ifndef ioread64
-#ifdef readq
-#define ioread64 readq
-#else
-#define ioread64 _ioread64
-static inline u64 _ioread64(void __iomem *mmio)
-{
-   u64 low, high;
-
-   low = ioread32(mmio);
-   high = ioread32(mmio + sizeof(u32));
-   return low | (high << 32);
-}
-#endif
-#endif
-
-#ifndef iowrite64
-#ifdef writeq
-#define iowrite64 writeq
-#else
-#define iowrite64 _iowrite64
-static inline void _iowrite64(u64 val, void __iomem *mmio)
-{
-   iowrite32(val, mmio);
-   iowrite32(val >> 32, mmio + sizeof(u32));
-}
-#endif
-#endif
-
 #define SWITCHTEC_NTB_MAGIC 0x45CC0001
 #define MAX_MWS 128
 
-- 
2.11.0



[PATCH v9 2/8] powerpc: io.h: move iomap.h include so that it can use readq/writeq defs

2017-12-05 Thread Logan Gunthorpe
Subsequent patches in this series make use of the readq and writeq
defines in iomap.h. However, as is, they get missed on the powerpc
platform because the include comes before the defines. This patch
moves the include down to fix this.
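
To illustrate the failure mode with a simplified, stand-alone example (the
macros here are made up, not the real kernel headers): a header that guards
its 64-bit helpers with "#ifdef readq" provides nothing if readq is only
defined after the #include:

#include <stdio.h>

/* --- stand-in for asm-generic/iomap.h --- */
#ifdef readq
#define IOMAP_HAS_READQ 1
#else
#define IOMAP_HAS_READQ 0
#endif
/* --- end of stand-in --- */

#define readq(addr) (addr)	/* defined too late to be seen above */

int main(void)
{
	/* Prints 0: the guard was evaluated before readq existed. */
	printf("readq visible to the iomap stand-in: %d\n", IOMAP_HAS_READQ);
	return 0;
}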

Signed-off-by: Logan Gunthorpe 
Acked-by: Michael Ellerman 
Reviewed-by: Andy Shevchenko 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Suresh Warrier 
Cc: "Oliver O'Halloran" 
---
 arch/powerpc/include/asm/io.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 422f99cf9924..af074923d598 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -33,8 +33,6 @@ extern struct pci_dev *isa_bridge_pcidev;
 #include 
 #include 

-#include 
-
 #ifdef CONFIG_PPC64
 #include 
 #endif
@@ -663,6 +661,8 @@ static inline void name at  
\
 #define writel_relaxed(v, addr)writel(v, addr)
 #define writeq_relaxed(v, addr)writeq(v, addr)

+#include 
+
 #ifdef CONFIG_PPC32
 #define mmiowb()
 #else
--
2.11.0


[PATCH v9 6/8] ntb: ntb_hw_intel: use io-64-nonatomic instead of in-driver hacks

2017-12-05 Thread Logan Gunthorpe
Now that ioread64 and iowrite64 are available in io-64-nonatomic,
we can remove the hack at the top of ntb_hw_intel.c and replace it
with an include.

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Andy Shevchenko 
Acked-by: Dave Jiang 
Acked-by: Allen Hubbe 
Acked-by: Jon Mason 

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Date:  Mon Jun 19 12:18:31 2017 -0600
#
# interactive rebase in progress; onto ae64f9bd1d36
# Last commands done (6 commands done):
#pick cf3e4dab2173 io-64-nonatomic: add io{read|write}64[be]{_lo_hi|_hi_lo} 
macros
#r 79b4c4b8490c ntb: ntb_hw_intel: use io-64-nonatomic instead of in-driver 
hacks
# Next commands to do (2 remaining commands):
#r 19b6c1f3b15d crypto: caam: cleanup CONFIG_64BIT ifdefs when using 
io{read|write}64
#r f3c8723446ef ntb_hw_switchtec: Cleanup 64bit IO defines to use the 
common header
# You are currently editing a commit while rebasing branch 'io64_v9' on 
'ae64f9bd1d36'.
#
# Changes to be committed:
#   modified:   drivers/ntb/hw/intel/ntb_hw_intel.c
#
---
 drivers/ntb/hw/intel/ntb_hw_intel.c | 30 +-
 1 file changed, 1 insertion(+), 29 deletions(-)

diff --git a/drivers/ntb/hw/intel/ntb_hw_intel.c b/drivers/ntb/hw/intel/ntb_hw_intel.c
index 4de074a86073..119cfc45617e 100644
--- a/drivers/ntb/hw/intel/ntb_hw_intel.c
+++ b/drivers/ntb/hw/intel/ntb_hw_intel.c
@@ -59,6 +59,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ntb_hw_intel.h"
 
@@ -155,35 +156,6 @@ MODULE_PARM_DESC(xeon_b2b_dsd_bar5_addr32,
 static inline enum ntb_topo xeon_ppd_topo(struct intel_ntb_dev *ndev, u8 ppd);
 static int xeon_init_isr(struct intel_ntb_dev *ndev);
 
-#ifndef ioread64
-#ifdef readq
-#define ioread64 readq
-#else
-#define ioread64 _ioread64
-static inline u64 _ioread64(void __iomem *mmio)
-{
-   u64 low, high;
-
-   low = ioread32(mmio);
-   high = ioread32(mmio + sizeof(u32));
-   return low | (high << 32);
-}
-#endif
-#endif
-
-#ifndef iowrite64
-#ifdef writeq
-#define iowrite64 writeq
-#else
-#define iowrite64 _iowrite64
-static inline void _iowrite64(u64 val, void __iomem *mmio)
-{
-   iowrite32(val, mmio);
-   iowrite32(val >> 32, mmio + sizeof(u32));
-}
-#endif
-#endif
-
 static inline int pdev_is_atom(struct pci_dev *pdev)
 {
switch (pdev->device) {
-- 
2.11.0



[PATCH v9 3/8] powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo}

2017-12-05 Thread Logan Gunthorpe
These functions will be introduced into the generic iomap.c so
they can deal with PIO accesses in hi-lo/lo-hi variants. Thus,
the powerpc version of iomap.c will need to provide the same
functions even though, in this arch, they are identical to the
regular io{read|write}64 functions.

Signed-off-by: Logan Gunthorpe 
Tested-by: Horia Geantă 
Reviewed-by: Andy Shevchenko 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
---
 arch/powerpc/kernel/iomap.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/arch/powerpc/kernel/iomap.c b/arch/powerpc/kernel/iomap.c
index aab456ed2a00..5ac84efc6ede 100644
--- a/arch/powerpc/kernel/iomap.c
+++ b/arch/powerpc/kernel/iomap.c
@@ -45,12 +45,32 @@ u64 ioread64(void __iomem *addr)
 {
return readq(addr);
 }
+u64 ioread64_lo_hi(void __iomem *addr)
+{
+   return readq(addr);
+}
+u64 ioread64_hi_lo(void __iomem *addr)
+{
+   return readq(addr);
+}
 u64 ioread64be(void __iomem *addr)
 {
return readq_be(addr);
 }
+u64 ioread64be_lo_hi(void __iomem *addr)
+{
+   return readq_be(addr);
+}
+u64 ioread64be_hi_lo(void __iomem *addr)
+{
+   return readq_be(addr);
+}
 EXPORT_SYMBOL(ioread64);
+EXPORT_SYMBOL(ioread64_lo_hi);
+EXPORT_SYMBOL(ioread64_hi_lo);
 EXPORT_SYMBOL(ioread64be);
+EXPORT_SYMBOL(ioread64be_lo_hi);
+EXPORT_SYMBOL(ioread64be_hi_lo);
 #endif /* __powerpc64__ */
 
 void iowrite8(u8 val, void __iomem *addr)
@@ -83,12 +103,32 @@ void iowrite64(u64 val, void __iomem *addr)
 {
writeq(val, addr);
 }
+void iowrite64_lo_hi(u64 val, void __iomem *addr)
+{
+   writeq(val, addr);
+}
+void iowrite64_hi_lo(u64 val, void __iomem *addr)
+{
+   writeq(val, addr);
+}
 void iowrite64be(u64 val, void __iomem *addr)
 {
writeq_be(val, addr);
 }
+void iowrite64be_lo_hi(u64 val, void __iomem *addr)
+{
+   writeq_be(val, addr);
+}
+void iowrite64be_hi_lo(u64 val, void __iomem *addr)
+{
+   writeq_be(val, addr);
+}
 EXPORT_SYMBOL(iowrite64);
+EXPORT_SYMBOL(iowrite64_lo_hi);
+EXPORT_SYMBOL(iowrite64_hi_lo);
 EXPORT_SYMBOL(iowrite64be);
+EXPORT_SYMBOL(iowrite64be_lo_hi);
+EXPORT_SYMBOL(iowrite64be_hi_lo);
 #endif /* __powerpc64__ */
 
 /*
-- 
2.11.0



[PATCH v9 4/8] iomap: introduce io{read|write}64_{lo_hi|hi_lo}

2017-12-05 Thread Logan Gunthorpe
In order to provide non-atomic functions for io{read|write}64 that will
use readq and writeq when appropriate, we define a number of variants
of these functions in the generic iomap that do non-atomic
operations on PIO but atomic operations on MMIO.

These functions are only defined if readq and writeq are defined. If
they are not, then the wrappers that always use non-atomic operations
from include/linux/io-64-nonatomic*.h will be used.
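
As a usage sketch (a hypothetical driver fragment, not part of this series),
with these variants a driver that maps a BAR through the iomap API can read
a 64-bit register without caring whether the BAR ended up as MMIO (one
atomic readq) or PIO (two 32-bit port accesses):

#include <linux/pci.h>
#include <linux/io-64-nonatomic-lo-hi.h>

/* Hypothetical: read a 64-bit counter at offset 0x10 of BAR 0. */
static u64 example_read_counter(struct pci_dev *pdev)
{
	void __iomem *base = pci_iomap(pdev, 0, 0);
	u64 val;

	if (!base)
		return 0;

	val = ioread64_lo_hi(base + 0x10);

	pci_iounmap(pdev, base);
	return val;
}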

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Andy Shevchenko 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Arnd Bergmann 
Cc: Suresh Warrier 
Cc: Nicholas Piggin 
---
 arch/powerpc/include/asm/io.h |   2 +
 include/asm-generic/iomap.h   |  26 +++--
 lib/iomap.c   | 132 ++
 3 files changed, 154 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index af074923d598..4cc420cfaa78 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -788,8 +788,10 @@ extern void __iounmap_at(void *ea, unsigned long size);
 
 #define mmio_read16be(addr)readw_be(addr)
 #define mmio_read32be(addr)readl_be(addr)
+#define mmio_read64be(addr)readq_be(addr)
 #define mmio_write16be(val, addr)  writew_be(val, addr)
 #define mmio_write32be(val, addr)  writel_be(val, addr)
+#define mmio_write64be(val, addr)  writeq_be(val, addr)
 #define mmio_insb(addr, dst, count)readsb(addr, dst, count)
 #define mmio_insw(addr, dst, count)readsw(addr, dst, count)
 #define mmio_insl(addr, dst, count)readsl(addr, dst, count)
diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h
index 5b63b94ef6b5..5a4af0199b32 100644
--- a/include/asm-generic/iomap.h
+++ b/include/asm-generic/iomap.h
@@ -31,9 +31,16 @@ extern unsigned int ioread16(void __iomem *);
 extern unsigned int ioread16be(void __iomem *);
 extern unsigned int ioread32(void __iomem *);
 extern unsigned int ioread32be(void __iomem *);
-#ifdef CONFIG_64BIT
-extern u64 ioread64(void __iomem *);
-extern u64 ioread64be(void __iomem *);
+
+#ifdef readq
+#define ioread64_lo_hi ioread64_lo_hi
+#define ioread64_hi_lo ioread64_hi_lo
+#define ioread64be_lo_hi ioread64be_lo_hi
+#define ioread64be_hi_lo ioread64be_hi_lo
+extern u64 ioread64_lo_hi(void __iomem *addr);
+extern u64 ioread64_hi_lo(void __iomem *addr);
+extern u64 ioread64be_lo_hi(void __iomem *addr);
+extern u64 ioread64be_hi_lo(void __iomem *addr);
 #endif
 
 extern void iowrite8(u8, void __iomem *);
@@ -41,9 +48,16 @@ extern void iowrite16(u16, void __iomem *);
 extern void iowrite16be(u16, void __iomem *);
 extern void iowrite32(u32, void __iomem *);
 extern void iowrite32be(u32, void __iomem *);
-#ifdef CONFIG_64BIT
-extern void iowrite64(u64, void __iomem *);
-extern void iowrite64be(u64, void __iomem *);
+
+#ifdef writeq
+#define iowrite64_lo_hi iowrite64_lo_hi
+#define iowrite64_hi_lo iowrite64_hi_lo
+#define iowrite64be_lo_hi iowrite64be_lo_hi
+#define iowrite64be_hi_lo iowrite64be_hi_lo
+extern void iowrite64_lo_hi(u64 val, void __iomem *addr);
+extern void iowrite64_hi_lo(u64 val, void __iomem *addr);
+extern void iowrite64be_lo_hi(u64 val, void __iomem *addr);
+extern void iowrite64be_hi_lo(u64 val, void __iomem *addr);
 #endif
 
 /*
diff --git a/lib/iomap.c b/lib/iomap.c
index 541d926da95e..d324b6c013af 100644
--- a/lib/iomap.c
+++ b/lib/iomap.c
@@ -67,6 +67,7 @@ static void bad_io_access(unsigned long port, const char *access)
 #ifndef mmio_read16be
 #define mmio_read16be(addr) be16_to_cpu(__raw_readw(addr))
 #define mmio_read32be(addr) be32_to_cpu(__raw_readl(addr))
+#define mmio_read64be(addr) be64_to_cpu(__raw_readq(addr))
 #endif
 
 unsigned int ioread8(void __iomem *addr)
@@ -100,6 +101,80 @@ EXPORT_SYMBOL(ioread16be);
 EXPORT_SYMBOL(ioread32);
 EXPORT_SYMBOL(ioread32be);
 
+#ifdef readq
+static u64 pio_read64_lo_hi(unsigned long port)
+{
+   u64 lo, hi;
+
+   lo = inl(port);
+   hi = inl(port + sizeof(u32));
+
+   return lo | (hi << 32);
+}
+
+static u64 pio_read64_hi_lo(unsigned long port)
+{
+   u64 lo, hi;
+
+   hi = inl(port + sizeof(u32));
+   lo = inl(port);
+
+   return lo | (hi << 32);
+}
+
+static u64 pio_read64be_lo_hi(unsigned long port)
+{
+   u64 lo, hi;
+
+   lo = pio_read32be(port + sizeof(u32));
+   hi = pio_read32be(port);
+
+   return lo | (hi << 32);
+}
+
+static u64 pio_read64be_hi_lo(unsigned long port)
+{
+   u64 lo, hi;
+
+   hi = pio_read32be(port);
+   lo = pio_read32be(port + sizeof(u32));
+
+   return lo | (hi << 32);
+}
+
+u64 ioread64_lo_hi(void __iomem *addr)
+{
+   IO_COND(addr, return pio_read64_lo_hi(port), return readq(addr));
+   return 0xULL;
+}
+

[PATCH v9 7/8] crypto: caam: cleanup CONFIG_64BIT ifdefs when using io{read|write}64

2017-12-05 Thread Logan Gunthorpe
Clean up the extra ifdefs which defined the wr_reg64 and rd_reg64
functions in non-64bit cases in favour of the new common
io-64-nonatomic-lo-hi header.

Signed-off-by: Logan Gunthorpe 
Cc: Andy Shevchenko 
Cc: Horia Geantă 
Cc: Dan Douglass 
Cc: Herbert Xu 
Cc: "David S. Miller" 
---
 drivers/crypto/caam/regs.h | 26 +-
 1 file changed, 1 insertion(+), 25 deletions(-)

diff --git a/drivers/crypto/caam/regs.h b/drivers/crypto/caam/regs.h
index fee363865d88..ec6528e5ce9d 100644
--- a/drivers/crypto/caam/regs.h
+++ b/drivers/crypto/caam/regs.h
@@ -10,7 +10,7 @@
 
 #include 
 #include 
-#include 
+#include 
 
 /*
  * Architecture-specific register access methods
@@ -136,7 +136,6 @@ static inline void clrsetbits_32(void __iomem *reg, u32 clear, u32 set)
  *base + 0x : least-significant 32 bits
  *base + 0x0004 : most-significant 32 bits
  */
-#ifdef CONFIG_64BIT
 static inline void wr_reg64(void __iomem *reg, u64 data)
 {
if (caam_little_end)
@@ -153,29 +152,6 @@ static inline u64 rd_reg64(void __iomem *reg)
return ioread64be(reg);
 }
 
-#else /* CONFIG_64BIT */
-static inline void wr_reg64(void __iomem *reg, u64 data)
-{
-   if (!caam_imx && caam_little_end) {
-   wr_reg32((u32 __iomem *)(reg) + 1, data >> 32);
-   wr_reg32((u32 __iomem *)(reg), data);
-   } else {
-   wr_reg32((u32 __iomem *)(reg), data >> 32);
-   wr_reg32((u32 __iomem *)(reg) + 1, data);
-   }
-}
-
-static inline u64 rd_reg64(void __iomem *reg)
-{
-   if (!caam_imx && caam_little_end)
-   return ((u64)rd_reg32((u32 __iomem *)(reg) + 1) << 32 |
-   (u64)rd_reg32((u32 __iomem *)(reg)));
-
-   return ((u64)rd_reg32((u32 __iomem *)(reg)) << 32 |
-   (u64)rd_reg32((u32 __iomem *)(reg) + 1));
-}
-#endif /* CONFIG_64BIT  */
-
 static inline u64 cpu_to_caam_dma64(dma_addr_t value)
 {
if (caam_imx)
-- 
2.11.0



[PATCH v9 0/8] Add io{read|write}64 to io-64-atomic headers

2017-12-05 Thread Logan Gunthorpe
This is v9 of my cleanup series addressing the many places where people
define their own io{read|write}64 functions because they don't exist on
non-64bit systems. This series adds inline functions to the
io-64-nonatomic headers and then cleans up the drivers that defined their
own.

Changes since v8:
- Rebased onto v4.15-rc2, as a result rewrote patch 7 seeing someone did
  some similar cleanup in that area.
- Added a patch to clean up the Switchtec NTB driver which landed in
  v4.15-rc1

Changes since v7:
- Fix minor nits from Andy Shevchenko
- Rebased onto v4.14-rc1

Changes since v6:
 ** none **

Changes since v5:
- Added a fix to the tilcdc driver to ensure it doesn't use the
  non-atomic operation. (This includes adding io{read|write}64[be]_is_nonatomic
  defines).

Changes since v4:
- Add functions so the powerpc implementation of iomap.c compiles. (As
  noticed by Horia)

Changes since v3:

- I noticed powerpc didn't use the appropriate functions because
  readq/writeq were not defined when iomap.h was included. Thus I've
  included a patch to adjust this.
- Fixed some mistakes with a couple of the defines in io-64-nonatomic*
  headers
- Fixed a typo noticed by Horia.

(earlier versions were drastically different)

Logan Gunthorpe (8):
  drm/tilcdc: ensure nonatomic iowrite64 is not used
  powerpc: io.h: move iomap.h include so that it can use readq/writeq
defs
  powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo}
  iomap: introduce io{read|write}64_{lo_hi|hi_lo}
  io-64-nonatomic: add io{read|write}64[be]{_lo_hi|_hi_lo} macros
  ntb: ntb_hw_intel: use io-64-nonatomic instead of in-driver hacks
  crypto: caam: cleanup CONFIG_64BIT ifdefs when using io{read|write}64
  ntb: ntb_hw_switchtec: Cleanup 64bit IO defines to use the common
header

 arch/powerpc/include/asm/io.h  |   6 +-
 arch/powerpc/kernel/iomap.c|  40 ++
 drivers/crypto/caam/regs.h |  26 +--
 drivers/gpu/drm/tilcdc/tilcdc_regs.h   |   2 +-
 drivers/ntb/hw/intel/ntb_hw_intel.c|  30 +---
 drivers/ntb/hw/mscc/ntb_hw_switchtec.c |  30 +---
 include/asm-generic/iomap.h|  26 +--
 include/linux/io-64-nonatomic-hi-lo.h  |  64 
 include/linux/io-64-nonatomic-lo-hi.h  |  64 
 lib/iomap.c| 132 +
 10 files changed, 328 insertions(+), 92 deletions(-)

--
2.11.0


[PATCH v9 1/8] drm/tilcdc: ensure nonatomic iowrite64 is not used

2017-12-05 Thread Logan Gunthorpe
Add a check to ensure iowrite64 is only used if it is atomic.

It was decided in [1] that the tilcdc driver should not be using an
atomic operation (so it was left out of this patchset). However, it turns
out that through the drm code, a nonatomic header is actually included:

include/linux/io-64-nonatomic-lo-hi.h
is included from include/drm/drm_os_linux.h:9:0,
from include/drm/drmP.h:74,
from include/drm/drm_modeset_helper.h:26,
from include/drm/drm_atomic_helper.h:33,
from drivers/gpu/drm/tilcdc/tilcdc_crtc.c:19:

And thus, without this change, this patchset would inadvertently
change the behaviour of the tilcdc driver.

[1] lkml.kernel.org/r/cak8p3a2hho_zcnstzq7hmwsz5la5thu19fwzpun16imnyyn...@mail.gmail.com

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Andy Shevchenko 
Cc: Jyri Sarha 
Cc: Arnd Bergmann 
Cc: Tomi Valkeinen 
Cc: David Airlie 
---
 drivers/gpu/drm/tilcdc/tilcdc_regs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/tilcdc/tilcdc_regs.h b/drivers/gpu/drm/tilcdc/tilcdc_regs.h
index 9d528c0a67a4..5048ebb86835 100644
--- a/drivers/gpu/drm/tilcdc/tilcdc_regs.h
+++ b/drivers/gpu/drm/tilcdc/tilcdc_regs.h
@@ -133,7 +133,7 @@ static inline void tilcdc_write64(struct drm_device *dev, u32 reg, u64 data)
struct tilcdc_drm_private *priv = dev->dev_private;
volatile void __iomem *addr = priv->mmio + reg;
 
-#ifdef iowrite64
+#if defined(iowrite64) && !defined(iowrite64_is_nonatomic)
iowrite64(data, addr);
 #else
__iowmb();
-- 
2.11.0



[PATCH v9 5/8] io-64-nonatomic: add io{read|write}64[be]{_lo_hi|_hi_lo} macros

2017-12-05 Thread Logan Gunthorpe
This patch adds generic io{read|write}64[be]{_lo_hi|_hi_lo} macros if
they are not already defined by the architecture (as they may already be
provided by the generic iomap library).

The patch also points io{read|write}64[be] to the variant specified by the
header name.

This is because new drivers are encouraged to use ioreadXX, et al instead
of readX[1], et al -- and mixing ioreadXX with readq is pretty ugly.

[1] LDD3: section 9.4.2
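
As a short usage sketch (hypothetical driver code, not part of the patch), a
driver that only needs the non-atomic semantics simply picks the header whose
name matches the ordering its device expects and stays within the
ioreadXX/iowriteXX family:

#include <linux/io-64-nonatomic-lo-hi.h>

/* Hypothetical device that latches the full 64-bit value when the high
 * word is written, so the low word must go first (the lo-hi flavour). */
static void example_set_dma_addr(void __iomem *regs, u64 addr)
{
	/* Native writeq where the architecture provides one, otherwise
	 * two iowrite32() calls, low word first. */
	iowrite64(addr, regs + 0x40);
}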

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Andy Shevchenko 
Cc: Christoph Hellwig 
Cc: Arnd Bergmann 
Cc: Alan Cox 
Cc: Greg Kroah-Hartman 
---
 include/linux/io-64-nonatomic-hi-lo.h | 64 +++
 include/linux/io-64-nonatomic-lo-hi.h | 64 +++
 2 files changed, 128 insertions(+)

diff --git a/include/linux/io-64-nonatomic-hi-lo.h b/include/linux/io-64-nonatomic-hi-lo.h
index 862d786a904f..ae21b72cce85 100644
--- a/include/linux/io-64-nonatomic-hi-lo.h
+++ b/include/linux/io-64-nonatomic-hi-lo.h
@@ -55,4 +55,68 @@ static inline void hi_lo_writeq_relaxed(__u64 val, volatile void __iomem *addr)
 #define writeq_relaxed hi_lo_writeq_relaxed
 #endif
 
+#ifndef ioread64_hi_lo
+#define ioread64_hi_lo ioread64_hi_lo
+static inline u64 ioread64_hi_lo(void __iomem *addr)
+{
+   u32 low, high;
+
+   high = ioread32(addr + sizeof(u32));
+   low = ioread32(addr);
+
+   return low + ((u64)high << 32);
+}
+#endif
+
+#ifndef iowrite64_hi_lo
+#define iowrite64_hi_lo iowrite64_hi_lo
+static inline void iowrite64_hi_lo(u64 val, void __iomem *addr)
+{
+   iowrite32(val >> 32, addr + sizeof(u32));
+   iowrite32(val, addr);
+}
+#endif
+
+#ifndef ioread64be_hi_lo
+#define ioread64be_hi_lo ioread64be_hi_lo
+static inline u64 ioread64be_hi_lo(void __iomem *addr)
+{
+   u32 low, high;
+
+   high = ioread32be(addr);
+   low = ioread32be(addr + sizeof(u32));
+
+   return low + ((u64)high << 32);
+}
+#endif
+
+#ifndef iowrite64be_hi_lo
+#define iowrite64be_hi_lo iowrite64be_hi_lo
+static inline void iowrite64be_hi_lo(u64 val, void __iomem *addr)
+{
+   iowrite32be(val >> 32, addr);
+   iowrite32be(val, addr + sizeof(u32));
+}
+#endif
+
+#ifndef ioread64
+#define ioread64_is_nonatomic
+#define ioread64 ioread64_hi_lo
+#endif
+
+#ifndef iowrite64
+#define iowrite64_is_nonatomic
+#define iowrite64 iowrite64_hi_lo
+#endif
+
+#ifndef ioread64be
+#define ioread64be_is_nonatomic
+#define ioread64be ioread64be_hi_lo
+#endif
+
+#ifndef iowrite64be
+#define iowrite64be_is_nonatomic
+#define iowrite64be iowrite64be_hi_lo
+#endif
+
 #endif /* _LINUX_IO_64_NONATOMIC_HI_LO_H_ */
diff --git a/include/linux/io-64-nonatomic-lo-hi.h b/include/linux/io-64-nonatomic-lo-hi.h
index d042e7bb5adb..faaa842dbdb9 100644
--- a/include/linux/io-64-nonatomic-lo-hi.h
+++ b/include/linux/io-64-nonatomic-lo-hi.h
@@ -55,4 +55,68 @@ static inline void lo_hi_writeq_relaxed(__u64 val, volatile void __iomem *addr)
 #define writeq_relaxed lo_hi_writeq_relaxed
 #endif
 
+#ifndef ioread64_lo_hi
+#define ioread64_lo_hi ioread64_lo_hi
+static inline u64 ioread64_lo_hi(void __iomem *addr)
+{
+   u32 low, high;
+
+   low = ioread32(addr);
+   high = ioread32(addr + sizeof(u32));
+
+   return low + ((u64)high << 32);
+}
+#endif
+
+#ifndef iowrite64_lo_hi
+#define iowrite64_lo_hi iowrite64_lo_hi
+static inline void iowrite64_lo_hi(u64 val, void __iomem *addr)
+{
+   iowrite32(val, addr);
+   iowrite32(val >> 32, addr + sizeof(u32));
+}
+#endif
+
+#ifndef ioread64be_lo_hi
+#define ioread64be_lo_hi ioread64be_lo_hi
+static inline u64 ioread64be_lo_hi(void __iomem *addr)
+{
+   u32 low, high;
+
+   low = ioread32be(addr + sizeof(u32));
+   high = ioread32be(addr);
+
+   return low + ((u64)high << 32);
+}
+#endif
+
+#ifndef iowrite64be_lo_hi
+#define iowrite64be_lo_hi iowrite64be_lo_hi
+static inline void iowrite64be_lo_hi(u64 val, void __iomem *addr)
+{
+   iowrite32be(val, addr + sizeof(u32));
+   iowrite32be(val >> 32, addr);
+}
+#endif
+
+#ifndef ioread64
+#define ioread64_is_nonatomic
+#define ioread64 ioread64_lo_hi
+#endif
+
+#ifndef iowrite64
+#define iowrite64_is_nonatomic
+#define iowrite64 iowrite64_lo_hi
+#endif
+
+#ifndef ioread64be
+#define ioread64be_is_nonatomic
+#define ioread64be ioread64be_lo_hi
+#endif
+
+#ifndef iowrite64be
+#define iowrite64be_is_nonatomic
+#define iowrite64be iowrite64be_lo_hi
+#endif
+
 #endif /* _LINUX_IO_64_NONATOMIC_LO_HI_H_ */
-- 
2.11.0



Re: [PATCH 2/3] crypto: exynos - Improve performance of PRNG

2017-12-05 Thread Krzysztof Kozlowski
On Tue, Dec 5, 2017 at 6:53 PM, Krzysztof Kozlowski  wrote:
> On Tue, Dec 05, 2017 at 05:43:10PM +0100, Łukasz Stelmach wrote:
>> It was <2017-12-05 wto 14:54>, when Stephan Mueller wrote:
>> > On Tuesday, 5 December 2017, 13:35:57 CET, Łukasz Stelmach wrote:
>> >
>> > Hi Łukasz,
>> >
>> >> Use memcpy_fromio() instead of custom exynos_rng_copy_random() function
>> >> to retrieve generated numbers from the registers of PRNG.
>> >>
>> >> Remove unnecessary invocation of cpu_relax().
>> >>
>> >> Signed-off-by: Łukasz Stelmach 
>> >> ---
>> >>  drivers/crypto/exynos-rng.c | 36 +---
>> >>  1 file changed, 5 insertions(+), 31 deletions(-)
>> >>
>> >> diff --git a/drivers/crypto/exynos-rng.c b/drivers/crypto/exynos-rng.c
>> >> index 894ef93ef5ec..002e9d2a83cc 100644
>> >> --- a/drivers/crypto/exynos-rng.c
>> >> +++ b/drivers/crypto/exynos-rng.c
>>
>> [...]
>>
>> >> @@ -171,6 +143,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev
>> >> *rng, {
>> >>int retry = EXYNOS_RNG_WAIT_RETRIES;
>> >>
>> >> +  *read = min_t(size_t, dlen, EXYNOS_RNG_SEED_SIZE);
>> >> +
>> >>if (rng->type == EXYNOS_PRNG_TYPE4) {
>> >>exynos_rng_writel(rng, EXYNOS_RNG_CONTROL_START,
>> >>  EXYNOS_RNG_CONTROL);
>> >> @@ -180,8 +154,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev
>> >> *rng, }
>> >>
>> >>while (!(exynos_rng_readl(rng,
>> >> -  EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) && 
>> >> --retry)
>> >> -  cpu_relax();
>> >> +  EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) &&
>> >> + --retry);
>> SM>
>> SM> Is this related to the patch?
>>
>> KK> It looks like unrelated change so split it into separate commit with
>> KK> explanation why you are changing the common busy-loop pattern.
>> KK> exynos_rng_readl() uses relaxed versions of readl() so I would expect
>> KK> here cpu_relax().
>>
>> Yes. As far as I can tell this gives the major part of the performance
>> improvement brought by this patch.
>
> In that case definitely split and explain... what and why you want to
> achieve here.
>
>>
>> The busy loop is not very busy. Every time I checked, the loop (w/o
>> cpu_relax()) was executed twice (retry was 98) and the operation was
>> reliable. I don't see why we need a memory barrier here. On the other
>> hand, I am not sure the whole exynos_rng_get_random() shouldn't be run
>> under a mutex or a spinlock (I don't see anything like this in the upper
>> layers of the crypto framework).
>>
>> The *_relaxed() I/O operations do not enforce memory
>
> The cpu_relax() is a common pattern for a busy loop. If you want to break
> this pattern, please explain why only this part of the kernel should not
> follow it (and the rest of the kernel should).
>
> The other part: this code is already using relaxed versions, which might
> get you into difficult-to-debug issues. You mentioned that the loop works
> reliably after removing the cpu_relax()... yeah, it might for 99.999%, but
> that's not the argument. I remember a few emails from Arnd Bergmann
> mentioning explicitly to avoid using relaxed versions "just because",
> unless it is necessary or really understood.
>
> The code first writes to the control register, then checks the status, so
> you should have these operations strictly ordered. Therefore I think
> cpu_relax() should not be removed.

... or just convert it to readl_poll_timeout(), because it makes the code
more readable, takes care of the timeout, and you do not have to care about
the specific implementation (whether there should or should not be a
cpu_relax()).
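
Roughly something like the sketch below (untested; EXYNOS_RNG_WAIT_TIMEOUT_US
is just a placeholder name for a timeout constant that would have to be
added):

#include <linux/iopoll.h>

	u32 status;
	int ret;

	/* Poll the STATUS register until RNG_DONE is set or we time out. */
	ret = readl_poll_timeout(rng->mem + EXYNOS_RNG_STATUS, status,
				 status & EXYNOS_RNG_STATUS_RNG_DONE,
				 0, EXYNOS_RNG_WAIT_TIMEOUT_US);
	if (ret)
		return ret;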

Best regards,
Krzysztof


Re: [PATCH v2 11/19] arm64: assembler: add macro to conditionally yield the NEON under PREEMPT

2017-12-05 Thread Ard Biesheuvel
On 5 December 2017 at 12:45, Ard Biesheuvel  wrote:
>
>
>> On 5 Dec 2017, at 12:28, Dave Martin  wrote:
>>
>>> On Mon, Dec 04, 2017 at 12:26:37PM +, Ard Biesheuvel wrote:
>>> Add a support macro to conditionally yield the NEON (and thus the CPU)
>>> that may be called from the assembler code. Given that especially the
>>> instruction based accelerated crypto code may use very tight loops, add
>>> some parametrization so that the TIF_NEED_RESCHED flag test is only
>>> executed every so many loop iterations.
>>>
>>> In some cases, yielding the NEON involves saving and restoring a non
>>> trivial amount of context (especially in the CRC folding algorithms),
>>> and so the macro is split into two, and the code in between is only
>>> executed when the yield path is taken, allowing the context to be preserved.
>>> The second macro takes a label argument that marks the resume-from-yield
>>> path, which should restore the preserved context again.
>>>
>>> Signed-off-by: Ard Biesheuvel 
>>> ---
>>> arch/arm64/include/asm/assembler.h | 50 
>>> 1 file changed, 50 insertions(+)
>>>
>>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>>> index aef72d886677..917b026d3e00 100644
>>> --- a/arch/arm64/include/asm/assembler.h
>>> +++ b/arch/arm64/include/asm/assembler.h
>>> @@ -512,4 +512,54 @@ alternative_else_nop_endif
>>> #endif
>>>.endm
>>>
>>> +/*
>>> + * yield_neon - check whether to yield to another runnable task from
>>> + *kernel mode NEON code (running with preemption disabled)
>>> + *
>>> + * - Check whether the preempt count is exactly 1, in which case disabling
>>> + *   preemption once will make the task preemptible. If this is not the 
>>> case,
>>> + *   yielding is pointless.
>>> + * - Check whether TIF_NEED_RESCHED is set, and if so, disable and 
>>> re-enable
>>> + *   kernel mode NEON (which will trigger a reschedule), and branch to the
>>> + *   yield fixup code at @lbl.
>>> + */
>>> +.macroyield_neon, lbl:req, ctr, order, stride, loop
>>> +yield_neon_pre\ctr, \order, \stride, \loop
>>> +yield_neon_post\lbl
>>> +.endm
>>> +
>>> +.macroyield_neon_pre, ctr, order=0, stride, loop=f
>>> +#ifdef CONFIG_PREEMPT
>>> +/*
>>> + * With some algorithms, it makes little sense to poll the
>>> + * TIF_NEED_RESCHED flag after every iteration, so only perform
>>> + * the check every 2^order strides.
>>> + */
>>> +.if\order > 1
>>> +.if(\stride & (\stride - 1)) != 0
>>> +.error"stride should be a power of 2"
>>> +.endif
>>> +tst\ctr, #((1 << \order) * \stride - 1) & ~(\stride - 1)
>>> +b.ne\loop
>>> +.endif
>>
>> I'm not sure what baking in this check gives us, and this seems to
>> conflate two rather different things: yielding and defining a
>> "heartbeat" frequency for the calling code.
>>
>> Can we separate out the crypto-loop-helper semantics from the yield-
>> neon stuff?
>>
>
> Fair enough. I incorporated the check here so it disappears from the code 
> entirely when !CONFIG_PREEMPT, because otherwise, you end up with a sequence 
> that is mispredicted every # iterations without any benefit.
> I guess I could macroise that separately though.
>
>> If we had
>>
>>if_cond_yield_neon
>>// patchup code
>>endif_yield_neon
>>

I like this, but ...

>> then the caller is free to conditionally branch over that as appropriate
>> like
>>
>> loop:
>>// crypto stuff
>>tst x0, #0xf
>>b.neloop
>>
>>if_cond_yield_neon
>>// patchup code
>>endif_cond_yield_neon
>>

I need to put the post patchup code somewhere too. Please refer to the
CRC patches for the best examples of this.


>>bloop
>>
>> I think this is clearer than burying checks and branches in a macro that
>> is trying to be generic.
>>
>
> Agreed.
>
>> Label arguments can be added to elide some branches of course, at a
>> corresponding cost to clarity...  in the common case the cache will
>> be hot and the branches won't be mispredicted though.  Is it really
>> worth it?
>>
>
> Perhaps not. And I have not made any attempt yet to benchmark in great
> detail, given that I need some feedback from the rt crowd first on whether
> this is likely to work as desired.
>
>>> +
>>> +get_thread_infox0
>>> +ldrw1, [x0, #TSK_TI_PREEMPT]
>>> +ldrx0, [x0, #TSK_TI_FLAGS]
>>> +cmpw1, #1 // == PREEMPT_OFFSET
>>
>> asm-offsets?
>>
>
> This is not an offset in that regard, but the header that defines it is not 
> asm safe
>
>> [...]
>>
>> Cheers
>> ---Dave


Re: [PATCH] crypto: exynos - Icrease the priority of the driver

2017-12-05 Thread Krzysztof Kozlowski
On Tue, Dec 05, 2017 at 05:20:46PM +0100, Łukasz Stelmach wrote:
> exynos-rng is one of many implementations of stdrng. With priority as
> low as 100 it isn't selected, if software implementations (DRBG) are
> available. The value 300 was selected to give the PRNG priority before
> software implementations, but allow them to be selected in FIPS-mode
> (fips=1 in the kernel command line).

Typo in subject ("Increase").

Reviewed-by: Krzysztof Kozlowski 

Best regards,
Krzysztof



Re: [PATCH 2/3] crypto: exynos - Improve performance of PRNG

2017-12-05 Thread Krzysztof Kozlowski
On Tue, Dec 05, 2017 at 05:43:10PM +0100, Łukasz Stelmach wrote:
> It was <2017-12-05 wto 14:54>, when Stephan Mueller wrote:
> > On Tuesday, 5 December 2017, 13:35:57 CET, Łukasz Stelmach wrote:
> >
> > Hi Łukasz,
> >
> >> Use memcpy_fromio() instead of custom exynos_rng_copy_random() function
> >> to retrieve generated numbers from the registers of PRNG.
> >> 
> >> Remove unnecessary invocation of cpu_relax().
> >> 
> >> Signed-off-by: Łukasz Stelmach 
> >> ---
> >>  drivers/crypto/exynos-rng.c | 36 +---
> >>  1 file changed, 5 insertions(+), 31 deletions(-)
> >> 
> >> diff --git a/drivers/crypto/exynos-rng.c b/drivers/crypto/exynos-rng.c
> >> index 894ef93ef5ec..002e9d2a83cc 100644
> >> --- a/drivers/crypto/exynos-rng.c
> >> +++ b/drivers/crypto/exynos-rng.c
> 
> [...]
> 
> >> @@ -171,6 +143,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev
> >> *rng, {
> >>int retry = EXYNOS_RNG_WAIT_RETRIES;
> >> 
> >> +  *read = min_t(size_t, dlen, EXYNOS_RNG_SEED_SIZE);
> >> +
> >>if (rng->type == EXYNOS_PRNG_TYPE4) {
> >>exynos_rng_writel(rng, EXYNOS_RNG_CONTROL_START,
> >>  EXYNOS_RNG_CONTROL);
> >> @@ -180,8 +154,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev
> >> *rng, }
> >> 
> >>while (!(exynos_rng_readl(rng,
> >> -  EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) && 
> >> --retry)
> >> -  cpu_relax();
> >> +  EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) &&
> >> + --retry);
> SM>
> SM> Is this related to the patch?
> 
> KK> It looks like unrelated change so split it into separate commit with
> KK> explanation why you are changing the common busy-loop pattern.
> KK> exynos_rng_readl() uses relaxed versions of readl() so I would expect
> KK> here cpu_relax().
> 
> Yes. As far as I can tell this gives the major part of the performance
> improvement brought by this patch.

In that case definitely split and explain... what and why you want to
achieve here.

> 
> The busy loop is not very busy. Every time I checked, the loop (w/o
> cpu_relax()) was executed twice (retry was 98) and the operation was
> reliable. I don't see why we need a memory barrier here. On the other
> hand, I am not sure the whole exynos_rng_get_random() shouldn't be run
> under a mutex or a spinlock (I don't see anything like this in the upper
> layers of the crypto framework).
> 
> The *_relaxed() I/O operations do not enforce memory

The cpu_relax() is a common pattern for a busy loop. If you want to break
this pattern, please explain why only this part of the kernel should not
follow it (and the rest of the kernel should).

The other part: this code is already using relaxed versions, which might
get you into difficult-to-debug issues. You mentioned that the loop works
reliably after removing the cpu_relax()... yeah, it might for 99.999%, but
that's not the argument. I remember a few emails from Arnd Bergmann
mentioning explicitly to avoid using relaxed versions "just because",
unless it is necessary or really understood.

The code first writes to the control register, then checks the status, so
you should have these operations strictly ordered. Therefore I think
cpu_relax() should not be removed.
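
For reference, the conventional shape of such a poll loop keeps cpu_relax()
in the body, i.e. what the driver had before this patch:

	while (!(exynos_rng_readl(rng, EXYNOS_RNG_STATUS) &
		 EXYNOS_RNG_STATUS_RNG_DONE) && --retry)
		cpu_relax();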

Best regards,
Krzysztof



Re: [PATCH 2/3] crypto: exynos - Improve performance of PRNG

2017-12-05 Thread Łukasz Stelmach
It was <2017-12-05 wto 14:54>, when Stephan Mueller wrote:
> On Tuesday, 5 December 2017, 13:35:57 CET, Łukasz Stelmach wrote:
>
> Hi Łukasz,
>
>> Use memcpy_fromio() instead of custom exynos_rng_copy_random() function
>> to retrieve generated numbers from the registers of PRNG.
>> 
>> Remove unnecessary invocation of cpu_relax().
>> 
>> Signed-off-by: Łukasz Stelmach 
>> ---
>>  drivers/crypto/exynos-rng.c | 36 +---
>>  1 file changed, 5 insertions(+), 31 deletions(-)
>> 
>> diff --git a/drivers/crypto/exynos-rng.c b/drivers/crypto/exynos-rng.c
>> index 894ef93ef5ec..002e9d2a83cc 100644
>> --- a/drivers/crypto/exynos-rng.c
>> +++ b/drivers/crypto/exynos-rng.c

[...]

>> @@ -171,6 +143,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev
>> *rng, {
>>  int retry = EXYNOS_RNG_WAIT_RETRIES;
>> 
>> +*read = min_t(size_t, dlen, EXYNOS_RNG_SEED_SIZE);
>> +
>>  if (rng->type == EXYNOS_PRNG_TYPE4) {
>>  exynos_rng_writel(rng, EXYNOS_RNG_CONTROL_START,
>>EXYNOS_RNG_CONTROL);
>> @@ -180,8 +154,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev
>> *rng, }
>> 
>>  while (!(exynos_rng_readl(rng,
>> -EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) && 
>> --retry)
>> -cpu_relax();
>> +EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) &&
>> +   --retry);
SM>
SM> Is this related to the patch?

KK> It looks like unrelated change so split it into separate commit with
KK> explanation why you are changing the common busy-loop pattern.
KK> exynos_rng_readl() uses relaxed versions of readl() so I would expect
KK> here cpu_relax().

Yes. As far as I can tell this gives the major part of the performance
improvement brought by this patch.

The busy loop is not very busy. Every time I checked, the loop (w/o
cpu_relax()) was executed twice (retry was 98) and the operation was
reliable. I don't see why we need a memory barrier here. On the other
hand, I am not sure the whole exynos_rng_get_random() shouldn't be run
under a mutex or a spinlock (I don't see anything like this in the upper
layers of the crypto framework).

The *_relaxed() I/O operations do not enforce memory 

Thank you for asking the questions. I will put the above explanations in
the commit message.

>> 
>>  if (!retry)
>>  return -ETIMEDOUT;
>> @@ -189,7 +163,7 @@ static int exynos_rng_get_random(struct exynos_rng_dev
>> *rng, /* Clear status bit */
>>  exynos_rng_writel(rng, EXYNOS_RNG_STATUS_RNG_DONE,
>>EXYNOS_RNG_STATUS);
>> -*read = exynos_rng_copy_random(rng, dst, dlen);
>> +memcpy_fromio(dst, rng->mem + EXYNOS_RNG_OUT_BASE, *read);
>> 
>>  return 0;
>>  }

Kind regards,
-- 
Łukasz Stelmach
Samsung R Institute Poland
Samsung Electronics




Re: [PATCH] crypto: exynos - Icrease the priority of the driver

2017-12-05 Thread Stephan Mueller
On Tuesday, 5 December 2017, 17:20:46 CET, Łukasz Stelmach wrote:

Hi Łukasz,

> exynos-rng is one of many implementations of stdrng. With priority as
> low as 100 it isn't selected, if software implementations (DRBG) are
> available. The value 300 was selected to give the PRNG priority before
> software implementations, but allow them to be selected in FIPS-mode
> (fips=1 in the kernel command line).
> 
> Signed-off-by: Łukasz Stelmach 

Reviewed-by: Stephan Mueller 

Ciao
Stephan


[PATCH] crypto: exynos - Icrease the priority of the driver

2017-12-05 Thread Łukasz Stelmach
exynos-rng is one of many implementations of stdrng. With priority as
low as 100 it isn't selected, if software implementations (DRBG) are
available. The value 300 was selected to give the PRNG priority before
software implementations, but allow them to be selected in FIPS-mode
(fips=1 in the kernel command line).

Signed-off-by: Łukasz Stelmach 
---

Thank you, Stephan Mueller, for the explanations.
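
For context, a minimal sketch of how a crypto API user ends up with this
driver once it has the highest "stdrng" priority (hypothetical caller,
untested, error handling trimmed):

#include <linux/err.h>
#include <crypto/rng.h>

static int example_get_random_bytes(u8 *buf, unsigned int len)
{
	struct crypto_rng *rng;
	int ret;

	/* Picks the registered "stdrng" with the highest cra_priority, so
	 * exynos_rng at 300 is now preferred over the software DRBGs. */
	rng = crypto_alloc_rng("stdrng", 0, 0);
	if (IS_ERR(rng))
		return PTR_ERR(rng);

	ret = crypto_rng_reset(rng, NULL, crypto_rng_seedsize(rng));
	if (!ret)
		ret = crypto_rng_get_bytes(rng, buf, len);

	crypto_free_rng(rng);
	return ret;
}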

 drivers/crypto/exynos-rng.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/exynos-rng.c b/drivers/crypto/exynos-rng.c
index 4b2ed1d178af..5a37397fb1c5 100644
--- a/drivers/crypto/exynos-rng.c
+++ b/drivers/crypto/exynos-rng.c
@@ -256,7 +256,7 @@ static struct rng_alg exynos_rng_alg = {
.base   = {
.cra_name   = "stdrng",
.cra_driver_name= "exynos_rng",
-   .cra_priority   = 100,
+   .cra_priority   = 300,
.cra_ctxsize= sizeof(struct exynos_rng_ctx),
.cra_module = THIS_MODULE,
.cra_init   = exynos_rng_kcapi_init,
-- 
2.11.0



Re: [crypto 4/8] chtls: CPL handler definition

2017-12-05 Thread Hannes Frederic Sowa
Hello,

On Tue, Dec 5, 2017, at 12:40, Atul Gupta wrote:
> CPL handlers for TLS session, record transmit and receive

This very much looks like full TCP offload with TLS on top? It
would be nice if you could give a few more details in the patch
descriptions.

Bye,
Hannes


Re: [PATCH 4.9-stable] Revert "crypto: caam - get rid of tasklet"

2017-12-05 Thread Greg KH
On Tue, Dec 05, 2017 at 05:37:44PM +0200, Horia Geantă wrote:
> commit 2b163b5bce04546da72617bfb6c8bf07a45c4b17 upstream.

Now queued up, thanks!

greg k-h


[PATCH 4.9-stable] Revert "crypto: caam - get rid of tasklet"

2017-12-05 Thread Horia Geantă
commit 2b163b5bce04546da72617bfb6c8bf07a45c4b17 upstream.

This reverts commit 66d2e2028091a074aa1290d2eeda5ddb1a6c329c.

Quoting from Russell's findings:
https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg21136.html

[quote]
Okay, I've re-tested, using a different way of measuring, because using
openssl speed is impractical for off-loaded engines.  I've decided to
use this way to measure the performance:

dd if=/dev/zero bs=1048576 count=128 | /usr/bin/time openssl dgst -md5

For the threaded IRQs case gives:

0.05user 2.74system 0:05.30elapsed 52%CPU (0avgtext+0avgdata 2400maxresident)k
0.06user 2.52system 0:05.18elapsed 49%CPU (0avgtext+0avgdata 2404maxresident)k
0.12user 2.60system 0:05.61elapsed 48%CPU (0avgtext+0avgdata 2460maxresident)k
=> 5.36s => 25.0MB/s

and the tasklet case:

0.08user 2.53system 0:04.83elapsed 54%CPU (0avgtext+0avgdata 2468maxresident)k
0.09user 2.47system 0:05.16elapsed 49%CPU (0avgtext+0avgdata 2368maxresident)k
0.10user 2.51system 0:04.87elapsed 53%CPU (0avgtext+0avgdata 2460maxresident)k
=> 4.95 => 27.1MB/s

which corresponds to an 8% slowdown for the threaded IRQ case.  So,
tasklets are indeed faster than threaded IRQs.

[...]

I think I've proven from the above that this patch needs to be reverted
due to the performance regression, and that there _is_ most definitely
a detrimental effect of switching from tasklets to threaded IRQs.
[/quote]

Signed-off-by: Horia Geantă 
Signed-off-by: Herbert Xu 
---

Mihai Ordean reported soft lockups at IPsec ESP high rates on i.MX6Q,
on kernels 4.9.{35,36}.
This patch, cherry-picked from 4.10, fixes the issue.

 drivers/crypto/caam/intern.h |  1 +
 drivers/crypto/caam/jr.c | 25 -
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/crypto/caam/intern.h b/drivers/crypto/caam/intern.h
index 5d4c05074a5c..e2bcacc1a921 100644
--- a/drivers/crypto/caam/intern.h
+++ b/drivers/crypto/caam/intern.h
@@ -41,6 +41,7 @@ struct caam_drv_private_jr {
struct device   *dev;
int ridx;
struct caam_job_ring __iomem *rregs;/* JobR's register space */
+   struct tasklet_struct irqtask;
int irq;/* One per queue */
 
/* Number of scatterlist crypt transforms active on the JobR */
diff --git a/drivers/crypto/caam/jr.c b/drivers/crypto/caam/jr.c
index 757c27f9953d..9e7f28122bb7 100644
--- a/drivers/crypto/caam/jr.c
+++ b/drivers/crypto/caam/jr.c
@@ -73,6 +73,8 @@ static int caam_jr_shutdown(struct device *dev)
 
ret = caam_reset_hw_jr(dev);
 
+   tasklet_kill(&jrp->irqtask);
+
/* Release interrupt */
free_irq(jrp->irq, dev);
 
@@ -128,7 +130,7 @@ static irqreturn_t caam_jr_interrupt(int irq, void *st_dev)
 
/*
 * Check the output ring for ready responses, kick
-* the threaded irq if jobs done.
+* tasklet if jobs done.
 */
irqstate = rd_reg32(&jrp->rregs->jrintstatus);
if (!irqstate)
@@ -150,13 +152,18 @@ static irqreturn_t caam_jr_interrupt(int irq, void *st_dev)
/* Have valid interrupt at this point, just ACK and trigger */
wr_reg32(&jrp->rregs->jrintstatus, irqstate);
 
-   return IRQ_WAKE_THREAD;
+   preempt_disable();
+   tasklet_schedule(&jrp->irqtask);
+   preempt_enable();
+
+   return IRQ_HANDLED;
 }
 
-static irqreturn_t caam_jr_threadirq(int irq, void *st_dev)
+/* Deferred service handler, run as interrupt-fired tasklet */
+static void caam_jr_dequeue(unsigned long devarg)
 {
int hw_idx, sw_idx, i, head, tail;
-   struct device *dev = st_dev;
+   struct device *dev = (struct device *)devarg;
struct caam_drv_private_jr *jrp = dev_get_drvdata(dev);
void (*usercall)(struct device *dev, u32 *desc, u32 status, void *arg);
u32 *userdesc, userstatus;
@@ -230,8 +237,6 @@ static irqreturn_t caam_jr_threadirq(int irq, void *st_dev)
 
/* reenable / unmask IRQs */
clrsetbits_32(&jrp->rregs->rconfig_lo, JRCFG_IMSK, 0);
-
-   return IRQ_HANDLED;
 }
 
 /**
@@ -389,10 +394,11 @@ static int caam_jr_init(struct device *dev)
 
jrp = dev_get_drvdata(dev);
 
+   tasklet_init(&jrp->irqtask, caam_jr_dequeue, (unsigned long)dev);
+
/* Connect job ring interrupt handler. */
-   error = request_threaded_irq(jrp->irq, caam_jr_interrupt,
-caam_jr_threadirq, IRQF_SHARED,
-dev_name(dev), dev);
+   error = request_irq(jrp->irq, caam_jr_interrupt, IRQF_SHARED,
+   dev_name(dev), dev);
if (error) {
dev_err(dev, "can't connect JobR %d interrupt (%d)\n",
jrp->ridx, jrp->irq);
@@ -454,6 +460,7 @@ static int caam_jr_init(struct device *dev)
 out_free_irq:
free_irq(jrp->irq, dev);
 out_kill_deq:
+   tasklet_kill(&jrp->irqtask);
return error;
 }
 
-- 

Re: [crypto 4/8] chtls: CPL handler definition

2017-12-05 Thread Stefano Brivio
On Tue,  5 Dec 2017 17:10:00 +0530
Atul Gupta  wrote:

> CPL handlers for TLS session, record transmit and receive
> 
> Signed-off-by: Atul Gupta 
> ---
>  drivers/crypto/chelsio/chtls/chtls_cm.c | 2048 +++
>  1 file changed, 2048 insertions(+)
>  create mode 100644 drivers/crypto/chelsio/chtls/chtls_cm.c
> 
> diff --git a/drivers/crypto/chelsio/chtls/chtls_cm.c b/drivers/crypto/chelsio/chtls/chtls_cm.c
> new file mode 100644
> index 000..ea1c301
> --- /dev/null
> +++ b/drivers/crypto/chelsio/chtls/chtls_cm.c
> @@ -0,0 +1,2048 @@
> +/*
> + * Copyright (c) 2017 Chelsio Communications, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * Written by: Atul Gupta (atul.gu...@chelsio.com)
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "chtls.h"
> +#include "chtls_cm.h"
> +
> +extern struct request_sock_ops chtls_rsk_ops;
> +static void (*tcp_time_wait_p)(struct sock *sk, int state, int timeo);
> +
> +/*
> + * State transitions and actions for close.  Note that if we are in SYN_SENT
> + * we remain in that state as we cannot control a connection while it's in
> + * SYN_SENT; such connections are allowed to establish and are then aborted.
> + */
> +static unsigned char new_state[16] = {
> + /* current state: new state:  action: */
> + /* (Invalid)   */ TCP_CLOSE,
> + /* TCP_ESTABLISHED */ TCP_FIN_WAIT1 | TCP_ACTION_FIN,
> + /* TCP_SYN_SENT*/ TCP_SYN_SENT,
> + /* TCP_SYN_RECV*/ TCP_FIN_WAIT1 | TCP_ACTION_FIN,
> + /* TCP_FIN_WAIT1   */ TCP_FIN_WAIT1,
> + /* TCP_FIN_WAIT2   */ TCP_FIN_WAIT2,
> + /* TCP_TIME_WAIT   */ TCP_CLOSE,
> + /* TCP_CLOSE   */ TCP_CLOSE,
> + /* TCP_CLOSE_WAIT  */ TCP_LAST_ACK | TCP_ACTION_FIN,
> + /* TCP_LAST_ACK*/ TCP_LAST_ACK,
> + /* TCP_LISTEN  */ TCP_CLOSE,
> + /* TCP_CLOSING */ TCP_CLOSING,
> +};
> +
> +static struct chtls_sock *chtls_sock_create(struct chtls_dev *cdev)
> +{
> + struct chtls_sock *csk = kzalloc(sizeof(*csk), GFP_NOIO);
> +
> + if (!csk)
> + return NULL;
> +
> + csk->txdata_skb_cache =  alloc_skb(TXDATA_SKB_LEN, GFP_ATOMIC);

Excess whitespace.

> + if (!csk->txdata_skb_cache) {
> + kfree(csk);
> + return NULL;
> + }
> +
> + kref_init(&csk->kref);
> + csk->cdev = cdev;
> + skb_queue_head_init(&csk->txq);
> + csk->wr_skb_head = NULL;
> + csk->wr_skb_tail = NULL;
> + csk->mss = MAX_MSS;
> + csk->tlshws.ofld = 1;
> + csk->tlshws.txkey = -1;
> + csk->tlshws.rxkey = -1;
> + csk->tlshws.mfs = TLS_MFS;
> + skb_queue_head_init(&csk->tlshws.sk_recv_queue);
> + return csk;
> +}
> +
> +void chtls_sock_release(struct kref *ref)
> +{
> + struct chtls_sock *csk =
> + container_of(ref, struct chtls_sock, kref);
> +
> + kfree(csk);
> +}
> +
> +void get_tcp_symbol(void)
> +{
> + tcp_time_wait_p = (void *)kallsyms_lookup_name("tcp_time_wait");
> + if (!tcp_time_wait_p)
> + pr_info("could not locate tcp_time_wait");

Probably not something that should be used here. Why do you need this?

> +}
> +
> +static struct net_device *chtls_ipv4_netdev(struct chtls_dev *cdev,
> + struct sock *sk)
> +{
> + struct net_device *ndev = cdev->ports[0];
> +
> + if (likely(!inet_sk(sk)->inet_rcv_saddr))
> + return ndev;
> +
> + ndev = ip_dev_find(&init_net, inet_sk(sk)->inet_rcv_saddr);
> + if (!ndev)
> + return NULL;
> +
> + if (is_vlan_dev(ndev))
> + return vlan_dev_real_dev(ndev);
> + return ndev;
> +}
> +
> +static void assign_rxopt(struct sock *sk, unsigned int opt)
> +{
> + const struct chtls_dev *cdev;
> + struct tcp_sock *tp = tcp_sk(sk);
> + struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);

Reverse christmas tree format?

> +
> + cdev = csk->cdev;
> + tp->tcp_header_len   = sizeof(struct tcphdr);
> + tp->rx_opt.mss_clamp = cdev->mtus[TCPOPT_MSS_G(opt)] - 40;
> + tp->mss_cache= tp->rx_opt.mss_clamp;
> + tp->rx_opt.tstamp_ok = TCPOPT_TSTAMP_G(opt);
> + tp->rx_opt.snd_wscale= TCPOPT_SACK_G(opt);
> + tp->rx_opt.wscale_ok = TCPOPT_WSCALE_OK_G(opt);
> + SND_WSCALE(tp)   = TCPOPT_SND_WSCALE_G(opt);
> + if (!tp->rx_opt.wscale_ok)
> + tp->rx_opt.rcv_wscale = 0;
> + if (tp->rx_opt.tstamp_ok) {
> + tp->tcp_header_len += TCPOLEN_TSTAMP_ALIGNED;
> + tp->rx_opt.mss_clamp -= TCPOLEN_TSTAMP_ALIGNED;
> + } else if (csk->opt2 

Announce loop-AES-v3.7m file/swap crypto package

2017-12-05 Thread Jari Ruusu
loop-AES changes since previous release:
- Worked around kernel interface changes on 4.14 and 4.15-rc kernels.
- Fixed possible timer delete race condition at loop detach time when key
  scrubbing was enabled.


bzip2 compressed tarball is here:

http://loop-aes.sourceforge.net/loop-AES/loop-AES-v3.7m.tar.bz2
md5sum 288105b86f7733224ddd5f7369b6a025

http://loop-aes.sourceforge.net/loop-AES/loop-AES-v3.7m.tar.bz2.sign

-- 
Jari Ruusu  4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD  ACDF F073 3C80 8132 F189


Re: [PATCH 3/3] crypto: exynos - Reseed PRNG after generating 2^16 random bytes

2017-12-05 Thread Krzysztof Kozlowski
On Tue, Dec 5, 2017 at 1:35 PM, Łukasz Stelmach  wrote:
> Reseed PRNG after reading 65 kB of randomness. Although this may reduce
> performance, in most casese the loss is not noticable.
s/casese/cases/
s/noticable/noticeable/

Please explain why you want to reseed after 65 kB (as opposed to the
current implementation). Also mention why you are changing the reseed
time.

>
> Signed-off-by: Łukasz Stelmach 
> ---
>  drivers/crypto/exynos-rng.c | 18 ++
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/crypto/exynos-rng.c b/drivers/crypto/exynos-rng.c
> index 002e9d2a83cc..0bf07a655813 100644
> --- a/drivers/crypto/exynos-rng.c
> +++ b/drivers/crypto/exynos-rng.c
> @@ -54,12 +54,15 @@ enum exynos_prng_type {
>  };
>
>  /*
> - * Driver re-seeds itself with generated random numbers to increase
> - * the randomness.
> + * Driver re-seeds itself with generated random numbers to hinder
> + * backtracking of the original seed.
>   *
>   * Time for next re-seed in ms.
>   */
> -#define EXYNOS_RNG_RESEED_TIME 100
> +#define EXYNOS_RNG_RESEED_TIME 1000
> +#define EXYNOS_RNG_RESEED_BYTES65536
> +
> +

Just one empty line.

>  /*
>   * In polling mode, do not wait infinitely for the engine to finish the work.
>   */
> @@ -81,6 +84,8 @@ struct exynos_rng_dev {
> unsigned intseed_save_len;
> /* Time of last seeding in jiffies */
> unsigned long   last_seeding;
> +   /* Bytes generated since last seeding */
> +   unsigned long   bytes_seeding;
>  };
>
>  static struct exynos_rng_dev *exynos_rng_dev;
> @@ -125,6 +130,7 @@ static int exynos_rng_set_seed(struct exynos_rng_dev *rng,
> }
>
> rng->last_seeding = jiffies;
> +   rng->bytes_seeding = 0;
>
> return 0;
>  }
> @@ -166,6 +172,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev 
> *rng,
> memcpy_fromio(dst, rng->mem + EXYNOS_RNG_OUT_BASE, *read);
>
> return 0;
> +
> +

No need for these lines.

Best regards,
Krzysztof


Re: [PATCH 2/3] crypto: exynos - Improve performance of PRNG

2017-12-05 Thread Stephan Mueller
On Tuesday, 5 December 2017, 13:35:57 CET, Łukasz Stelmach wrote:

Hi Łukasz,

> Use memcpy_fromio() instead of custom exynos_rng_copy_random() function
> to retrieve generated numbers from the registers of PRNG.
> 
> Remove unnecessary invocation of cpu_relax().
> 
> Signed-off-by: Łukasz Stelmach 
> ---
>  drivers/crypto/exynos-rng.c | 36 +---
>  1 file changed, 5 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/crypto/exynos-rng.c b/drivers/crypto/exynos-rng.c
> index 894ef93ef5ec..002e9d2a83cc 100644
> --- a/drivers/crypto/exynos-rng.c
> +++ b/drivers/crypto/exynos-rng.c
> @@ -130,34 +130,6 @@ static int exynos_rng_set_seed(struct exynos_rng_dev
> *rng, }
> 
>  /*
> - * Read from output registers and put the data under 'dst' array,
> - * up to dlen bytes.
> - *
> - * Returns number of bytes actually stored in 'dst' (dlen
> - * or EXYNOS_RNG_SEED_SIZE).
> - */
> -static unsigned int exynos_rng_copy_random(struct exynos_rng_dev *rng,
> -u8 *dst, unsigned int dlen)
> -{
> - unsigned int cnt = 0;
> - int i, j;
> - u32 val;
> -
> - for (j = 0; j < EXYNOS_RNG_SEED_REGS; j++) {
> - val = exynos_rng_readl(rng, EXYNOS_RNG_OUT(j));
> -
> - for (i = 0; i < 4; i++) {
> - dst[cnt] = val & 0xff;
> - val >>= 8;
> - if (++cnt >= dlen)
> - return cnt;
> - }
> - }
> -
> - return cnt;
> -}
> -
> -/*
>   * Start the engine and poll for finish.  Then read from output registers
>   * filling the 'dst' buffer up to 'dlen' bytes or up to size of generated
>   * random data (EXYNOS_RNG_SEED_SIZE).
> @@ -171,6 +143,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev
> *rng, {
>   int retry = EXYNOS_RNG_WAIT_RETRIES;
> 
> + *read = min_t(size_t, dlen, EXYNOS_RNG_SEED_SIZE);
> +
>   if (rng->type == EXYNOS_PRNG_TYPE4) {
>   exynos_rng_writel(rng, EXYNOS_RNG_CONTROL_START,
> EXYNOS_RNG_CONTROL);
> @@ -180,8 +154,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev
> *rng, }
> 
>   while (!(exynos_rng_readl(rng,
> - EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) && 
> --retry)
> - cpu_relax();
> + EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) &&
> +--retry);

Is this related to the patch?
> 
>   if (!retry)
>   return -ETIMEDOUT;
> @@ -189,7 +163,7 @@ static int exynos_rng_get_random(struct exynos_rng_dev
> *rng, /* Clear status bit */
>   exynos_rng_writel(rng, EXYNOS_RNG_STATUS_RNG_DONE,
> EXYNOS_RNG_STATUS);
> - *read = exynos_rng_copy_random(rng, dst, dlen);
> + memcpy_fromio(dst, rng->mem + EXYNOS_RNG_OUT_BASE, *read);
> 
>   return 0;
>  }



Ciao
Stephan


Re: [PATCH 3/3] crypto: exynos - Reseed PRNG after generating 2^16 random bytes

2017-12-05 Thread Stephan Mueller
On Tuesday, 5 December 2017, 13:35:58 CET, Łukasz Stelmach wrote:

Hi Łukasz,

> Reseed PRNG after reading 65 kB of randomness. Although this may reduce
> performance, in most casese the loss is not noticable.

Please add to the log that you also increase the timer-based reseed to 1 
second?!

Another suggestion: maybe you want to add a comment to the reseed function to 
indicate it is for enhanced backtracking resistance. Otherwise a lot of folks 
would scratch their heads wondering why such code exists in the first place. :-)
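
Something along these lines would do (the wording is only a suggestion, not
part of the posted patch):

        /*
         * Re-seed the PRNG with freshly generated random numbers to hinder
         * backtracking of the original seed (enhanced backtracking
         * resistance).
         */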

Other than that:

Reviewed-by: Stephan Mueller 

Ciao
Stephan


Re: [PATCH 2/3] crypto: exynos - Improve performance of PRNG

2017-12-05 Thread Krzysztof Kozlowski
On Tue, Dec 5, 2017 at 1:35 PM, Łukasz Stelmach  wrote:
> Use memcpy_fromio() instead of custom exynos_rng_copy_random() function
> to retrieve generated numbers from the registers of PRNG.
>
> Remove unnecessary invocation of cpu_relax().
>
> Signed-off-by: Łukasz Stelmach 
> ---
>  drivers/crypto/exynos-rng.c | 36 +---
>  1 file changed, 5 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/crypto/exynos-rng.c b/drivers/crypto/exynos-rng.c
> index 894ef93ef5ec..002e9d2a83cc 100644
> --- a/drivers/crypto/exynos-rng.c
> +++ b/drivers/crypto/exynos-rng.c
> @@ -130,34 +130,6 @@ static int exynos_rng_set_seed(struct exynos_rng_dev 
> *rng,
>  }
>
>  /*
> - * Read from output registers and put the data under 'dst' array,
> - * up to dlen bytes.
> - *
> - * Returns number of bytes actually stored in 'dst' (dlen
> - * or EXYNOS_RNG_SEED_SIZE).
> - */
> -static unsigned int exynos_rng_copy_random(struct exynos_rng_dev *rng,
> -  u8 *dst, unsigned int dlen)
> -{
> -   unsigned int cnt = 0;
> -   int i, j;
> -   u32 val;
> -
> -   for (j = 0; j < EXYNOS_RNG_SEED_REGS; j++) {
> -   val = exynos_rng_readl(rng, EXYNOS_RNG_OUT(j));
> -
> -   for (i = 0; i < 4; i++) {
> -   dst[cnt] = val & 0xff;
> -   val >>= 8;
> -   if (++cnt >= dlen)
> -   return cnt;
> -   }
> -   }
> -
> -   return cnt;
> -}
> -
> -/*
>   * Start the engine and poll for finish.  Then read from output registers
>   * filling the 'dst' buffer up to 'dlen' bytes or up to size of generated
>   * random data (EXYNOS_RNG_SEED_SIZE).
> @@ -171,6 +143,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev 
> *rng,
>  {
> int retry = EXYNOS_RNG_WAIT_RETRIES;
>
> +   *read = min_t(size_t, dlen, EXYNOS_RNG_SEED_SIZE);
> +

Do not set *read on the error path, only on success. Although it does not
matter right now, that is the expected behavior - if possible, do not
affect state outside of a block in case of error.
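
A sketch of what that would look like, based on the quoted hunk (only the
placement of the *read assignment changes):

        if (!retry)
                return -ETIMEDOUT;      /* *read stays untouched on error */

        /* Clear status bit */
        exynos_rng_writel(rng, EXYNOS_RNG_STATUS_RNG_DONE,
                          EXYNOS_RNG_STATUS);

        *read = min_t(size_t, dlen, EXYNOS_RNG_SEED_SIZE);
        memcpy_fromio(dst, rng->mem + EXYNOS_RNG_OUT_BASE, *read);

        return 0;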

> if (rng->type == EXYNOS_PRNG_TYPE4) {
> exynos_rng_writel(rng, EXYNOS_RNG_CONTROL_START,
>   EXYNOS_RNG_CONTROL);
> @@ -180,8 +154,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev 
> *rng,
> }
>
> while (!(exynos_rng_readl(rng,
> -   EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) && 
> --retry)
> -   cpu_relax();
> +   EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) &&
> +  --retry);

This looks like an unrelated change, so split it into a separate commit
with an explanation of why you are changing the common busy-loop pattern.
exynos_rng_readl() uses the relaxed variant of readl(), so I would expect
cpu_relax() here.
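
That is, keeping the conventional form already in the driver (sketch):

        while (!(exynos_rng_readl(rng, EXYNOS_RNG_STATUS) &
                 EXYNOS_RNG_STATUS_RNG_DONE) && --retry)
                cpu_relax();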

Best regards,
Krzysztof


Re: [RFC] crypto: exynos - Increase the priority of the driver

2017-12-05 Thread Stephan Mueller
On Tuesday, 5 December 2017, 13:42:14 CET, Łukasz Stelmach wrote:

Hi Łukasz,

> exynos-rng is one of many implementations of stdrng. With priority as
> low as 100 it isn't selected if software implementations (DRBG) are
> available.

What about using 300? The reason is the following: in the normal case, the 
software PRNGs have 100 (X9.31) and 200 (SP800-90A DRBG). Thus, in the normal 
case, the hardware takes precedence.

In FIPS mode, the DRBG priority is increased by 200. As FIPS mode requires a 
DRBG, and assuming that the hardware does not implement a DRBG, the software 
DRBG should be used, as otherwise you have a FIPS problem.
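
Under that scheme the change would simply be (sketch):

        .cra_priority   = 300,  /* above X9.31 (100) and DRBG (200),
                                 * below the FIPS-mode DRBG (200 + 200)
                                 */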

Ciao
Stephan


Re: [PATCH 1/3] crypto: exynos - Support Exynos5250+ SoCs

2017-12-05 Thread Krzysztof Kozlowski
On Tue, Dec 5, 2017 at 1:35 PM, Łukasz Stelmach  wrote:
> Add support for PRNG in Exynos5250+ SoCs.
>
> Signed-off-by: Łukasz Stelmach 
> ---
>  .../bindings/crypto/samsung,exynos-rng4.txt|  4 ++-
>  drivers/crypto/exynos-rng.c| 36 
> --
>  2 files changed, 36 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/crypto/samsung,exynos-rng4.txt 
> b/Documentation/devicetree/bindings/crypto/samsung,exynos-rng4.txt
> index 4ca8dd4d7e66..a13fbdb4bd88 100644
> --- a/Documentation/devicetree/bindings/crypto/samsung,exynos-rng4.txt
> +++ b/Documentation/devicetree/bindings/crypto/samsung,exynos-rng4.txt
> @@ -2,7 +2,9 @@ Exynos Pseudo Random Number Generator
>
>  Required properties:
>
> -- compatible  : Should be "samsung,exynos4-rng".
> +- compatible  : One of:
> +- "samsung,exynos4-rng" for Exynos4210 and Exynos4412
> +- "samsung,exynos5250-prng" for Exynos5250+
>  - reg : Specifies base physical address and size of the registers 
> map.
>  - clocks  : Phandle to clock-controller plus clock-specifier pair.
>  - clock-names : "secss" as a clock name.
> diff --git a/drivers/crypto/exynos-rng.c b/drivers/crypto/exynos-rng.c
> index 451620b475a0..894ef93ef5ec 100644
> --- a/drivers/crypto/exynos-rng.c
> +++ b/drivers/crypto/exynos-rng.c
> @@ -22,12 +22,17 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  #include 
>
>  #define EXYNOS_RNG_CONTROL 0x0
>  #define EXYNOS_RNG_STATUS  0x10
> +
> +#define EXYNOS_RNG_SEED_CONF   0x14
> +#define EXYNOS_RNG_GEN_PRNG0x02

Use BIT(1) instead.
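
That is (a sketch; BIT() comes from <linux/bitops.h>, which should already be
reachable through the existing includes):

#define EXYNOS_RNG_GEN_PRNG    BIT(1)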

> +
>  #define EXYNOS_RNG_SEED_BASE   0x140
>  #define EXYNOS_RNG_SEED(n) (EXYNOS_RNG_SEED_BASE + (n * 0x4))
>  #define EXYNOS_RNG_OUT_BASE0x160
> @@ -43,6 +48,11 @@
>  #define EXYNOS_RNG_SEED_REGS   5
>  #define EXYNOS_RNG_SEED_SIZE   (EXYNOS_RNG_SEED_REGS * 4)
>
> +enum exynos_prng_type {
> +   EXYNOS_PRNG_TYPE4 = 4,
> +   EXYNOS_PRNG_TYPE5 = 5,

That's unusual numbering and naming, so just:
enum exynos_prng_type {
  EXYNOS_PRNG_EXYNOS4,
  EXYNOS_PRNG_EXYNOS5,
};

Especially since TYPE4 and TYPE5 suggest some kind of sub-type (like
versions of some IP blocks, e.g. MFC) while it is really just the Exynos
family.

> +};
> +
>  /*
>   * Driver re-seeds itself with generated random numbers to increase
>   * the randomness.
> @@ -63,6 +73,7 @@ struct exynos_rng_ctx {
>  /* Device associated memory */
>  struct exynos_rng_dev {
> struct device   *dev;
> +   enum exynos_prng_type   type;
> void __iomem*mem;
> struct clk  *clk;
> /* Generated numbers stored for seeding during resume */
> @@ -160,8 +171,13 @@ static int exynos_rng_get_random(struct exynos_rng_dev 
> *rng,
>  {
> int retry = EXYNOS_RNG_WAIT_RETRIES;
>
> -   exynos_rng_writel(rng, EXYNOS_RNG_CONTROL_START,
> - EXYNOS_RNG_CONTROL);
> +   if (rng->type == EXYNOS_PRNG_TYPE4) {
> +   exynos_rng_writel(rng, EXYNOS_RNG_CONTROL_START,
> + EXYNOS_RNG_CONTROL);
> +   } else if (rng->type == EXYNOS_PRNG_TYPE5) {
> +   exynos_rng_writel(rng, EXYNOS_RNG_GEN_PRNG,
> + EXYNOS_RNG_SEED_CONF);
> +   }
>
> while (!(exynos_rng_readl(rng,
> EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) && 
> --retry)
> @@ -279,6 +295,13 @@ static int exynos_rng_probe(struct platform_device *pdev)
> if (!rng)
> return -ENOMEM;
>
> +   rng->type = (enum exynos_prng_type)of_device_get_match_data(&pdev->dev);
> +   if (rng->type != EXYNOS_PRNG_TYPE4 &&
> +   rng->type != EXYNOS_PRNG_TYPE5) {
> +   dev_err(&pdev->dev, "Unsupported PRNG type: %d", rng->type);
> +   return -ENOTSUPP;
> +   }
> +
> rng->dev = &pdev->dev;
> rng->clk = devm_clk_get(&pdev->dev, "secss");
> if (IS_ERR(rng->clk)) {
> @@ -300,7 +323,10 @@ static int exynos_rng_probe(struct platform_device *pdev)
> dev_err(&pdev->dev,
> "Couldn't register rng crypto alg: %d\n", ret);
> exynos_rng_dev = NULL;
> -   }
> +   } else

Missing {} around the else clause. checkpatch should probably point this out.
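
I.e. (sketch):

        } else {
                dev_info(&pdev->dev,
                         "Exynos Pseudo Random Number Generator (type:%d)\n",
                         rng->type);
        }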

> +   dev_info(&pdev->dev,
> +"Exynos Pseudo Random Number Generator (type:%d)\n",

Use dev_dbg(); this information is not important enough to be printed during boot.
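
E.g. (sketch, same arguments as the quoted dev_info() call):

                dev_dbg(&pdev->dev,
                        "Exynos Pseudo Random Number Generator (type:%d)\n",
                        rng->type);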

Best regards,
Krzysztof

> +rng->type);
>
> return ret;
>  }
> @@ -367,6 +393,10 @@ static SIMPLE_DEV_PM_OPS(exynos_rng_pm_ops, 
> exynos_rng_suspend,
>  static const struct of_device_id exynos_rng_dt_match[] = {
> {
> .compatible = 

Re: [crypto 6/8] chtls: TCB and Key program

2017-12-05 Thread Stephan Mueller
On Tuesday, 5 December 2017, 12:40:29 CET, Atul Gupta wrote:

Hi Atul,

> program the tx and rx key on chip.
> 
> Signed-off-by: Atul Gupta 
> ---
>  drivers/crypto/chelsio/chtls/chtls_hw.c | 394
>  1 file changed, 394 insertions(+)
>  create mode 100644 drivers/crypto/chelsio/chtls/chtls_hw.c
> 
> diff --git a/drivers/crypto/chelsio/chtls/chtls_hw.c
> b/drivers/crypto/chelsio/chtls/chtls_hw.c new file mode 100644
> index 000..5e65aa2
> --- /dev/null
> +++ b/drivers/crypto/chelsio/chtls/chtls_hw.c
> @@ -0,0 +1,394 @@
> +/*
> + * Copyright (c) 2017 Chelsio Communications, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * Written by: Atul Gupta (atul.gu...@chelsio.com)
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "chtls.h"
> +#include "chtls_cm.h"
> +
> +static void __set_tcb_field_direct(struct chtls_sock *csk,
> +struct cpl_set_tcb_field *req, u16 word,
> +u64 mask, u64 val, u8 cookie, int no_reply)
> +{
> + struct ulptx_idata *sc;
> +
> + INIT_TP_WR_CPL(req, CPL_SET_TCB_FIELD, csk->tid);
> + req->wr.wr_mid |= htonl(FW_WR_FLOWID_V(csk->tid));
> + req->reply_ctrl = htons(NO_REPLY_V(no_reply) |
> + QUEUENO_V(csk->rss_qid));
> + req->word_cookie = htons(TCB_WORD(word) | TCB_COOKIE_V(cookie));
> + req->mask = cpu_to_be64(mask);
> + req->val = cpu_to_be64(val);
> + sc = (struct ulptx_idata *)(req + 1);
> + sc->cmd_more = htonl(ULPTX_CMD_V(ULP_TX_SC_NOOP));
> + sc->len = htonl(0);
> +}
> +
> +void __set_tcb_field(struct sock *sk, struct sk_buff *skb, u16 word,
> +  u64 mask, u64 val, u8 cookie, int no_reply)
> +{
> + struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
> + struct cpl_set_tcb_field *req;
> + struct ulptx_idata *sc;
> + unsigned int wrlen = roundup(sizeof(*req) + sizeof(*sc), 16);
> +
> + req = (struct cpl_set_tcb_field *)__skb_put(skb, wrlen);
> + __set_tcb_field_direct(csk, req, word, mask, val, cookie, no_reply);
> + set_wr_txq(skb, CPL_PRIORITY_CONTROL, csk->port_id);
> +}
> +
> +static int chtls_set_tcb_field(struct sock *sk, u16 word, u64 mask, u64
> val) +{
> + struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
> + struct sk_buff *skb;
> + struct cpl_set_tcb_field *req;
> + struct ulptx_idata *sc;
> + unsigned int wrlen = roundup(sizeof(*req) + sizeof(*sc), 16);
> + unsigned int credits_needed = DIV_ROUND_UP(wrlen, 16);
> +
> + skb = alloc_skb(wrlen, GFP_ATOMIC);
> + if (!skb)
> + return -ENOMEM;
> +
> + __set_tcb_field(sk, skb, word, mask, val, 0, 1);
> + set_queue(skb, (csk->txq_idx << 1) | CPL_PRIORITY_DATA, sk);
> + csk->wr_credits -= credits_needed;
> + csk->wr_unacked += credits_needed;
> + enqueue_wr(csk, skb);
> + cxgb4_ofld_send(csk->egress_dev, skb);
> + return 0;
> +}
> +
> +/*
> + * Set one of the t_flags bits in the TCB.
> + */
> +int chtls_set_tcb_tflag(struct sock *sk, unsigned int bit_pos, int val)
> +{
> + return chtls_set_tcb_field(sk, 1, 1ULL << bit_pos,
> + val << bit_pos);
> +}
> +
> +static int chtls_set_tcb_keyid(struct sock *sk, int keyid)
> +{
> + return chtls_set_tcb_field(sk, 31, 0xULL, keyid);
> +}
> +
> +static int chtls_set_tcb_seqno(struct sock *sk)
> +{
> + return chtls_set_tcb_field(sk, 28, ~0ULL, 0);
> +}
> +
> +static int chtls_set_tcb_quiesce(struct sock *sk, int val)
> +{
> + return chtls_set_tcb_field(sk, 1, (1ULL << TF_RX_QUIESCE_S),
> +TF_RX_QUIESCE_V(val));
> +}
> +
> +static void *chtls_alloc_mem(unsigned long size)
> +{
> + void *p = kmalloc(size, GFP_KERNEL);
> +
> + if (!p)
> + p = vmalloc(size);
> + if (p)
> + memset(p, 0, size);
> + return p;
> +}
> +
> +static void chtls_free_mem(void *addr)
> +{
> + unsigned long p = (unsigned long)addr;
> +
> + if (p >= VMALLOC_START && p < VMALLOC_END)
> + vfree(addr);
> + else
> + kfree(addr);
> +}
> +
> +/* TLS Key bitmap processing */
> +int chtls_init_kmap(struct chtls_dev *cdev, struct cxgb4_lld_info *lldi)
> +{
> + unsigned int num_key_ctx, bsize;
> +
> + num_key_ctx = (lldi->vr->key.size / TLS_KEY_CONTEXT_SZ);
> + bsize = BITS_TO_LONGS(num_key_ctx);
> +
> + cdev->kmap.size = num_key_ctx;
> + cdev->kmap.available = bsize;
> + cdev->kmap.addr = chtls_alloc_mem(sizeof(*cdev->kmap.addr) *
> +   bsize);
> + if (!cdev->kmap.addr)
> + return -1;
> +
> + 

Re: [PATCH v2 11/19] arm64: assembler: add macro to conditionally yield the NEON under PREEMPT

2017-12-05 Thread Ard Biesheuvel


> On 5 Dec 2017, at 12:28, Dave Martin  wrote:
> 
>> On Mon, Dec 04, 2017 at 12:26:37PM +, Ard Biesheuvel wrote:
>> Add a support macro to conditionally yield the NEON (and thus the CPU)
>> that may be called from the assembler code. Given that especially the
>> instruction based accelerated crypto code may use very tight loops, add
>> some parametrization so that the TIF_NEED_RESCHED flag test is only
>> executed every so many loop iterations.
>> 
>> In some cases, yielding the NEON involves saving and restoring a non
>> trivial amount of context (especially in the CRC folding algorithms),
>> and so the macro is split into two, and the code in between is only
>> executed when the yield path is taken, allowing the context to be preserved.
>> The second macro takes a label argument that marks the resume-from-yield
>> path, which should restore the preserved context again.
>> 
>> Signed-off-by: Ard Biesheuvel 
>> ---
>> arch/arm64/include/asm/assembler.h | 50 
>> 1 file changed, 50 insertions(+)
>> 
>> diff --git a/arch/arm64/include/asm/assembler.h 
>> b/arch/arm64/include/asm/assembler.h
>> index aef72d886677..917b026d3e00 100644
>> --- a/arch/arm64/include/asm/assembler.h
>> +++ b/arch/arm64/include/asm/assembler.h
>> @@ -512,4 +512,54 @@ alternative_else_nop_endif
>> #endif
>>.endm
>> 
>> +/*
>> + * yield_neon - check whether to yield to another runnable task from
>> + *kernel mode NEON code (running with preemption disabled)
>> + *
>> + * - Check whether the preempt count is exactly 1, in which case disabling
>> + *   preemption once will make the task preemptible. If this is not the 
>> case,
>> + *   yielding is pointless.
>> + * - Check whether TIF_NEED_RESCHED is set, and if so, disable and re-enable
>> + *   kernel mode NEON (which will trigger a reschedule), and branch to the
>> + *   yield fixup code at @lbl.
>> + */
>> +.macroyield_neon, lbl:req, ctr, order, stride, loop
>> +yield_neon_pre\ctr, \order, \stride, \loop
>> +yield_neon_post\lbl
>> +.endm
>> +
>> +.macroyield_neon_pre, ctr, order=0, stride, loop=f
>> +#ifdef CONFIG_PREEMPT
>> +/*
>> + * With some algorithms, it makes little sense to poll the
>> + * TIF_NEED_RESCHED flag after every iteration, so only perform
>> + * the check every 2^order strides.
>> + */
>> +.if\order > 1
>> +.if(\stride & (\stride - 1)) != 0
>> +.error"stride should be a power of 2"
>> +.endif
>> +tst\ctr, #((1 << \order) * \stride - 1) & ~(\stride - 1)
>> +b.ne\loop
>> +.endif
> 
> I'm not sure what baking in this check gives us, and this seems to
> conflate two rather different things: yielding and defining a
> "heartbeat" frequency for the calling code.
> 
> Can we separate out the crypto-loop-helper semantics from the yield-
> neon stuff?
> 

Fair enough. I incorporated the check here so it disappears from the code 
entirely when !CONFIG_PREEMPT, because otherwise you end up with a sequence 
that is mispredicted every # iterations without any benefit.
I guess I could macroise that separately though.

> If we had
> 
>if_cond_yield_neon
>// patchup code
>endif_yield_neon
> 
> then the caller is free to conditionally branch over that as appropriate
> like
> 
> loop:
>// crypto stuff
>tst x0, #0xf
>b.neloop
> 
>if_cond_yield_neon
>// patchup code
>endif_cond_yield_neon
> 
>bloop
> 
> I think this is clearer than burying checks and branches in a macro that
> is trying to be generic.
> 

Agreed.

> Label arguments can be added to elide some branches of course, at a
> corresponding cost to clarity...  in the common case the cache will
> be hot and the branches won't be mispredicted though.  Is it really
> worth it?
> 

Perhaps not. And I have not yet made any attempt to benchmark this in great 
detail, given that I first need some feedback from the rt crowd on whether 
this is likely to work as desired.

>> +
>> +get_thread_infox0
>> +ldrw1, [x0, #TSK_TI_PREEMPT]
>> +ldrx0, [x0, #TSK_TI_FLAGS]
>> +cmpw1, #1 // == PREEMPT_OFFSET
> 
> asm-offsets?
> 

This is not an offset in that sense, but the header that defines it is not
asm-safe.
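
(For context: constants needed from assembly are normally emitted through
arch/arm64/kernel/asm-offsets.c; a hypothetical one-liner there, not part of
the posted series, could look like

  DEFINE(PREEMPT_OFFSET_ASM, PREEMPT_OFFSET);

with the assembly then comparing against PREEMPT_OFFSET_ASM.)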

> [...]
> 
> Cheers
> ---Dave


[RFC] crypto: exynos - Increase the priority of the driver

2017-12-05 Thread Łukasz Stelmach
exynos-rng is one of many implementations of stdrng. With priority as
low as 100 it isn't selected if software implementations (DRBG) are
available.

Signed-off-by: Łukasz Stelmach 
---

If not 1000, what is the best value, what is the policy?


 drivers/crypto/exynos-rng.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/exynos-rng.c b/drivers/crypto/exynos-rng.c
index 0bf07a655813..3c514eaae9dc 100644
--- a/drivers/crypto/exynos-rng.c
+++ b/drivers/crypto/exynos-rng.c
@@ -259,7 +259,7 @@ static struct rng_alg exynos_rng_alg = {
.base   = {
.cra_name   = "stdrng",
.cra_driver_name= "exynos_rng",
-   .cra_priority   = 100,
+   .cra_priority   = 1000,
.cra_ctxsize= sizeof(struct exynos_rng_ctx),
.cra_module = THIS_MODULE,
.cra_init   = exynos_rng_kcapi_init,
-- 
2.11.0



[PATCH 3/3] crypto: exynos - Reseed PRNG after generating 2^16 random bytes

2017-12-05 Thread Łukasz Stelmach
Reseed PRNG after reading 65 kB of randomness. Although this may reduce
performance, in most casese the loss is not noticable.

Signed-off-by: Łukasz Stelmach 
---
 drivers/crypto/exynos-rng.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/exynos-rng.c b/drivers/crypto/exynos-rng.c
index 002e9d2a83cc..0bf07a655813 100644
--- a/drivers/crypto/exynos-rng.c
+++ b/drivers/crypto/exynos-rng.c
@@ -54,12 +54,15 @@ enum exynos_prng_type {
 };
 
 /*
- * Driver re-seeds itself with generated random numbers to increase
- * the randomness.
+ * Driver re-seeds itself with generated random numbers to hinder
+ * backtracking of the original seed.
  *
  * Time for next re-seed in ms.
  */
-#define EXYNOS_RNG_RESEED_TIME 100
+#define EXYNOS_RNG_RESEED_TIME 1000
+#define EXYNOS_RNG_RESEED_BYTES65536
+
+
 /*
  * In polling mode, do not wait infinitely for the engine to finish the work.
  */
@@ -81,6 +84,8 @@ struct exynos_rng_dev {
unsigned intseed_save_len;
/* Time of last seeding in jiffies */
unsigned long   last_seeding;
+   /* Bytes generated since last seeding */
+   unsigned long   bytes_seeding;
 };
 
 static struct exynos_rng_dev *exynos_rng_dev;
@@ -125,6 +130,7 @@ static int exynos_rng_set_seed(struct exynos_rng_dev *rng,
}
 
rng->last_seeding = jiffies;
+   rng->bytes_seeding = 0;
 
return 0;
 }
@@ -166,6 +172,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev *rng,
memcpy_fromio(dst, rng->mem + EXYNOS_RNG_OUT_BASE, *read);
 
return 0;
+
+
 }
 
 /* Re-seed itself from time to time */
@@ -177,7 +185,8 @@ static void exynos_rng_reseed(struct exynos_rng_dev *rng)
unsigned int read = 0;
u8 seed[EXYNOS_RNG_SEED_SIZE];
 
-   if (time_before(now, next_seeding))
+   if (time_before(now, next_seeding) &&
+   rng->bytes_seeding < EXYNOS_RNG_RESEED_BYTES)
return;
 
   if (exynos_rng_get_random(rng, seed, sizeof(seed), &read))
@@ -206,6 +215,7 @@ static int exynos_rng_generate(struct crypto_rng *tfm,
 
dlen -= read;
dst += read;
+   rng->bytes_seeding += read;
 
exynos_rng_reseed(rng);
} while (dlen > 0);
-- 
2.11.0



[PATCH 0/3] Assorted changes for Exynos PRNG driver

2017-12-05 Thread Łukasz Stelmach
Hello,

This is a series of patches for exynos-rng driver I've decided to
create after adding support for Exynos5250+ chips. They do not
strictly depend on each other, but I think it is better to send them
as a single patch-set.

Patch #1 Add support for PRNG in Exynos5250+ SoCs

Patch #2 Improve output performance by using memcpy() rather than a
custom function to retrieve random bytes from registers.

Patch #3 Reseed the PRNG after reading 2^16 bytes. A similar approach
is implemented in the DRBG. (Thanks Stephan Mueller)

Łukasz Stelmach (3):
  crypto: exynos - Support Exynos5250+ SoCs
  crypto: exynos - Improve performance of PRNG
  crypto: exynos - Reseed PRNG after generating 2^16 random bytes

 .../bindings/crypto/samsung,exynos-rng4.txt|  4 +-
 drivers/crypto/exynos-rng.c| 90 +-
 2 files changed, 55 insertions(+), 39 deletions(-)

-- 
2.11.0



[PATCH 2/3] crypto: exynos - Improve performance of PRNG

2017-12-05 Thread Łukasz Stelmach
Use memcpy_fromio() instead of custom exynos_rng_copy_random() function
to retrieve generated numbers from the registers of PRNG.

Remove unnecessary invocation of cpu_relax().

Signed-off-by: Łukasz Stelmach 
---
 drivers/crypto/exynos-rng.c | 36 +---
 1 file changed, 5 insertions(+), 31 deletions(-)

diff --git a/drivers/crypto/exynos-rng.c b/drivers/crypto/exynos-rng.c
index 894ef93ef5ec..002e9d2a83cc 100644
--- a/drivers/crypto/exynos-rng.c
+++ b/drivers/crypto/exynos-rng.c
@@ -130,34 +130,6 @@ static int exynos_rng_set_seed(struct exynos_rng_dev *rng,
 }
 
 /*
- * Read from output registers and put the data under 'dst' array,
- * up to dlen bytes.
- *
- * Returns number of bytes actually stored in 'dst' (dlen
- * or EXYNOS_RNG_SEED_SIZE).
- */
-static unsigned int exynos_rng_copy_random(struct exynos_rng_dev *rng,
-  u8 *dst, unsigned int dlen)
-{
-   unsigned int cnt = 0;
-   int i, j;
-   u32 val;
-
-   for (j = 0; j < EXYNOS_RNG_SEED_REGS; j++) {
-   val = exynos_rng_readl(rng, EXYNOS_RNG_OUT(j));
-
-   for (i = 0; i < 4; i++) {
-   dst[cnt] = val & 0xff;
-   val >>= 8;
-   if (++cnt >= dlen)
-   return cnt;
-   }
-   }
-
-   return cnt;
-}
-
-/*
  * Start the engine and poll for finish.  Then read from output registers
  * filling the 'dst' buffer up to 'dlen' bytes or up to size of generated
  * random data (EXYNOS_RNG_SEED_SIZE).
@@ -171,6 +143,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev *rng,
 {
int retry = EXYNOS_RNG_WAIT_RETRIES;
 
+   *read = min_t(size_t, dlen, EXYNOS_RNG_SEED_SIZE);
+
if (rng->type == EXYNOS_PRNG_TYPE4) {
exynos_rng_writel(rng, EXYNOS_RNG_CONTROL_START,
  EXYNOS_RNG_CONTROL);
@@ -180,8 +154,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev *rng,
}
 
while (!(exynos_rng_readl(rng,
-   EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) && 
--retry)
-   cpu_relax();
+   EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) &&
+  --retry);
 
if (!retry)
return -ETIMEDOUT;
@@ -189,7 +163,7 @@ static int exynos_rng_get_random(struct exynos_rng_dev *rng,
/* Clear status bit */
exynos_rng_writel(rng, EXYNOS_RNG_STATUS_RNG_DONE,
  EXYNOS_RNG_STATUS);
-   *read = exynos_rng_copy_random(rng, dst, dlen);
+   memcpy_fromio(dst, rng->mem + EXYNOS_RNG_OUT_BASE, *read);
 
return 0;
 }
-- 
2.11.0



[PATCH 1/3] crypto: exynos - Support Exynos5250+ SoCs

2017-12-05 Thread Łukasz Stelmach
Add support for PRNG in Exynos5250+ SoCs.

Signed-off-by: Łukasz Stelmach 
---
 .../bindings/crypto/samsung,exynos-rng4.txt|  4 ++-
 drivers/crypto/exynos-rng.c| 36 --
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/crypto/samsung,exynos-rng4.txt 
b/Documentation/devicetree/bindings/crypto/samsung,exynos-rng4.txt
index 4ca8dd4d7e66..a13fbdb4bd88 100644
--- a/Documentation/devicetree/bindings/crypto/samsung,exynos-rng4.txt
+++ b/Documentation/devicetree/bindings/crypto/samsung,exynos-rng4.txt
@@ -2,7 +2,9 @@ Exynos Pseudo Random Number Generator
 
 Required properties:
 
-- compatible  : Should be "samsung,exynos4-rng".
+- compatible  : One of:
+- "samsung,exynos4-rng" for Exynos4210 and Exynos4412
+- "samsung,exynos5250-prng" for Exynos5250+
 - reg : Specifies base physical address and size of the registers map.
 - clocks  : Phandle to clock-controller plus clock-specifier pair.
 - clock-names : "secss" as a clock name.
diff --git a/drivers/crypto/exynos-rng.c b/drivers/crypto/exynos-rng.c
index 451620b475a0..894ef93ef5ec 100644
--- a/drivers/crypto/exynos-rng.c
+++ b/drivers/crypto/exynos-rng.c
@@ -22,12 +22,17 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
 
 #define EXYNOS_RNG_CONTROL 0x0
 #define EXYNOS_RNG_STATUS  0x10
+
+#define EXYNOS_RNG_SEED_CONF   0x14
+#define EXYNOS_RNG_GEN_PRNG0x02
+
 #define EXYNOS_RNG_SEED_BASE   0x140
 #define EXYNOS_RNG_SEED(n) (EXYNOS_RNG_SEED_BASE + (n * 0x4))
 #define EXYNOS_RNG_OUT_BASE0x160
@@ -43,6 +48,11 @@
 #define EXYNOS_RNG_SEED_REGS   5
 #define EXYNOS_RNG_SEED_SIZE   (EXYNOS_RNG_SEED_REGS * 4)
 
+enum exynos_prng_type {
+   EXYNOS_PRNG_TYPE4 = 4,
+   EXYNOS_PRNG_TYPE5 = 5,
+};
+
 /*
  * Driver re-seeds itself with generated random numbers to increase
  * the randomness.
@@ -63,6 +73,7 @@ struct exynos_rng_ctx {
 /* Device associated memory */
 struct exynos_rng_dev {
struct device   *dev;
+   enum exynos_prng_type   type;
void __iomem*mem;
struct clk  *clk;
/* Generated numbers stored for seeding during resume */
@@ -160,8 +171,13 @@ static int exynos_rng_get_random(struct exynos_rng_dev 
*rng,
 {
int retry = EXYNOS_RNG_WAIT_RETRIES;
 
-   exynos_rng_writel(rng, EXYNOS_RNG_CONTROL_START,
- EXYNOS_RNG_CONTROL);
+   if (rng->type == EXYNOS_PRNG_TYPE4) {
+   exynos_rng_writel(rng, EXYNOS_RNG_CONTROL_START,
+ EXYNOS_RNG_CONTROL);
+   } else if (rng->type == EXYNOS_PRNG_TYPE5) {
+   exynos_rng_writel(rng, EXYNOS_RNG_GEN_PRNG,
+ EXYNOS_RNG_SEED_CONF);
+   }
 
while (!(exynos_rng_readl(rng,
EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) && 
--retry)
@@ -279,6 +295,13 @@ static int exynos_rng_probe(struct platform_device *pdev)
if (!rng)
return -ENOMEM;
 
+   rng->type = (enum exynos_prng_type)of_device_get_match_data(&pdev->dev);
+   if (rng->type != EXYNOS_PRNG_TYPE4 &&
+   rng->type != EXYNOS_PRNG_TYPE5) {
+   dev_err(&pdev->dev, "Unsupported PRNG type: %d", rng->type);
+   return -ENOTSUPP;
+   }
+
	rng->dev = &pdev->dev;
	rng->clk = devm_clk_get(&pdev->dev, "secss");
if (IS_ERR(rng->clk)) {
@@ -300,7 +323,10 @@ static int exynos_rng_probe(struct platform_device *pdev)
		dev_err(&pdev->dev,
"Couldn't register rng crypto alg: %d\n", ret);
exynos_rng_dev = NULL;
-   }
+   } else
+   dev_info(&pdev->dev,
+"Exynos Pseudo Random Number Generator (type:%d)\n",
+rng->type);
 
return ret;
 }
@@ -367,6 +393,10 @@ static SIMPLE_DEV_PM_OPS(exynos_rng_pm_ops, 
exynos_rng_suspend,
 static const struct of_device_id exynos_rng_dt_match[] = {
{
.compatible = "samsung,exynos4-rng",
+   .data = (const void *)EXYNOS_PRNG_TYPE4,
+   }, {
+   .compatible = "samsung,exynos5250-prng",
+   .data = (const void *)EXYNOS_PRNG_TYPE5,
},
{ },
 };
-- 
2.11.0



Re: [PATCH v2 11/19] arm64: assembler: add macro to conditionally yield the NEON under PREEMPT

2017-12-05 Thread Dave Martin
On Mon, Dec 04, 2017 at 12:26:37PM +, Ard Biesheuvel wrote:
> Add a support macro to conditionally yield the NEON (and thus the CPU)
> that may be called from the assembler code. Given that especially the
> instruction based accelerated crypto code may use very tight loops, add
> some parametrization so that the TIF_NEED_RESCHED flag test is only
> executed every so many loop iterations.
> 
> In some cases, yielding the NEON involves saving and restoring a non
> trivial amount of context (especially in the CRC folding algorithms),
> and so the macro is split into two, and the code in between is only
> executed when the yield path is taken, allowing the context to be preserved.
> The second macro takes a label argument that marks the resume-from-yield
> path, which should restore the preserved context again.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/arm64/include/asm/assembler.h | 50 
>  1 file changed, 50 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/assembler.h 
> b/arch/arm64/include/asm/assembler.h
> index aef72d886677..917b026d3e00 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -512,4 +512,54 @@ alternative_else_nop_endif
>  #endif
>   .endm
>  
> +/*
> + * yield_neon - check whether to yield to another runnable task from
> + *   kernel mode NEON code (running with preemption disabled)
> + *
> + * - Check whether the preempt count is exactly 1, in which case disabling
> + *   preemption once will make the task preemptible. If this is not the case,
> + *   yielding is pointless.
> + * - Check whether TIF_NEED_RESCHED is set, and if so, disable and re-enable
> + *   kernel mode NEON (which will trigger a reschedule), and branch to the
> + *   yield fixup code at @lbl.
> + */
> + .macro  yield_neon, lbl:req, ctr, order, stride, loop
> + yield_neon_pre  \ctr, \order, \stride, \loop
> + yield_neon_post \lbl
> + .endm
> +
> + .macro  yield_neon_pre, ctr, order=0, stride, loop=f
> +#ifdef CONFIG_PREEMPT
> + /*
> +  * With some algorithms, it makes little sense to poll the
> +  * TIF_NEED_RESCHED flag after every iteration, so only perform
> +  * the check every 2^order strides.
> +  */
> + .if \order > 1
> + .if (\stride & (\stride - 1)) != 0
> + .error  "stride should be a power of 2"
> + .endif
> + tst \ctr, #((1 << \order) * \stride - 1) & ~(\stride - 1)
> + b.ne\loop
> + .endif

I'm not sure what baking in this check gives us, and this seems to
conflate two rather different things: yielding and defining a
"heartbeat" frequency for the calling code.

Can we separate out the crypto-loop-helper semantics from the yield-
neon stuff?

If we had

if_cond_yield_neon
// patchup code
endif_yield_neon

then the caller is free to conditionally branch over that as appropriate
like

loop:
// crypto stuff
tst x0, #0xf
b.neloop

if_cond_yield_neon
// patchup code
endif_cond_yield_neon

b   loop

I think this is clearer than burying checks and branches in a macro that
is trying to be generic.

Label arguments can be added to elide some branches of course, at a
corresponding cost to clarity...  in the common case the cache will
be hot and the branches won't be mispredicted though.  Is it really
worth it?

> +
> + get_thread_info x0
> + ldr w1, [x0, #TSK_TI_PREEMPT]
> + ldr x0, [x0, #TSK_TI_FLAGS]
> + cmp w1, #1 // == PREEMPT_OFFSET

asm-offsets?

[...]

Cheers
---Dave


[crypto 8/8] Kconfig Makefile

2017-12-05 Thread Atul Gupta
Entry for Inline TLS as another driver dependent on cxgb4 and chcr

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/Kconfig  | 10 ++
 drivers/crypto/chelsio/Makefile |  1 +
 2 files changed, 11 insertions(+)

diff --git a/drivers/crypto/chelsio/Kconfig b/drivers/crypto/chelsio/Kconfig
index 51932c7..686d246 100644
--- a/drivers/crypto/chelsio/Kconfig
+++ b/drivers/crypto/chelsio/Kconfig
@@ -28,3 +28,13 @@ config CHELSIO_IPSEC_INLINE
 default n
 ---help---
   Enable support for IPSec Tx Inline.
+
+config CRYPTO_DEV_CHELSIO_TLS
+tristate "Chelsio Crypto Inline TLS Driver"
+depends on CHELSIO_T4
+select CRYPTO_DEV_CHELSIO
+---help---
+  Support Chelsio Inline TLS with Chelsio crypto accelerator.
+
+  To compile this driver as a module, choose M here: the module
+  will be called chtls.
diff --git a/drivers/crypto/chelsio/Makefile b/drivers/crypto/chelsio/Makefile
index eaecaf1..639e571 100644
--- a/drivers/crypto/chelsio/Makefile
+++ b/drivers/crypto/chelsio/Makefile
@@ -3,3 +3,4 @@ ccflags-y := -Idrivers/net/ethernet/chelsio/cxgb4
 obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chcr.o
 chcr-objs :=  chcr_core.o chcr_algo.o
 chcr-$(CONFIG_CHELSIO_IPSEC_INLINE) += chcr_ipsec.o
+obj-$(CONFIG_CRYPTO_DEV_CHELSIO_TLS) += chtls/
-- 
1.8.3.1



[crypto 6/8] chtls: TCB and Key program

2017-12-05 Thread Atul Gupta
program the tx and rx key on chip.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_hw.c | 394 
 1 file changed, 394 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_hw.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_hw.c 
b/drivers/crypto/chelsio/chtls/chtls_hw.c
new file mode 100644
index 000..5e65aa2
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_hw.c
@@ -0,0 +1,394 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+static void __set_tcb_field_direct(struct chtls_sock *csk,
+  struct cpl_set_tcb_field *req, u16 word,
+  u64 mask, u64 val, u8 cookie, int no_reply)
+{
+   struct ulptx_idata *sc;
+
+   INIT_TP_WR_CPL(req, CPL_SET_TCB_FIELD, csk->tid);
+   req->wr.wr_mid |= htonl(FW_WR_FLOWID_V(csk->tid));
+   req->reply_ctrl = htons(NO_REPLY_V(no_reply) |
+   QUEUENO_V(csk->rss_qid));
+   req->word_cookie = htons(TCB_WORD(word) | TCB_COOKIE_V(cookie));
+   req->mask = cpu_to_be64(mask);
+   req->val = cpu_to_be64(val);
+   sc = (struct ulptx_idata *)(req + 1);
+   sc->cmd_more = htonl(ULPTX_CMD_V(ULP_TX_SC_NOOP));
+   sc->len = htonl(0);
+}
+
+void __set_tcb_field(struct sock *sk, struct sk_buff *skb, u16 word,
+u64 mask, u64 val, u8 cookie, int no_reply)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct cpl_set_tcb_field *req;
+   struct ulptx_idata *sc;
+   unsigned int wrlen = roundup(sizeof(*req) + sizeof(*sc), 16);
+
+   req = (struct cpl_set_tcb_field *)__skb_put(skb, wrlen);
+   __set_tcb_field_direct(csk, req, word, mask, val, cookie, no_reply);
+   set_wr_txq(skb, CPL_PRIORITY_CONTROL, csk->port_id);
+}
+
+static int chtls_set_tcb_field(struct sock *sk, u16 word, u64 mask, u64 val)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct sk_buff *skb;
+   struct cpl_set_tcb_field *req;
+   struct ulptx_idata *sc;
+   unsigned int wrlen = roundup(sizeof(*req) + sizeof(*sc), 16);
+   unsigned int credits_needed = DIV_ROUND_UP(wrlen, 16);
+
+   skb = alloc_skb(wrlen, GFP_ATOMIC);
+   if (!skb)
+   return -ENOMEM;
+
+   __set_tcb_field(sk, skb, word, mask, val, 0, 1);
+   set_queue(skb, (csk->txq_idx << 1) | CPL_PRIORITY_DATA, sk);
+   csk->wr_credits -= credits_needed;
+   csk->wr_unacked += credits_needed;
+   enqueue_wr(csk, skb);
+   cxgb4_ofld_send(csk->egress_dev, skb);
+   return 0;
+}
+
+/*
+ * Set one of the t_flags bits in the TCB.
+ */
+int chtls_set_tcb_tflag(struct sock *sk, unsigned int bit_pos, int val)
+{
+   return chtls_set_tcb_field(sk, 1, 1ULL << bit_pos,
+   val << bit_pos);
+}
+
+static int chtls_set_tcb_keyid(struct sock *sk, int keyid)
+{
+   return chtls_set_tcb_field(sk, 31, 0xULL, keyid);
+}
+
+static int chtls_set_tcb_seqno(struct sock *sk)
+{
+   return chtls_set_tcb_field(sk, 28, ~0ULL, 0);
+}
+
+static int chtls_set_tcb_quiesce(struct sock *sk, int val)
+{
+   return chtls_set_tcb_field(sk, 1, (1ULL << TF_RX_QUIESCE_S),
+  TF_RX_QUIESCE_V(val));
+}
+
+static void *chtls_alloc_mem(unsigned long size)
+{
+   void *p = kmalloc(size, GFP_KERNEL);
+
+   if (!p)
+   p = vmalloc(size);
+   if (p)
+   memset(p, 0, size);
+   return p;
+}
+
+static void chtls_free_mem(void *addr)
+{
+   unsigned long p = (unsigned long)addr;
+
+   if (p >= VMALLOC_START && p < VMALLOC_END)
+   vfree(addr);
+   else
+   kfree(addr);
+}
+
+/* TLS Key bitmap processing */
+int chtls_init_kmap(struct chtls_dev *cdev, struct cxgb4_lld_info *lldi)
+{
+   unsigned int num_key_ctx, bsize;
+
+   num_key_ctx = (lldi->vr->key.size / TLS_KEY_CONTEXT_SZ);
+   bsize = BITS_TO_LONGS(num_key_ctx);
+
+   cdev->kmap.size = num_key_ctx;
+   cdev->kmap.available = bsize;
+   cdev->kmap.addr = chtls_alloc_mem(sizeof(*cdev->kmap.addr) *
+ bsize);
+   if (!cdev->kmap.addr)
+   return -1;
+
+   cdev->kmap.start = lldi->vr->key.start;
+   spin_lock_init(&cdev->kmap.lock);
+   return 0;
+}
+
+void chtls_free_kmap(struct chtls_dev *cdev)
+{
+   if (cdev->kmap.addr)
+   chtls_free_mem(cdev->kmap.addr);
+}
+
+static int get_new_keyid(struct 

[crypto 7/8] chtls: structure and macro definition

2017-12-05 Thread Atul Gupta
Inline TLS state and connection management, plus supporting macro definitions.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/Makefile   |   4 +
 drivers/crypto/chelsio/chtls/chtls.h| 481 
 drivers/crypto/chelsio/chtls/chtls_cm.h | 209 ++
 3 files changed, 694 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/Makefile
 create mode 100644 drivers/crypto/chelsio/chtls/chtls.h
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_cm.h

diff --git a/drivers/crypto/chelsio/chtls/Makefile 
b/drivers/crypto/chelsio/chtls/Makefile
new file mode 100644
index 000..df13795
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/Makefile
@@ -0,0 +1,4 @@
+ccflags-y := -Idrivers/net/ethernet/chelsio/cxgb4 -Idrivers/crypto/chelsio/
+
+obj-$(CONFIG_CRYPTO_DEV_CHELSIO_TLS) += chtls.o
+chtls-objs := chtls_main.o chtls_cm.o chtls_io.o chtls_hw.o
diff --git a/drivers/crypto/chelsio/chtls/chtls.h 
b/drivers/crypto/chelsio/chtls/chtls.h
new file mode 100644
index 000..266fef7c
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls.h
@@ -0,0 +1,481 @@
+/*
+ * Copyright (c) 2016 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __CHTLS_H__
+#define __CHTLS_H__
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "t4fw_api.h"
+#include "t4_msg.h"
+#include "cxgb4.h"
+#include "cxgb4_uld.h"
+#include "l2t.h"
+#include "chcr_algo.h"
+#include "chcr_core.h"
+#include "chcr_crypto.h"
+
+#define CIPHER_BLOCK_SIZE   16
+#define MAX_IVS_PAGE256
+#define TLS_KEY_CONTEXT_SZ 64
+#define TLS_HEADER_LENGTH  5
+#define SCMD_CIPH_MODE_AES_GCM  2
+#define GCM_TAG_SIZE16
+#define AEAD_EXPLICIT_DATA_SIZE 8
+/* Any MFS size should work and come from openssl */
+#define TLS_MFS16384
+
+#define SOCK_INLINE (31)
+#define RSS_HDR sizeof(struct rss_header)
+
+enum {
+   CHTLS_KEY_CONTEXT_DSGL,
+   CHTLS_KEY_CONTEXT_IMM,
+   CHTLS_KEY_CONTEXT_DDR,
+};
+
+enum {
+   CHTLS_LISTEN_START,
+   CHTLS_LISTEN_STOP,
+};
+
+/* Flags for return value of CPL message handlers */
+enum {
+   CPL_RET_BUF_DONE = 1,   /* buffer processing done */
+   CPL_RET_BAD_MSG = 2,/* bad CPL message */
+   CPL_RET_UNKNOWN_TID = 4 /* unexpected unknown TID */
+};
+
+#define TLS_RCV_ST_READ_HEADER  0xF0
+#define TLS_RCV_ST_READ_BODY0xF1
+#define TLS_RCV_ST_READ_DONE0xF2
+#define TLS_RCV_ST_READ_NB  0xF3
+
+#define RSPQ_HASH_BITS 5
+#define LISTEN_INFO_HASH_SIZE 32
+struct listen_info {
+   struct listen_info *next;  /* Link to next entry */
+   struct sock *sk;   /* The listening socket */
+   unsigned int stid; /* The server TID */
+};
+
+enum {
+   T4_LISTEN_START_PENDING,
+   T4_LISTEN_STARTED
+};
+
+enum csk_flags {
+   CSK_CALLBACKS_CHKD, /* socket callbacks have been sanitized */
+   CSK_ABORT_REQ_RCVD, /* received one ABORT_REQ_RSS message */
+   CSK_TX_MORE_DATA,   /* sending ULP data; don't set SHOVE bit */
+   CSK_TX_WAIT_IDLE,   /* suspend Tx until in-flight data is ACKed */
+   CSK_ABORT_SHUTDOWN, /* shouldn't send more abort requests */
+   CSK_ABORT_RPL_PENDING,  /* expecting an abort reply */
+   CSK_CLOSE_CON_REQUESTED,/* we've sent a close_conn_req */
+   CSK_TX_DATA_SENT,   /* sent a TX_DATA WR on this connection */
+   CSK_TX_FAILOVER,/* Tx traffic failing over */
+   CSK_UPDATE_RCV_WND, /* Need to update rcv window */
+   CSK_RST_ABORTED,/* outgoing RST was aborted */
+   CSK_TLS_HANDSHK,/* TLS Handshake */
+};
+
+struct listen_ctx {
+   struct sock *lsk;
+   struct chtls_dev *cdev;
+   u32 state;
+};
+
+struct key_map {
+   unsigned long *addr;
+   unsigned int start;
+   unsigned int available;
+   unsigned int size;
+   spinlock_t lock; /* lock for key id request from map */
+} __packed;
+
+struct tls_scmd {
+   __be32 seqno_numivs;
+   __be32 ivgen_hdrlen;
+};
+
+struct chtls_dev {
+   struct list_head list;
+   struct cxgb4_lld_info *lldi;
+   struct pci_dev *pdev;
+   struct listen_info *listen_hash_tab[LISTEN_INFO_HASH_SIZE];
+   spinlock_t listen_lock; /* lock for listen list */
+   struct net_device **ports;
+   struct tid_info *tids;
+   unsigned int pfvf;
+   const unsigned short *mtus;
+
   spinlock_t aidr_lock ____cacheline_aligned_in_smp;
+   struct idr aidr; /* ATID id space */
+   struct idr hwtid_idr;
+   struct idr stid_idr;
+
+  

[crypto 4/8] chtls: CPL handler definition

2017-12-05 Thread Atul Gupta
CPL handlers for TLS session, record transmit and receive

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_cm.c | 2048 +++
 1 file changed, 2048 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_cm.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_cm.c 
b/drivers/crypto/chelsio/chtls/chtls_cm.c
new file mode 100644
index 000..ea1c301
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_cm.c
@@ -0,0 +1,2048 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+extern struct request_sock_ops chtls_rsk_ops;
+static void (*tcp_time_wait_p)(struct sock *sk, int state, int timeo);
+
+/*
+ * State transitions and actions for close.  Note that if we are in SYN_SENT
+ * we remain in that state as we cannot control a connection while it's in
+ * SYN_SENT; such connections are allowed to establish and are then aborted.
+ */
+static unsigned char new_state[16] = {
+   /* current state: new state:  action: */
+   /* (Invalid)   */ TCP_CLOSE,
+   /* TCP_ESTABLISHED */ TCP_FIN_WAIT1 | TCP_ACTION_FIN,
+   /* TCP_SYN_SENT*/ TCP_SYN_SENT,
+   /* TCP_SYN_RECV*/ TCP_FIN_WAIT1 | TCP_ACTION_FIN,
+   /* TCP_FIN_WAIT1   */ TCP_FIN_WAIT1,
+   /* TCP_FIN_WAIT2   */ TCP_FIN_WAIT2,
+   /* TCP_TIME_WAIT   */ TCP_CLOSE,
+   /* TCP_CLOSE   */ TCP_CLOSE,
+   /* TCP_CLOSE_WAIT  */ TCP_LAST_ACK | TCP_ACTION_FIN,
+   /* TCP_LAST_ACK*/ TCP_LAST_ACK,
+   /* TCP_LISTEN  */ TCP_CLOSE,
+   /* TCP_CLOSING */ TCP_CLOSING,
+};
+
+static struct chtls_sock *chtls_sock_create(struct chtls_dev *cdev)
+{
+   struct chtls_sock *csk = kzalloc(sizeof(*csk), GFP_NOIO);
+
+   if (!csk)
+   return NULL;
+
+   csk->txdata_skb_cache =  alloc_skb(TXDATA_SKB_LEN, GFP_ATOMIC);
+   if (!csk->txdata_skb_cache) {
+   kfree(csk);
+   return NULL;
+   }
+
+   kref_init(&csk->kref);
+   csk->cdev = cdev;
+   skb_queue_head_init(&csk->txq);
+   csk->wr_skb_head = NULL;
+   csk->wr_skb_tail = NULL;
+   csk->mss = MAX_MSS;
+   csk->tlshws.ofld = 1;
+   csk->tlshws.txkey = -1;
+   csk->tlshws.rxkey = -1;
+   csk->tlshws.mfs = TLS_MFS;
+   skb_queue_head_init(&csk->tlshws.sk_recv_queue);
+   return csk;
+}
+
+void chtls_sock_release(struct kref *ref)
+{
+   struct chtls_sock *csk =
+   container_of(ref, struct chtls_sock, kref);
+
+   kfree(csk);
+}
+
+void get_tcp_symbol(void)
+{
+   tcp_time_wait_p = (void *)kallsyms_lookup_name("tcp_time_wait");
+   if (!tcp_time_wait_p)
+   pr_info("could not locate tcp_time_wait");
+}
+
+static struct net_device *chtls_ipv4_netdev(struct chtls_dev *cdev,
+   struct sock *sk)
+{
+   struct net_device *ndev = cdev->ports[0];
+
+   if (likely(!inet_sk(sk)->inet_rcv_saddr))
+   return ndev;
+
+   ndev = ip_dev_find(&init_net, inet_sk(sk)->inet_rcv_saddr);
+   if (!ndev)
+   return NULL;
+
+   if (is_vlan_dev(ndev))
+   return vlan_dev_real_dev(ndev);
+   return ndev;
+}
+
+static void assign_rxopt(struct sock *sk, unsigned int opt)
+{
+   const struct chtls_dev *cdev;
+   struct tcp_sock *tp = tcp_sk(sk);
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+
+   cdev = csk->cdev;
+   tp->tcp_header_len   = sizeof(struct tcphdr);
+   tp->rx_opt.mss_clamp = cdev->mtus[TCPOPT_MSS_G(opt)] - 40;
+   tp->mss_cache= tp->rx_opt.mss_clamp;
+   tp->rx_opt.tstamp_ok = TCPOPT_TSTAMP_G(opt);
+   tp->rx_opt.snd_wscale= TCPOPT_SACK_G(opt);
+   tp->rx_opt.wscale_ok = TCPOPT_WSCALE_OK_G(opt);
+   SND_WSCALE(tp)   = TCPOPT_SND_WSCALE_G(opt);
+   if (!tp->rx_opt.wscale_ok)
+   tp->rx_opt.rcv_wscale = 0;
+   if (tp->rx_opt.tstamp_ok) {
+   tp->tcp_header_len += TCPOLEN_TSTAMP_ALIGNED;
+   tp->rx_opt.mss_clamp -= TCPOLEN_TSTAMP_ALIGNED;
+   } else if (csk->opt2 & TSTAMPS_EN_F) {
+   csk->opt2 &= ~TSTAMPS_EN_F;
+   csk->mtu_idx = TCPOPT_MSS_G(opt);
+   }
+}
+
+static void chtls_purge_rcv_queue(struct sock *sk)
+{
+   struct sk_buff *skb;
+
+   while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
+   skb_dst_set(skb, (void *)NULL);
+   kfree_skb(skb);
+   }

[crypto 5/8] chtls: Inline crypto request for Tx.

2017-12-05 Thread Atul Gupta
TLS handler for record transmit and receive.
Create Inline TLS work request

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_io.c | 1866 +++
 1 file changed, 1866 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_io.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_io.c 
b/drivers/crypto/chelsio/chtls/chtls_io.c
new file mode 100644
index 000..b63fb78
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_io.c
@@ -0,0 +1,1866 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+static bool is_tls_hw(struct chtls_sock *csk)
+{
+   return csk->tlshws.ofld;
+}
+
+static bool is_tls_rx(struct chtls_sock *csk)
+{
+   return (csk->tlshws.rxkey >= 0);
+}
+
+static bool is_tls_tx(struct chtls_sock *csk)
+{
+   return (csk->tlshws.txkey >= 0);
+}
+
+static bool is_tls_skb(struct chtls_sock *csk, const struct sk_buff *skb)
+{
+   return (is_tls_hw(csk) && skb_ulp_tls_skb_flags(skb));
+}
+
+static int key_size(void *sk)
+{
+   return 16; /* Key on DDR */
+}
+
+#define ceil(x, y) \
+   ({ unsigned long __x = (x), __y = (y); (__x + __y - 1) / __y; })
+
+static int data_sgl_len(const struct sk_buff *skb)
+{
+   unsigned int cnt;
+
+   cnt = skb_shinfo(skb)->nr_frags;
+   return (sgl_len(cnt) * 8);
+}
+
+static int nos_ivs(struct sock *sk, unsigned int size)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+
+   return ceil(size, csk->tlshws.mfs);
+}
+
+#define TLS_WR_CPL_LEN \
+   (sizeof(struct fw_tlstx_data_wr) + \
+   sizeof(struct cpl_tx_tls_sfo))
+
+static int is_ivs_imm(struct sock *sk, const struct sk_buff *skb)
+{
+   int ivs_size = nos_ivs(sk, skb->len) * CIPHER_BLOCK_SIZE;
+   int hlen = TLS_WR_CPL_LEN + data_sgl_len(skb);
+
+   if ((hlen + key_size(sk) + ivs_size) <
+   MAX_IMM_OFLD_TX_DATA_WR_LEN) {
+   ULP_SKB_CB(skb)->ulp.tls.iv = 1;
+   return 1;
+   }
+   ULP_SKB_CB(skb)->ulp.tls.iv = 0;
+   return 0;
+}
+
+static int max_ivs_size(struct sock *sk, int size)
+{
+   return (nos_ivs(sk, size) * CIPHER_BLOCK_SIZE);
+}
+
+static int ivs_size(struct sock *sk, const struct sk_buff *skb)
+{
+   return (is_ivs_imm(sk, skb) ? (nos_ivs(sk, skb->len) *
+CIPHER_BLOCK_SIZE) : 0);
+}
+
+static int flowc_wr_credits(int nparams, int *flowclenp)
+{
+   int flowclen16, flowclen;
+
+   flowclen = offsetof(struct fw_flowc_wr, mnemval[nparams]);
+   flowclen16 = DIV_ROUND_UP(flowclen, 16);
+   flowclen = flowclen16 * 16;
+
+   if (flowclenp)
+   *flowclenp = flowclen;
+
+   return flowclen16;
+}
+
+struct sk_buff *create_flowc_wr_skb(struct sock *sk,
+   struct fw_flowc_wr *flowc,
+   int flowclen)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct sk_buff *skb;
+
+   skb = alloc_skb(flowclen, GFP_ATOMIC);
+   if (!skb)
+   return NULL;
+
+   memcpy(__skb_put(skb, flowclen), flowc, flowclen);
+   set_queue(skb, (csk->txq_idx << 1) | CPL_PRIORITY_DATA, sk);
+
+   return skb;
+}
+
+static int send_flowc_wr(struct sock *sk, struct fw_flowc_wr *flowc,
+int flowclen)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct tcp_sock *tp = tcp_sk(sk);
+   bool syn_sent = (sk->sk_state == TCP_SYN_SENT);
+   int flowclen16 = flowclen / 16;
+   struct sk_buff *skb;
+
+   if (csk_flag(sk, CSK_TX_DATA_SENT)) {
+   skb = create_flowc_wr_skb(sk, flowc, flowclen);
+   if (!skb)
+   return -ENOMEM;
+
+   if (syn_sent)
+   __skb_queue_tail(>ooo_queue, skb);
+   else
+   skb_entail(sk, skb,
+  ULPCB_FLAG_NO_HDR | ULPCB_FLAG_NO_APPEND);
+   return 0;
+   }
+
+   if (!syn_sent) {
+   int ret;
+
+   ret = cxgb4_immdata_send(csk->egress_dev,
+csk->txq_idx,
+flowc, flowclen);
+   if (!ret)
+   return flowclen16;
+   }
+   skb = create_flowc_wr_skb(sk, flowc, flowclen);
+   if (!skb)
+   return -ENOMEM;
+   send_or_defer(sk, tp, skb, 0);
+   return flowclen16;
+}
+
+u8 tcp_state_to_flowc_state(u8 state)
+{
+  

[crypto 2/8] chcr: changes to chcr driver

2017-12-05 Thread Atul Gupta
Define the Macro for TLS Key context

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chcr_algo.h | 42 +
 drivers/crypto/chelsio/chcr_core.h | 55 +-
 include/uapi/linux/tls.h   |  1 +
 3 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/chelsio/chcr_algo.h 
b/drivers/crypto/chelsio/chcr_algo.h
index d1673a5..f263cd4 100644
--- a/drivers/crypto/chelsio/chcr_algo.h
+++ b/drivers/crypto/chelsio/chcr_algo.h
@@ -86,6 +86,39 @@
 KEY_CONTEXT_OPAD_PRESENT_M)
 #define KEY_CONTEXT_OPAD_PRESENT_F  KEY_CONTEXT_OPAD_PRESENT_V(1U)
 
+#define TLS_KEYCTX_RXFLIT_CNT_S 24
+#define TLS_KEYCTX_RXFLIT_CNT_V(x) ((x) << TLS_KEYCTX_RXFLIT_CNT_S)
+
+#define TLS_KEYCTX_RXPROT_VER_S 20
+#define TLS_KEYCTX_RXPROT_VER_M 0xf
+#define TLS_KEYCTX_RXPROT_VER_V(x) ((x) << TLS_KEYCTX_RXPROT_VER_S)
+
+#define TLS_KEYCTX_RXCIPH_MODE_S 16
+#define TLS_KEYCTX_RXCIPH_MODE_M 0xf
+#define TLS_KEYCTX_RXCIPH_MODE_V(x) ((x) << TLS_KEYCTX_RXCIPH_MODE_S)
+
+#define TLS_KEYCTX_RXAUTH_MODE_S 12
+#define TLS_KEYCTX_RXAUTH_MODE_M 0xf
+#define TLS_KEYCTX_RXAUTH_MODE_V(x) ((x) << TLS_KEYCTX_RXAUTH_MODE_S)
+
+#define TLS_KEYCTX_RXCIAU_CTRL_S 11
+#define TLS_KEYCTX_RXCIAU_CTRL_V(x) ((x) << TLS_KEYCTX_RXCIAU_CTRL_S)
+
+#define TLS_KEYCTX_RX_SEQCTR_S 9
+#define TLS_KEYCTX_RX_SEQCTR_M 0x3
+#define TLS_KEYCTX_RX_SEQCTR_V(x) ((x) << TLS_KEYCTX_RX_SEQCTR_S)
+
+#define TLS_KEYCTX_RX_VALID_S 8
+#define TLS_KEYCTX_RX_VALID_V(x) ((x) << TLS_KEYCTX_RX_VALID_S)
+
+#define TLS_KEYCTX_RXCK_SIZE_S 3
+#define TLS_KEYCTX_RXCK_SIZE_M 0x7
+#define TLS_KEYCTX_RXCK_SIZE_V(x) ((x) << TLS_KEYCTX_RXCK_SIZE_S)
+
+#define TLS_KEYCTX_RXMK_SIZE_S 0
+#define TLS_KEYCTX_RXMK_SIZE_M 0x7
+#define TLS_KEYCTX_RXMK_SIZE_V(x) ((x) << TLS_KEYCTX_RXMK_SIZE_S)
+
 #define CHCR_HASH_MAX_DIGEST_SIZE 64
 #define CHCR_MAX_SHA_DIGEST_SIZE 64
 
@@ -176,6 +209,15 @@
  KEY_CONTEXT_SALT_PRESENT_V(1) | \
  KEY_CONTEXT_CTX_LEN_V((ctx_len)))
 
+#define  FILL_KEY_CRX_HDR(ck_size, mk_size, d_ck, opad, ctx_len) \
+   htonl(TLS_KEYCTX_RXMK_SIZE_V(mk_size) | \
+ TLS_KEYCTX_RXCK_SIZE_V(ck_size) | \
+ TLS_KEYCTX_RX_VALID_V(1) | \
+ TLS_KEYCTX_RX_SEQCTR_V(3) | \
+ TLS_KEYCTX_RXAUTH_MODE_V(4) | \
+ TLS_KEYCTX_RXCIPH_MODE_V(2) | \
+ TLS_KEYCTX_RXFLIT_CNT_V((ctx_len)))
+
 #define FILL_WR_OP_CCTX_SIZE \
htonl( \
FW_CRYPTO_LOOKASIDE_WR_OPCODE_V( \
diff --git a/drivers/crypto/chelsio/chcr_core.h 
b/drivers/crypto/chelsio/chcr_core.h
index 3c29ee0..77056a9 100644
--- a/drivers/crypto/chelsio/chcr_core.h
+++ b/drivers/crypto/chelsio/chcr_core.h
@@ -65,10 +65,58 @@
 struct _key_ctx {
__be32 ctx_hdr;
u8 salt[MAX_SALT];
-   __be64 reserverd;
+   __be64 iv_to_auth;
unsigned char key[0];
 };
 
+#define KEYCTX_TX_WR_IV_S  55
+#define KEYCTX_TX_WR_IV_M  0x1ffULL
+#define KEYCTX_TX_WR_IV_V(x) ((x) << KEYCTX_TX_WR_IV_S)
+#define KEYCTX_TX_WR_IV_G(x) \
+   (((x) >> KEYCTX_TX_WR_IV_S) & KEYCTX_TX_WR_IV_M)
+
+#define KEYCTX_TX_WR_AAD_S 47
+#define KEYCTX_TX_WR_AAD_M 0xffULL
+#define KEYCTX_TX_WR_AAD_V(x) ((x) << KEYCTX_TX_WR_AAD_S)
+#define KEYCTX_TX_WR_AAD_G(x) (((x) >> KEYCTX_TX_WR_AAD_S) & \
+   KEYCTX_TX_WR_AAD_M)
+
+#define KEYCTX_TX_WR_AADST_S 39
+#define KEYCTX_TX_WR_AADST_M 0xffULL
+#define KEYCTX_TX_WR_AADST_V(x) ((x) << KEYCTX_TX_WR_AADST_S)
+#define KEYCTX_TX_WR_AADST_G(x) \
+   (((x) >> KEYCTX_TX_WR_AADST_S) & KEYCTX_TX_WR_AADST_M)
+
+#define KEYCTX_TX_WR_CIPHER_S 30
+#define KEYCTX_TX_WR_CIPHER_M 0x1ffULL
+#define KEYCTX_TX_WR_CIPHER_V(x) ((x) << KEYCTX_TX_WR_CIPHER_S)
+#define KEYCTX_TX_WR_CIPHER_G(x) \
+   (((x) >> KEYCTX_TX_WR_CIPHER_S) & KEYCTX_TX_WR_CIPHER_M)
+
+#define KEYCTX_TX_WR_CIPHERST_S 23
+#define KEYCTX_TX_WR_CIPHERST_M 0x7f
+#define KEYCTX_TX_WR_CIPHERST_V(x) ((x) << KEYCTX_TX_WR_CIPHERST_S)
+#define KEYCTX_TX_WR_CIPHERST_G(x) \
+   (((x) >> KEYCTX_TX_WR_CIPHERST_S) & KEYCTX_TX_WR_CIPHERST_M)
+
+#define KEYCTX_TX_WR_AUTH_S 14
+#define KEYCTX_TX_WR_AUTH_M 0x1ff
+#define KEYCTX_TX_WR_AUTH_V(x) ((x) << KEYCTX_TX_WR_AUTH_S)
+#define KEYCTX_TX_WR_AUTH_G(x) \
+   (((x) >> KEYCTX_TX_WR_AUTH_S) & KEYCTX_TX_WR_AUTH_M)
+
+#define KEYCTX_TX_WR_AUTHST_S 7
+#define KEYCTX_TX_WR_AUTHST_M 0x7f
+#define KEYCTX_TX_WR_AUTHST_V(x) ((x) << KEYCTX_TX_WR_AUTHST_S)
+#define KEYCTX_TX_WR_AUTHST_G(x) \
+   (((x) >> KEYCTX_TX_WR_AUTHST_S) & KEYCTX_TX_WR_AUTHST_M)
+
+#define KEYCTX_TX_WR_AUTHIN_S 0
+#define KEYCTX_TX_WR_AUTHIN_M 0x7f
+#define KEYCTX_TX_WR_AUTHIN_V(x) ((x) << KEYCTX_TX_WR_AUTHIN_S)
+#define KEYCTX_TX_WR_AUTHIN_G(x) \
+   (((x) >> KEYCTX_TX_WR_AUTHIN_S) & KEYCTX_TX_WR_AUTHIN_M)
+
 struct chcr_wr {
struct fw_crypto_lookaside_wr wreq;
struct ulp_txpkt ulptx;
@@ -90,6 +138,11 
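
The TLS_KEYCTX_* definitions above follow the usual cxgb4 _S/_M/_V
shift-and-mask convention: _S is a field's bit offset within the 32-bit
key-context header word, _M its width mask, and _V(x) places a value at
that offset; FILL_KEY_CRX_HDR ORs the fields together and byte-swaps the
result with htonl(). A minimal stand-alone sketch of that convention,
reusing a few of the RX fields with illustrative values only:

#include <stdint.h>
#include <stdio.h>

/* Field definitions copied from the patch; values used below are
 * illustrative only. */
#define TLS_KEYCTX_RXCIPH_MODE_S 16
#define TLS_KEYCTX_RXCIPH_MODE_M 0xf
#define TLS_KEYCTX_RXCIPH_MODE_V(x) ((x) << TLS_KEYCTX_RXCIPH_MODE_S)

#define TLS_KEYCTX_RXAUTH_MODE_S 12
#define TLS_KEYCTX_RXAUTH_MODE_M 0xf
#define TLS_KEYCTX_RXAUTH_MODE_V(x) ((x) << TLS_KEYCTX_RXAUTH_MODE_S)

#define TLS_KEYCTX_RX_VALID_S 8
#define TLS_KEYCTX_RX_VALID_V(x) ((x) << TLS_KEYCTX_RX_VALID_S)

int main(void)
{
	/* Pack a few RX key-context fields into one 32-bit header word. */
	uint32_t hdr = TLS_KEYCTX_RXCIPH_MODE_V(2) |	/* cipher mode */
		       TLS_KEYCTX_RXAUTH_MODE_V(4) |	/* auth mode */
		       TLS_KEYCTX_RX_VALID_V(1);	/* key valid */

	/* Unpack: shift down and mask with the field's _M value. */
	printf("hdr=0x%08x ciph=%u auth=%u\n", hdr,
	       (hdr >> TLS_KEYCTX_RXCIPH_MODE_S) & TLS_KEYCTX_RXCIPH_MODE_M,
	       (hdr >> TLS_KEYCTX_RXAUTH_MODE_S) & TLS_KEYCTX_RXAUTH_MODE_M);
	return 0;
}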

[crypto 3/8] chtls: ulp for Inline TLS processing

2017-12-05 Thread Atul Gupta
Register chtls as another TCP ULP, based on the similar infrastructure
in tcp_cong. proto_ops are defined to handle CPL messages and to
send/receive crypto requests to/from hardware.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_main.c | 585 ++
 1 file changed, 585 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_main.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_main.c 
b/drivers/crypto/chelsio/chtls/chtls_main.c
new file mode 100644
index 000..e951b4e
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_main.c
@@ -0,0 +1,585 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+#define DRV_NAME "chtls"
+
+/*
+ * chtls device management
+ * maintains a list of the chtls devices
+ */
+static LIST_HEAD(cdev_list);
+static DEFINE_MUTEX(cdev_mutex);
+static DEFINE_MUTEX(cdev_list_lock);
+
+static struct proto chtls_base_prot;
+static struct proto chtls_cpl_prot;
+static DEFINE_MUTEX(notify_mutex);
+static RAW_NOTIFIER_HEAD(listen_notify_list);
+struct request_sock_ops chtls_rsk_ops;
+static uint send_page_order = (14 - PAGE_SHIFT < 0) ? 0 : 14 - PAGE_SHIFT;
+
+int register_listen_notifier(struct notifier_block *nb)
+{
+   int err;
+
+   mutex_lock(&notify_mutex);
+   err = raw_notifier_chain_register(&listen_notify_list, nb);
+   mutex_unlock(&notify_mutex);
+   return err;
+}
+
+int unregister_listen_notifier(struct notifier_block *nb)
+{
+   int err;
+
+   mutex_lock(&notify_mutex);
+   err = raw_notifier_chain_unregister(&listen_notify_list, nb);
+   mutex_unlock(&notify_mutex);
+   return err;
+}
+
+int listen_notify_handler(struct notifier_block *this,
+ unsigned long event, void *data)
+{
+   struct sock *sk = data;
+   struct chtls_dev *cdev;
+   int ret =  NOTIFY_DONE;
+
+   switch (event) {
+   case CHTLS_LISTEN_START:
+   case CHTLS_LISTEN_STOP:
+   mutex_lock(&cdev_list_lock);
+   list_for_each_entry(cdev, &cdev_list, list) {
+   if (event == CHTLS_LISTEN_START)
+   ret = chtls_listen_start(cdev, sk);
+   else
+   chtls_listen_stop(cdev, sk);
+   }
+   mutex_unlock(&cdev_list_lock);
+   break;
+   }
+   return ret;
+}
+
+static struct notifier_block listen_notifier = {
+   .notifier_call = listen_notify_handler
+};
+
+static int listen_backlog_rcv(struct sock *sk, struct sk_buff *skb)
+{
+   if (likely(skb_transport_header(skb) != skb_network_header(skb)))
+   return tcp_v4_do_rcv(sk, skb);
+   BLOG_SKB_CB(skb)->backlog_rcv(sk, skb);
+   return 0;
+}
+
+static int chtls_start_listen(struct sock *sk)
+{
+   int err;
+
+   if (sk->sk_protocol != IPPROTO_TCP)
+   return -EPROTONOSUPPORT;
+
+   if (sk->sk_family == PF_INET &&
+   LOOPBACK(inet_sk(sk)->inet_rcv_saddr))
+   return -EADDRNOTAVAIL;
+
+   sk->sk_backlog_rcv = listen_backlog_rcv;
+   mutex_lock(&notify_mutex);
+   err = raw_notifier_call_chain(&listen_notify_list, 0, sk);
+   mutex_unlock(&notify_mutex);
+   return err;
+}
+
+static int chtls_hash(struct sock *sk)
+{
+   int err;
+
+   err = tcp_prot.hash(sk);
+   if (sk->sk_state == TCP_LISTEN)
+   err |= chtls_start_listen(sk);
+
+   if (err)
+   tcp_prot.unhash(sk);
+   return err;
+}
+
+int chtls_stop_listen(struct sock *sk)
+{
+   if (sk->sk_protocol != IPPROTO_TCP)
+   return -EPROTONOSUPPORT;
+
+   mutex_lock(&notify_mutex);
+   raw_notifier_call_chain(&listen_notify_list, 1, sk);
+   mutex_unlock(&notify_mutex);
+   return 0;
+}
+
+static void chtls_unhash(struct sock *sk)
+{
+   if (sk->sk_state == TCP_LISTEN)
+   chtls_stop_listen(sk);
+   tcp_prot.unhash(sk);
+}
+
+static void chtls_lsk_close(struct sock *sk, long timeout)
+{
+   struct tls_context *ctx = tls_get_ctx(sk);
+   void (*sk_proto_close)(struct sock *sk, long timeout);
+
+   lock_sock(sk);
+   sk_proto_close = ctx->sk_proto_close;
+   kfree(ctx);
+
+   release_sock(sk);
+   sk_proto_close(sk, timeout);
+}
+
+static void process_deferq(struct work_struct *task_param)
+{
+   struct sk_buff *skb;
+   struct chtls_dev *cdev = container_of(task_param,
+   struct chtls_dev, deferq_task);
+
+   spin_lock_bh(&cdev->deferq.lock);
+   while ((skb = __skb_dequeue(&cdev->deferq)) != NULL) {
+   spin_unlock_bh(&cdev->deferq.lock);
+
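
The commit message above describes registering chtls as another TCP ULP,
but the quoted hunk is cut off before that part of chtls_main.c. As a
rough, hedged sketch of what such a registration typically looks like
with the kernel's tcp_register_ulp() interface; the ops contents, the
"chtls" name and the chtls_ulp_init() callback here are illustrative
assumptions, not this patch's exact code:

#include <linux/module.h>
#include <net/tcp.h>

/* Hypothetical init callback: a real implementation would switch
 * sk->sk_prot to the offload proto here. */
static int chtls_ulp_init(struct sock *sk)
{
	return 0;
}

static struct tcp_ulp_ops chtls_tcp_ulp_ops __read_mostly = {
	.name	= "chtls",
	.owner	= THIS_MODULE,
	.init	= chtls_ulp_init,
};

static int __init chtls_sketch_init(void)
{
	return tcp_register_ulp(&chtls_tcp_ulp_ops);
}

static void __exit chtls_sketch_exit(void)
{
	tcp_unregister_ulp(&chtls_tcp_ulp_ops);
}

module_init(chtls_sketch_init);
module_exit(chtls_sketch_exit);
MODULE_LICENSE("GPL");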

[crypto 1/8] cxgb4: Inline TLS

2017-12-05 Thread Atul Gupta
Add a new ULD driver for Inline TLS support. A work request (WR) is
defined to submit crypto requests to the firmware.
The key area size is configured in the hardware configuration file.

Signed-off-by: Atul Gupta 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  18 ++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c|  32 ++--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h |   7 +
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |  98 +++-
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h| 121 ++-
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h   |   2 +
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h  | 165 -
 7 files changed, 425 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index cf47183..cfc9210 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -2826,8 +2826,8 @@ static int meminfo_show(struct seq_file *seq, void *v)
"Tx payload:", "Rx payload:", "LE hash:", "iSCSI region:",
"TDDP region:", "TPT region:", "STAG region:", "RQ region:",
"RQUDP region:", "PBL region:", "TXPBL region:",
-   "DBVFIFO region:", "ULPRX state:", "ULPTX state:",
-   "On-chip queues:"
+   "TLSKey region:", "DBVFIFO region:", "ULPRX state:",
+   "ULPTX state:", "On-chip queues:"
};
 
int i, n;
@@ -2943,6 +2943,12 @@ static int meminfo_show(struct seq_file *seq, void *v)
ulp_region(RX_RQUDP);
ulp_region(RX_PBL);
ulp_region(TX_PBL);
+   if (adap->params.crypto & FW_CAPS_CONFIG_TLS_INLINE) {
+   ulp_region(RX_TLS_KEY);
+   } else {
+   md->base = 0;
+   md->idx = ARRAY_SIZE(region);
+   }
 #undef ulp_region
md->base = 0;
md->idx = ARRAY_SIZE(region);
@@ -3098,6 +3104,14 @@ static int chcr_show(struct seq_file *seq, void *v)
   atomic_read(&adap->chcr_stats.fallback));
seq_printf(seq, "IPSec PDU: %10u\n",
   atomic_read(>chcr_stats.ipsec_cnt));
+
+   seq_puts(seq, "\nChelsio Inline TLS Stats\n");
+   seq_printf(seq, "TLS PDU Tx: %u\n",
+  atomic_read(&adap->chcr_stats.tls_pdu_tx));
+   seq_printf(seq, "TLS PDU Rx: %u\n",
+  atomic_read(&adap->chcr_stats.tls_pdu_rx));
+   seq_printf(seq, "TLS Keys (DDR) Count: %u\n",
+  atomic_read(&adap->chcr_stats.tls_key));
return 0;
 }
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 05a4abf..60eb18b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4086,18 +4086,32 @@ static int adap_init0(struct adapter *adap)
adap->num_ofld_uld += 2;
}
if (caps_cmd.cryptocaps) {
-   /* Should query params here...TODO */
-   params[0] = FW_PARAM_PFVF(NCRYPTO_LOOKASIDE);
-   ret = t4_query_params(adap, adap->mbox, adap->pf, 0, 2,
- params, val);
-   if (ret < 0) {
-   if (ret != -EINVAL)
+   if (ntohs(caps_cmd.cryptocaps) &
+   FW_CAPS_CONFIG_CRYPTO_LOOKASIDE) {
+   params[0] = FW_PARAM_PFVF(NCRYPTO_LOOKASIDE);
+   ret = t4_query_params(adap, adap->mbox, adap->pf, 0,
+ 2, params, val);
+   if (ret < 0) {
+   if (ret != -EINVAL)
+   goto bye;
+   } else {
+   adap->vres.ncrypto_fc = val[0];
+   }
+   adap->num_ofld_uld += 1;
+   }
+   if (ntohs(caps_cmd.cryptocaps) &
+   FW_CAPS_CONFIG_TLS_INLINE) {
+   params[0] = FW_PARAM_PFVF(TLS_START);
+   params[1] = FW_PARAM_PFVF(TLS_END);
+   ret = t4_query_params(adap, adap->mbox, adap->pf, 0,
+ 2, params, val);
+   if (ret < 0)
goto bye;
-   } else {
-   adap->vres.ncrypto_fc = val[0];
+   adap->vres.key.start = val[0];
+   adap->vres.key.size = val[1] - val[0] + 1;
+   adap->num_uld += 1;
}
adap->params.crypto = ntohs(caps_cmd.cryptocaps);
-   adap->num_uld += 1;
}
 #undef FW_PARAM_PFVF
 #undef FW_PARAM_DEV
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index 1d37672..55863f6 100644
--- 

[crypto 0/8] Chelsio inline TLS

2017-12-05 Thread Atul Gupta
RFC series for the Chelsio Inline TLS driver (chtls.ko)

The chtls driver uses the available ULP infrastructure to register
chtls as another ULP. Chtls uses TCP sockets to transmit and receive
TLS records; the TCP proto_ops are extended to offload TLS records.
A minimal user-space usage sketch follows the feature list below.

The T6 adapter provides the following features:
-TLS record offload: add TLS header, encrypt data and transmit
-TLS record receive and decrypt
-TLS key store
-GCM crypto engine
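
The record offload above is driven from ordinary TCP sockets. For
orientation, a hedged user-space sketch of how a ULP is typically
attached to an established TCP socket by name through the TCP_ULP
socket option, the same mechanism used by the in-kernel "tls" ULP; the
ULP name "chtls" and the fallback TCP_ULP define are assumptions for
illustration:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_ULP
#define TCP_ULP 31	/* socket option number from the 4.13+ uapi */
#endif

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	/* ... connect(fd, ...) to the peer first ... */

	/* Select the upper-layer protocol by name on the TCP socket. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "chtls", sizeof("chtls")))
		perror("setsockopt(TCP_ULP)");

	/* Key material would then be passed down, e.g. via the TLS_TX
	 * setsockopt used by the in-kernel tls ULP. */
	close(fd);
	return 0;
}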

Atul Gupta (8):
  cxgb4: Inline TLS
  chcr: changes to chcr driver
  chtls: ulp for Inline TLS processing
  chtls: CPL handler definition
  chtls: Inline crypto request for Tx.
  chtls: TCB and Key program
  chtls: structure and macro definition
  Kconfig Makefile

 drivers/crypto/chelsio/Kconfig |   10 +
 drivers/crypto/chelsio/Makefile|1 +
 drivers/crypto/chelsio/chcr_algo.h |   42 +
 drivers/crypto/chelsio/chcr_core.h |   55 +-
 drivers/crypto/chelsio/chtls/Makefile  |4 +
 drivers/crypto/chelsio/chtls/chtls.h   |  481 +
 drivers/crypto/chelsio/chtls/chtls_cm.c| 2048 
 drivers/crypto/chelsio/chtls/chtls_cm.h|  209 ++
 drivers/crypto/chelsio/chtls/chtls_hw.c|  394 
 drivers/crypto/chelsio/chtls/chtls_io.c| 1866 ++
 drivers/crypto/chelsio/chtls/chtls_main.c  |  585 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |   18 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c|   32 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h |7 +
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |   98 +-
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h|  121 +-
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h   |2 +
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h  |  165 +-
 include/uapi/linux/tls.h   |1 +
 19 files changed, 6120 insertions(+), 19 deletions(-)
 create mode 100644 drivers/crypto/chelsio/chtls/Makefile
 create mode 100644 drivers/crypto/chelsio/chtls/chtls.h
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_cm.c
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_cm.h
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_hw.c
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_io.c
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_main.c

-- 
1.8.3.1



[crypto] chcr: fix a type cast error

2017-12-05 Thread Atul Gupta
Fix a type cast error in the queue descriptor pointer arithmetic.

Reported-by: Dan Carpenter 
Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chcr_ipsec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/chelsio/chcr_ipsec.c 
b/drivers/crypto/chelsio/chcr_ipsec.c
index f90f991..a413156 100644
--- a/drivers/crypto/chelsio/chcr_ipsec.c
+++ b/drivers/crypto/chelsio/chcr_ipsec.c
@@ -428,7 +428,7 @@ inline void *copy_key_cpltx_pktxt(struct sk_buff *skb,
memcpy(pos, sa_entry->key, left);
memcpy(q->q.desc, sa_entry->key + left,
   key_len - left);
-   pos = q->q.desc + (key_len - left);
+   pos = (u8 *)q->q.desc + (key_len - left);
}
}
/* Copy CPL TX PKT XT */
-- 
1.8.3.1
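
For context on why the cast is needed: q->q.desc in the cxgb4 SGE
points at fixed-size hardware descriptors, so adding (key_len - left)
to the un-cast pointer advances by whole descriptors instead of bytes;
casting to u8 * makes it a byte offset. A small stand-alone sketch (the
64-byte descriptor size is illustrative only):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct tx_desc {		/* stand-in for the queue descriptor */
	uint8_t flit[64];
};

int main(void)
{
	struct tx_desc ring[8];
	struct tx_desc *desc = ring;
	ptrdiff_t off = 3;	/* byte offset we want to apply */

	/* Descriptor arithmetic: advances off * sizeof(struct tx_desc). */
	uint8_t *wrong = (uint8_t *)(desc + off);
	/* Byte arithmetic, as in the fix: advances exactly off bytes. */
	uint8_t *right = (uint8_t *)desc + off;

	printf("wrong: +%td bytes, right: +%td bytes\n",
	       wrong - (uint8_t *)ring, right - (uint8_t *)ring);
	return 0;
}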



[PATCH] crypto: chcr: select CRYPTO_GF128MUL

2017-12-05 Thread Arnd Bergmann
Without the gf128mul library support, we can run into a link
error:

drivers/crypto/chelsio/chcr_algo.o: In function `chcr_update_tweak':
chcr_algo.c:(.text+0x7e0): undefined reference to `gf128mul_x8_ble'

This adds a Kconfig select statement for it, next to the ones we
already have.

Fixes: b8fd1f4170e7 ("crypto: chcr - Add ctr mode and process large sg entries 
for cipher")
Signed-off-by: Arnd Bergmann 
---
 drivers/crypto/chelsio/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/crypto/chelsio/Kconfig b/drivers/crypto/chelsio/Kconfig
index 3e104f5aa0c2..b56b3f711d94 100644
--- a/drivers/crypto/chelsio/Kconfig
+++ b/drivers/crypto/chelsio/Kconfig
@@ -5,6 +5,7 @@ config CRYPTO_DEV_CHELSIO
select CRYPTO_SHA256
select CRYPTO_SHA512
select CRYPTO_AUTHENC
+   select CRYPTO_GF128MUL
---help---
  The Chelsio Crypto Co-processor driver for T6 adapters.
 
-- 
2.9.0



Re: [PATCH v3 1/3] dt-bindings: hwrng: Add Samsung Exynos 5250+ True RNG bindings

2017-12-05 Thread Krzysztof Kozlowski
On Tue, Dec 5, 2017 at 10:30 AM, Łukasz Stelmach  wrote:
> It was <2017-12-04 Mon 14:13>, when Krzysztof Kozlowski wrote:
>> On Mon, Dec 4, 2017 at 1:53 PM, Łukasz Stelmach  
>> wrote:
>>> Add binding documentation for the True Random Number Generator
>>> found on Samsung Exynos 5250+ SoCs.
>>>
>>> Signed-off-by: Łukasz Stelmach 
>>> ---
>>>  .../devicetree/bindings/rng/samsung,exynos5250-trng.txt | 17 
>>> +
>>>  1 file changed, 17 insertions(+)
>>>  create mode 100644 
>>> Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
>>>
>>> diff --git
>>> a/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
>>> b/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
>>> new file mode 100644
>>> index ..5a613a4ec780
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
>>> @@ -0,0 +1,17 @@
>>> +Exynos True Random Number Generator
>>> +
>>> +Required properties:
>>> +
>>> +- compatible  : Should be "samsung,exynos5250-trng".
>>> +- reg : Specifies base physical address and size of the registers 
>>> map.
>>> +- clocks  : Phandle to clock-controller plus clock-specifier pair.
>>> +- clock-names : "secss" as a clock name.
>>> +
>>> +Example:
>>> +
>>> +   rng@10830600 {
>>> +   compatible = "samsung,exynos5250-trng";
>>> +   reg = <0x10830600 0x100>;
>>> +   clocks = <&clock CLK_SSS>;
>>> +   clock-names = "secss";
>>> +   };
>>> --
>>> 2.11.0
>>
>> Mine and Rob's tags disappeared and I think you did not introduce any
>> major changes here, right?
>
> A very experienced kernel developer advised me to remove them.

In that case:
Reviewed-by: Krzysztof Kozlowski 

BR,
Krzysztof