Re: [PATCH V4 2/2] ARM: dts: imx7s: add snvs rtc clock

2018-02-01 Thread Shawn Guo
On Tue, Jan 09, 2018 at 05:52:06PM +0800, Anson Huang wrote:
> Add i.MX7 SNVS RTC clock.
> 
> Signed-off-by: Anson Huang 

Looks fine to me.  Ping me when clk driver part lands mainline.

Shawn


Re: [PATCH 1/3] compiler-gcc.h: Introduce __optimize function attribute

2018-02-01 Thread Ard Biesheuvel
On 1 February 2018 at 10:21, Geert Uytterhoeven  wrote:
> Create a new function attribute __optimize, which allows to specify an
> optimization level on a per-function basis.
>
> Signed-off-by: Geert Uytterhoeven 

Acked-by: Ard Biesheuvel 

> ---
> I assume this is supported as of gcc-4.4:
>   - gcc version 4.3.3 (GCC): warning: ‘__optimize__’ attribute directive
> ignored
>   - gcc version 4.4.7 (Ubuntu/Linaro 4.4.7-1ubuntu2): OK
> ---
>  include/linux/compiler-gcc.h | 4 
>  include/linux/compiler.h | 4 
>  2 files changed, 8 insertions(+)
>
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index 631354acfa720475..0a278a527944ad2f 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -196,6 +196,10 @@
>  #endif /* __CHECKER__ */
>  #endif /* GCC_VERSION >= 40300 */
>
> +#if GCC_VERSION >= 40400
> +#define __optimize(level)  __attribute__((__optimize__(level)))
> +#endif /* GCC_VERSION >= 40400 */
> +
>  #if GCC_VERSION >= 40500
>
>  #ifndef __CHECKER__
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 52e611ab9a6cf6fd..5ff818e9a836e898 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -271,6 +271,10 @@ static __always_inline void __write_once_size(volatile 
> void *p, void *res, int s
>
>  #endif /* __ASSEMBLY__ */
>
> +#ifndef __optimize
> +# define __optimize(level)
> +#endif
> +
>  /* Compile time object size, -1 for unknown */
>  #ifndef __compiletime_object_size
>  # define __compiletime_object_size(obj) -1
> --
> 2.7.4
>


Re: [PATCH 3/3] crypto: sha3-generic - Use __optimize to support old compilers

2018-02-01 Thread Ard Biesheuvel
On 1 February 2018 at 10:22, Geert Uytterhoeven  wrote:
> With gcc-4.1.2:
>
> crypto/sha3_generic.c:39: warning: ‘__optimize__’ attribute directive 
> ignored
>
> Use the newly introduced __optimize macro to fix this.
>
> Fixes: 83dee2ce1ae791c3 ("crypto: sha3-generic - rewrite KECCAK transform to 
> help the compiler optimize")
> Signed-off-by: Geert Uytterhoeven 

Acked-by: Ard Biesheuvel 

> ---
>  crypto/sha3_generic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/crypto/sha3_generic.c b/crypto/sha3_generic.c
> index a965b9d8055983af..c409cd87fea5decd 100644
> --- a/crypto/sha3_generic.c
> +++ b/crypto/sha3_generic.c
> @@ -35,7 +35,7 @@ static const u64 keccakf_rndc[24] = {
>
>  /* update the state with given number of rounds */
>
> -static void __attribute__((__optimize__("O3"))) keccakf(u64 st[25])
> +static void __optimize("O3") keccakf(u64 st[25])
>  {
> u64 t[5], tt, bc[5];
> int round;
> --
> 2.7.4
>


Re: [PATCH 2/3] compiler-gcc.h: __nostackprotector needs gcc-4.4 and up

2018-02-01 Thread Arnd Bergmann
On Thu, Feb 1, 2018 at 12:55 PM, Geert Uytterhoeven
 wrote:
> Hi Arnd,
>
> On Thu, Feb 1, 2018 at 12:46 PM, Arnd Bergmann  wrote:
>> On Thu, Feb 1, 2018 at 11:21 AM, Geert Uytterhoeven
>>  wrote:
>>> Gcc versions before 4.4 do not recognize the __optimize__ compiler
>>> attribute:

>
> And we might want to remove the dummy in include/linux/compiler_types.h:
>
> #ifndef __nostackprotector
> # define __nostackprotector
> #endif
>
> BTW, how does this work with non-gcc compilers?

clang should in theory accept the same flags, and it pretends to be an older
version of gcc.

I don't think anyone cares about ICC or any other compiler.

Arnd


Re: [PATCH v3 4/5] clk: imx7d: add CAAM clock

2018-02-01 Thread Fabio Estevam
On Wed, Jan 31, 2018 at 12:00 AM, Bryan O'Donoghue
 wrote:
> From: Rui Miguel Silva 
>
> Add CAAM clock so that we could use the Cryptographic Acceleration and
> Assurance Module (CAAM) hardware block.
>
> Signed-off-by: Rui Miguel Silva 
> Cc: Michael Turquette 
> Cc: Stephen Boyd 
> Cc: linux-...@vger.kernel.org
> Cc: "Horia Geantă" 
> Cc: Aymen Sghaier 
> Cc: Fabio Estevam 
> Cc: Peng Fan 
> Cc: "David S. Miller" 
> Cc: Lukas Auer 
> Signed-off-by: Bryan O'Donoghue 

Reviewed-by: Fabio Estevam 


Re: [PATCH v3 2/5] crypto: caam: Fix endless loop when RNG is already initialized

2018-02-01 Thread Horia Geantă
On 1/31/2018 4:00 AM, Bryan O'Donoghue wrote:
> commit 1005bccd7a4a ("crypto: caam - enable instantiation of all RNG4 state
> handles") introduces a control when incrementing ent_delay which contains
> the following comment above it:
> 
> /*
>  * If either SH were instantiated by somebody else
>  * (e.g. u-boot) then it is assumed that the entropy
>  * parameters are properly set and thus the function
>  * setting these (kick_trng(...)) is skipped.
>  * Also, if a handle was instantiated, do not change
>  * the TRNG parameters.
>  */
> 
> This is a problem observed when sec_init() has been run in u-boot and
> and TrustZone is enabled. We can fix this by instantiating all rng state
> handles in u-boot but, on the Kernel side we should ensure that this
> non-terminating path is dealt with.
> 
> Fixes: 1005bccd7a4a ("crypto: caam - enable instantiation of all RNG4 state
> handles")
> 
> Reported-by: Ryan Harkin 
> Cc: "Horia Geantă" 
> Cc: Aymen Sghaier 
> Cc: Fabio Estevam 
> Cc: Peng Fan 
> Cc: "David S. Miller" 
> Cc: Lukas Auer 
> Cc:  # 4.12+
> Signed-off-by: Bryan O'Donoghue 
> ---
>  drivers/crypto/caam/ctrl.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c
> index 98986d3..0a1e96b 100644
> --- a/drivers/crypto/caam/ctrl.c
> +++ b/drivers/crypto/caam/ctrl.c
> @@ -704,7 +704,10 @@ static int caam_probe(struct platform_device *pdev)
>ent_delay);
>   kick_trng(pdev, ent_delay);
>   ent_delay += 400;
> + } else if (ctrlpriv->rng4_sh_init && inst_handles) {
> + ent_delay += 400;
>   }
If both RNG state handles are initialized before kernel runs, then
instantiate_rng() should be a no-op and return 0, which is enough to exit the
loop: while ((ret == -EAGAIN) && (ent_delay < RTSDCTL_ENT_DLY_MAX))

If the loop cannot exit based on value of "ret" != -EAGAIN, then it means
caam_probe() will eventually fail due to ret == -EAGAIN:
if (ret) {
dev_err(dev, "failed to instantiate RNG");
goto caam_remove;
}

Please provide more details, so that the root cause is found and fixed.
For e.g. what is the value of RDSTA (RNG DRNG Status register @0x6C0):
-before & after u-boot initializes RNG
-as seen by kernel during caam_probe()
Also provide related error messages displayed during boot.

Thanks,
Horia


Re: [PATCH 2/3] compiler-gcc.h: __nostackprotector needs gcc-4.4 and up

2018-02-01 Thread Geert Uytterhoeven
Hi Arnd,

On Thu, Feb 1, 2018 at 12:46 PM, Arnd Bergmann  wrote:
> On Thu, Feb 1, 2018 at 11:21 AM, Geert Uytterhoeven
>  wrote:
>> Gcc versions before 4.4 do not recognize the __optimize__ compiler
>> attribute:
>>
>> warning: ‘__optimize__’ attribute directive ignored
>>
>> Fixes: 7375ae3a0b79ea07 ("compiler-gcc.h: Introduce __nostackprotector 
>> function attribute")
>> Signed-off-by: Geert Uytterhoeven 
>> ---
>> Can anyone please verify this?
>> Apparently __nostackprotector is used on x86 only, which is usually
>> served by modern compilers.
>
> I've checked that __optimize("no-stack-protector") is accepted by exactly 
> those
> compilers that your 40400 version check tests for, across all architectures, 
> so
> that's fine.

Thanks!

> However,  looking at commit 91cfc88c66bf ("x86: Use __nostackprotect for
> sme_encrypt_kernel"), I suspect that gcc-4.1 through 4.3 will now cause
> a runtime failure in sme_encrypt_kernel() without a compile-time warning.

So having this functionality is a hard requirement. Oops...

> I would leave __nostackprotector unchanged here, so we at least get
> a warning for functions that need to disable the stack protector to work
> correctly.

Agreed.

> We might want to add an #ifdef CONFIG_CC_STACKPROTECTOR around
> the __nostackprotector definition, so that we only warn if stackprotector
> is globally enabled.

And we might want to remove the dummy in include/linux/compiler_types.h:

#ifndef __nostackprotector
# define __nostackprotector
#endif

BTW, how does this work with non-gcc compilers?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH 2/3] compiler-gcc.h: __nostackprotector needs gcc-4.4 and up

2018-02-01 Thread Arnd Bergmann
On Thu, Feb 1, 2018 at 11:21 AM, Geert Uytterhoeven
 wrote:
> Gcc versions before 4.4 do not recognize the __optimize__ compiler
> attribute:
>
> warning: ‘__optimize__’ attribute directive ignored
>
> Fixes: 7375ae3a0b79ea07 ("compiler-gcc.h: Introduce __nostackprotector 
> function attribute")
> Signed-off-by: Geert Uytterhoeven 
> ---
> Can anyone please verify this?
> Apparently __nostackprotector is used on x86 only, which is usually
> served by modern compilers.

I've checked that __optimize("no-stack-protector") is accepted by exactly those
compilers that your 40400 version check tests for, across all architectures, so
that's fine.

However,  looking at commit 91cfc88c66bf ("x86: Use __nostackprotect for
sme_encrypt_kernel"), I suspect that gcc-4.1 through 4.3 will now cause
a runtime failure in sme_encrypt_kernel() without a compile-time warning.

I would leave __nostackprotector unchanged here, so we at least get
a warning for functions that need to disable the stack protector to work
correctly.

We might want to add an #ifdef CONFIG_CC_STACKPROTECTOR around
the __nostackprotector definition, so that we only warn if stackprotector
is globally enabled.

   Arnd


Re: [PATCH v3 5/5] ARM: dts: imx7s: add CAAM device node

2018-02-01 Thread Horia Geantă
On 1/31/2018 4:00 AM, Bryan O'Donoghue wrote:
> From: Rui Miguel Silva 
> 
> Add CAAM device node to the i.MX7s device tree.
> 
> Signed-off-by: Rui Miguel Silva 
> Cc: Shawn Guo 
> Cc: Sascha Hauer 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: "Horia Geantă" 
> Cc: Aymen Sghaier 
> Cc: Fabio Estevam 
> Cc: Peng Fan 
> Cc: "David S. Miller" 
> Cc: Lukas Auer 
> Signed-off-by: Bryan O'Donoghue 
> ---
>  arch/arm/boot/dts/imx7s.dtsi | 31 +++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/imx7s.dtsi b/arch/arm/boot/dts/imx7s.dtsi
> index 82ad26e..e38c159 100644
> --- a/arch/arm/boot/dts/imx7s.dtsi
> +++ b/arch/arm/boot/dts/imx7s.dtsi
> @@ -805,6 +805,37 @@
>   status = "disabled";
>   };
>  
> + crypto: caam@3090 {
> + compatible = "fsl,sec-v4.0";
> + fsl,sec-era = <4>;
CCBVID[CAAM_ERA] = 8.
Either remove this (optional) property and let the bootloader add it
dynamically, or provide a correct value for it.

Thanks,
Horia



Re: [PATCH] crypto: AF_ALG AIO - lock context IV

2018-02-01 Thread Harsh Jain


On 01-02-2018 15:55, Jonathan Cameron wrote:
> On Thu, 1 Feb 2018 12:07:21 +0200
> Gilad Ben-Yossef  wrote:
>
>> On Thu, Feb 1, 2018 at 12:04 PM, Stephan Mueller  wrote:
>>> Am Donnerstag, 1. Februar 2018, 10:35:07 CET schrieb Gilad Ben-Yossef:
>>>
>>> Hi Gilad,
>>>  
> Which works well for the sort of optimization I did and for hardware that
> can do iv dependency tracking itself. If hardware dependency tracking was
> avilable, you would be able to queue up requests with a chained IV
> without taking any special precautions at all. The hardware would
> handle the IV update dependencies.
>
> So in conclusion, Stephan's approach should work and give us a nice
> small patch suitable for stable inclusion.
>
> However, if people know that their setup overhead can be put in parallel
> with previous requests (even when the IV is not yet updated) then they
> will
> probably want to do something inside their own driver and set the flag
> that Stephan is proposing adding to bypass the mutex approach.  
 The patches from Stephan looks good to me, but I think we can do better
 for the long term approach you are discussing.  
>>> What you made me think of is the following: shouldn't we relay the inline IV
>>> flag on to the crypto drivers?
>>>
>>> The reason is to allow a clean implementation of the enabling or disabling 
>>> of
>>> the dependency handling in the driver. Jonathan's driver, for example, 
>>> decides
>>> based on the pointer value of the IV buffer whether it is the same buffer 
>>> and
>>> thus dependency handling is to be applied. This is fragile.
> I agree it's inelegant and a flag would be better than pointer tricks (though
> they are safe currently - we never know what might change in future)
> It was really a minimal example rather than a suggestion of the ultimate
> solution ;)  I was planning on suggesting a flag myself once the basic
> discussion concluded the approach was worthwhile.
>
>>> As AF_ALG knows whether the inline IV with separate IVs per request or the
>>> serialization with one IV buffer for all requests is requested, it should
>>> relay this state on to the drivers. Thus, for example, Jonathan's driver can
>>> be changed to rely on this flag instead on the buffer pointer value to 
>>> decide
>>> whether to enable its dependency handling.  
>> Yes, that is exactly what I was trying to point out :-)
> Agreed - make things explicit rather than basically relying on knowing how
> the above layers are working.
IPSec layer may not get benefit from this because they send complete sg list in 
single request only. They don't need partial mode support.
>
> Thanks,
>
> Jonathan
>



Re: [PATCH] crypto: AF_ALG AIO - lock context IV

2018-02-01 Thread Jonathan Cameron
On Thu, 1 Feb 2018 12:07:21 +0200
Gilad Ben-Yossef  wrote:

> On Thu, Feb 1, 2018 at 12:04 PM, Stephan Mueller  wrote:
> > Am Donnerstag, 1. Februar 2018, 10:35:07 CET schrieb Gilad Ben-Yossef:
> >
> > Hi Gilad,
> >  
> >> >
> >> > Which works well for the sort of optimization I did and for hardware that
> >> > can do iv dependency tracking itself. If hardware dependency tracking was
> >> > avilable, you would be able to queue up requests with a chained IV
> >> > without taking any special precautions at all. The hardware would
> >> > handle the IV update dependencies.
> >> >
> >> > So in conclusion, Stephan's approach should work and give us a nice
> >> > small patch suitable for stable inclusion.
> >> >
> >> > However, if people know that their setup overhead can be put in parallel
> >> > with previous requests (even when the IV is not yet updated) then they
> >> > will
> >> > probably want to do something inside their own driver and set the flag
> >> > that Stephan is proposing adding to bypass the mutex approach.  
> >>
> >> The patches from Stephan looks good to me, but I think we can do better
> >> for the long term approach you are discussing.  
> >
> > What you made me think of is the following: shouldn't we relay the inline IV
> > flag on to the crypto drivers?
> >
> > The reason is to allow a clean implementation of the enabling or disabling 
> > of
> > the dependency handling in the driver. Jonathan's driver, for example, 
> > decides
> > based on the pointer value of the IV buffer whether it is the same buffer 
> > and
> > thus dependency handling is to be applied. This is fragile.

I agree it's inelegant and a flag would be better than pointer tricks (though
they are safe currently - we never know what might change in future)
It was really a minimal example rather than a suggestion of the ultimate
solution ;)  I was planning on suggesting a flag myself once the basic
discussion concluded the approach was worthwhile.

> >
> > As AF_ALG knows whether the inline IV with separate IVs per request or the
> > serialization with one IV buffer for all requests is requested, it should
> > relay this state on to the drivers. Thus, for example, Jonathan's driver can
> > be changed to rely on this flag instead on the buffer pointer value to 
> > decide
> > whether to enable its dependency handling.  
> 
> Yes, that is exactly what I was trying to point out :-)

Agreed - make things explicit rather than basically relying on knowing how
the above layers are working.

Thanks,

Jonathan



[PATCH 1/3] compiler-gcc.h: Introduce __optimize function attribute

2018-02-01 Thread Geert Uytterhoeven
Create a new function attribute __optimize, which allows to specify an
optimization level on a per-function basis.

Signed-off-by: Geert Uytterhoeven 
---
I assume this is supported as of gcc-4.4:
  - gcc version 4.3.3 (GCC): warning: ‘__optimize__’ attribute directive
ignored
  - gcc version 4.4.7 (Ubuntu/Linaro 4.4.7-1ubuntu2): OK
---
 include/linux/compiler-gcc.h | 4 
 include/linux/compiler.h | 4 
 2 files changed, 8 insertions(+)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 631354acfa720475..0a278a527944ad2f 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -196,6 +196,10 @@
 #endif /* __CHECKER__ */
 #endif /* GCC_VERSION >= 40300 */
 
+#if GCC_VERSION >= 40400
+#define __optimize(level)  __attribute__((__optimize__(level)))
+#endif /* GCC_VERSION >= 40400 */
+
 #if GCC_VERSION >= 40500
 
 #ifndef __CHECKER__
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 52e611ab9a6cf6fd..5ff818e9a836e898 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -271,6 +271,10 @@ static __always_inline void __write_once_size(volatile 
void *p, void *res, int s
 
 #endif /* __ASSEMBLY__ */
 
+#ifndef __optimize
+# define __optimize(level)
+#endif
+
 /* Compile time object size, -1 for unknown */
 #ifndef __compiletime_object_size
 # define __compiletime_object_size(obj) -1
-- 
2.7.4



[PATCH 2/3] compiler-gcc.h: __nostackprotector needs gcc-4.4 and up

2018-02-01 Thread Geert Uytterhoeven
Gcc versions before 4.4 do not recognize the __optimize__ compiler
attribute:

warning: ‘__optimize__’ attribute directive ignored

Fixes: 7375ae3a0b79ea07 ("compiler-gcc.h: Introduce __nostackprotector function 
attribute")
Signed-off-by: Geert Uytterhoeven 
---
Can anyone please verify this?
Apparently __nostackprotector is used on x86 only, which is usually
served by modern compilers.
---
 include/linux/compiler-gcc.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 0a278a527944ad2f..73bc63e0a1c4b664 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -167,8 +167,6 @@
 
 #if GCC_VERSION >= 40100
 # define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
-
-#define __nostackprotector 
__attribute__((__optimize__("no-stack-protector")))
 #endif
 
 #if GCC_VERSION >= 40300
@@ -198,6 +196,7 @@
 
 #if GCC_VERSION >= 40400
 #define __optimize(level)  __attribute__((__optimize__(level)))
+#define __nostackprotector __optimize("no-stack-protector")
 #endif /* GCC_VERSION >= 40400 */
 
 #if GCC_VERSION >= 40500
-- 
2.7.4



[PATCH 3/3] crypto: sha3-generic - Use __optimize to support old compilers

2018-02-01 Thread Geert Uytterhoeven
With gcc-4.1.2:

crypto/sha3_generic.c:39: warning: ‘__optimize__’ attribute directive 
ignored

Use the newly introduced __optimize macro to fix this.

Fixes: 83dee2ce1ae791c3 ("crypto: sha3-generic - rewrite KECCAK transform to 
help the compiler optimize")
Signed-off-by: Geert Uytterhoeven 
---
 crypto/sha3_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/sha3_generic.c b/crypto/sha3_generic.c
index a965b9d8055983af..c409cd87fea5decd 100644
--- a/crypto/sha3_generic.c
+++ b/crypto/sha3_generic.c
@@ -35,7 +35,7 @@ static const u64 keccakf_rndc[24] = {
 
 /* update the state with given number of rounds */
 
-static void __attribute__((__optimize__("O3"))) keccakf(u64 st[25])
+static void __optimize("O3") keccakf(u64 st[25])
 {
u64 t[5], tt, bc[5];
int round;
-- 
2.7.4



Re: [PATCH] crypto: AF_ALG AIO - lock context IV

2018-02-01 Thread Stephan Mueller
Am Donnerstag, 1. Februar 2018, 11:06:30 CET schrieb Gilad Ben-Yossef:

Hi Gilad,

> 2. Pointing out that the problem solved (and rightfully so) by mutex in
> AF_ALG AIO implementation must exists elsewhere as well - for example
> IPsec, and is probably solved there too, so if we add the flag as
> suggested, it can be used there as well to gain similar benefits to what
> Jonathan is suggesting.

You are quite right. this patch could speed up things for IPSec and the disk 
encryption logic too. So I think we should CC also the IPSec folks and the 
disk crypto folks to review the patch.

Ciao
Stephan




Re: [PATCH] crypto: AF_ALG AIO - lock context IV

2018-02-01 Thread Gilad Ben-Yossef
On Thu, Feb 1, 2018 at 12:04 PM, Stephan Mueller  wrote:
> Am Donnerstag, 1. Februar 2018, 10:35:07 CET schrieb Gilad Ben-Yossef:
>
> Hi Gilad,
>
>> >
>> > Which works well for the sort of optimization I did and for hardware that
>> > can do iv dependency tracking itself. If hardware dependency tracking was
>> > avilable, you would be able to queue up requests with a chained IV
>> > without taking any special precautions at all. The hardware would
>> > handle the IV update dependencies.
>> >
>> > So in conclusion, Stephan's approach should work and give us a nice
>> > small patch suitable for stable inclusion.
>> >
>> > However, if people know that their setup overhead can be put in parallel
>> > with previous requests (even when the IV is not yet updated) then they
>> > will
>> > probably want to do something inside their own driver and set the flag
>> > that Stephan is proposing adding to bypass the mutex approach.
>>
>> The patches from Stephan looks good to me, but I think we can do better
>> for the long term approach you are discussing.
>
> What you made me think of is the following: shouldn't we relay the inline IV
> flag on to the crypto drivers?
>
> The reason is to allow a clean implementation of the enabling or disabling of
> the dependency handling in the driver. Jonathan's driver, for example, decides
> based on the pointer value of the IV buffer whether it is the same buffer and
> thus dependency handling is to be applied. This is fragile.
>
> As AF_ALG knows whether the inline IV with separate IVs per request or the
> serialization with one IV buffer for all requests is requested, it should
> relay this state on to the drivers. Thus, for example, Jonathan's driver can
> be changed to rely on this flag instead on the buffer pointer value to decide
> whether to enable its dependency handling.

Yes, that is exactly what I was trying to point out :-)

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru


Re: [PATCH] crypto: AF_ALG AIO - lock context IV

2018-02-01 Thread Gilad Ben-Yossef
On Thu, Feb 1, 2018 at 11:46 AM, Stephan Mueller  wrote:
> Am Donnerstag, 1. Februar 2018, 10:35:07 CET schrieb Gilad Ben-Yossef:
>
> Hi Gilad,
>
>> Hi,
>>

>> >
>> > Quoting a private email from Stephan (at Stephan's suggestion)
>> >
>> >> What I however could fathom is that we introduce a flag where a driver
>> >> implements its own queuing mechanism. In this case, the mutex handling
>> >> would be disregarded.
>> >
>> > Which works well for the sort of optimization I did and for hardware that
>> > can do iv dependency tracking itself. If hardware dependency tracking was
>> > avilable, you would be able to queue up requests with a chained IV
>> > without taking any special precautions at all. The hardware would
>> > handle the IV update dependencies.
>> >
>> > So in conclusion, Stephan's approach should work and give us a nice
>> > small patch suitable for stable inclusion.
>> >
>> > However, if people know that their setup overhead can be put in parallel
>> > with previous requests (even when the IV is not yet updated) then they
>> > will
>> > probably want to do something inside their own driver and set the flag
>> > that Stephan is proposing adding to bypass the mutex approach.
>>
>> The patches from Stephan looks good to me, but I think we can do better
>> for the long term approach you are discussing.
>>
>> Consider that the issue we are dealing with is in no way unique to the
>> AF_ALG use case, it just happens to expose it.
>>
>> The grander issue is, I believe, that we are missing an API for chained
>> crypto operations. One that allows submitting multiple concurrent but
>> dependent requests
>> to tfm providers.
>>
>> Trying to second guess whether or not there is a dependency between calls
>> from the address of the IV is not a clean solution. Why don't we make it
>> explicit?
>>
>> For example, a flag that can be set on a tfm that states that the subsequent
>> series of requests are interdependent. If you need a different stream,
>> simply allocated anotehr
>> tfm object.
>
> But this is already implemented, is it not considering the discussed patches?
>
> Again, here are the use cases I see for AIO:
>
> - multiple IOCBs which are totally separate and have no interdependencies: the
> inline IV approach
>
> - multiple IOCBs which are interdependent: the current API with the patches
> that we are discussing here (note, a new patch set is coming after the merge
> window)
>
> - if you have several independent invocations of multiple interdependent
> IOCBs: call accept() again on the TFM-FD to get another operations FD. This is
> followed either option one or two above. Note, this is the idea behind the
> kcapi_handle_reinit() call just added to libkcapi. So, you have multiple
> operation-FDs on which you do either interdependent operation or inline IV
> handling.
>>
>> This will let each driver do it's best, be it a simple mutex, software
>> queuing or hardware
>> dependency tracking as the case m may be.
>
> Again, I am wondering that with the discussed patch set we have this
> functionality already.
>>
>> Than of course, AF_ALG code (or any other user for that matter) will
>> not need to handle
>> interdependence, just set the right flag.
>
> The issue is that some drivers simply do not handle this interdependency at
> all -- see the bug report from Harsh.
>
> Thus, the current patch set (again which will be updated after the merge
> window) contains:
>
> 1. adding a mutex to AF_ALG to ensure serialization of interdependent calls
> (option 2 from above) irrespective whether the driver implements support for
> it or not.
>
> 2. add the inline IV handling (to serve option 1)
>
> 3. add a flag that can be set by the crypto implementation. If this flag is
> set, then the mutex of AF_ALG is *not* set assuming that the crypto driver
> handles the serialization.
>
> Note, option 3 from above is implemented in the proper usage of the AF_ALG
> interface.

Sorry for not communicating clearly enough. I was not objecting at
all. Let me try again.

What I was trying to say is, more succinctly hopefully this time:

1. Advocating for that 3rd option. I did not understand the plan
include adding it,

2. Pointing out that the problem solved (and rightfully so) by mutex in AF_ALG
AIO implementation must exists elsewhere as well - for example IPsec, and is
probably solved there too, so if we add the flag as suggested, it can be used
there as well to gain similar benefits to what Jonathan is suggesting.

Thanks,
Gilad


-- 
Gilad Ben-Yossef
Chief Coffee Drinker

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru


Re: [PATCH] crypto: AF_ALG AIO - lock context IV

2018-02-01 Thread Stephan Mueller
Am Donnerstag, 1. Februar 2018, 10:35:07 CET schrieb Gilad Ben-Yossef:

Hi Gilad,

> > 
> > Which works well for the sort of optimization I did and for hardware that
> > can do iv dependency tracking itself. If hardware dependency tracking was
> > avilable, you would be able to queue up requests with a chained IV
> > without taking any special precautions at all. The hardware would
> > handle the IV update dependencies.
> > 
> > So in conclusion, Stephan's approach should work and give us a nice
> > small patch suitable for stable inclusion.
> > 
> > However, if people know that their setup overhead can be put in parallel
> > with previous requests (even when the IV is not yet updated) then they
> > will
> > probably want to do something inside their own driver and set the flag
> > that Stephan is proposing adding to bypass the mutex approach.
> 
> The patches from Stephan looks good to me, but I think we can do better
> for the long term approach you are discussing.

What you made me think of is the following: shouldn't we relay the inline IV 
flag on to the crypto drivers?

The reason is to allow a clean implementation of the enabling or disabling of 
the dependency handling in the driver. Jonathan's driver, for example, decides 
based on the pointer value of the IV buffer whether it is the same buffer and 
thus dependency handling is to be applied. This is fragile.

As AF_ALG knows whether the inline IV with separate IVs per request or the 
serialization with one IV buffer for all requests is requested, it should 
relay this state on to the drivers. Thus, for example, Jonathan's driver can 
be changed to rely on this flag instead on the buffer pointer value to decide 
whether to enable its dependency handling.

Ciao
Stephan




Re: [PATCH] crypto: AF_ALG AIO - lock context IV

2018-02-01 Thread Stephan Mueller
Am Donnerstag, 1. Februar 2018, 10:35:07 CET schrieb Gilad Ben-Yossef:

Hi Gilad,

> Hi,
> 
> On Wed, Jan 31, 2018 at 2:29 PM, Jonathan Cameron
> 
>  wrote:
> > On Tue, 30 Jan 2018 15:51:40 +
> > 
> > Jonathan Cameron  wrote:
> >> On Tue, 30 Jan 2018 09:27:04 +0100
> > 
> >> Stephan Müller  wrote:
> > A few clarifications from me after discussions with Stephan this morning.
> > 
> >> > Hi Harsh,
> >> > 
> >> > may I as you to try the attached patch on your Chelsio driver? Please
> >> > invoke the following command from libkcapi:
> >> > 
> >> > kcapi  -d 2 -x 9  -e -c "cbc(aes)" -k
> >> > 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i
> >> > 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910
> >> > 
> >> > The result should now be a fully block-chained result.
> >> > 
> >> > Note, the patch goes on top of the inline IV patch.
> >> > 
> >> > Thanks
> >> > 
> >> > ---8<---
> >> > 
> >> > In case of AIO where multiple IOCBs are sent to the kernel and inline
> >> > IV
> >> > is not used, the ctx->iv is used for each IOCB. The ctx->iv is read for
> >> > the crypto operation and written back after the crypto operation. Thus,
> >> > processing of multiple IOCBs must be serialized.
> >> > 
> >> > As the AF_ALG interface is used by user space, a mutex provides the
> >> > serialization which puts the caller to sleep in case a previous IOCB
> >> > processing is not yet finished.
> >> > 
> >> > If multiple IOCBs arrive that all are blocked, the mutex' FIFO
> >> > operation
> >> > of processing arriving requests ensures that the blocked IOCBs are
> >> > unblocked in the right order.
> >> > 
> >> > Signed-off-by: Stephan Mueller 
> > 
> > Firstly, as far as I can tell (not tested it yet) the patch does the
> > job and is about the best we can easily do in the AF_ALG code.
> > 
> > I'd suggest that this (or v2 anyway) is stable material as well
> > (which, as Stephan observed, will require reordering of the two patches).
> > 
> >> As a reference point, holding outside the kernel (in at least
> >> the case of our hardware with 8K CBC) drops us down to around 30%
> >> of performance with separate IVs.  Putting a queue in kernel
> >> (and hence allowing most setup of descriptors / DMA etc) gets us
> >> up to 50% of raw non chained performance.
> >> 
> >> So whilst this will work in principle I suspect anyone who
> >> cares about the cost will be looking at adding their own
> >> software queuing and dependency handling in driver.  How
> >> much that matters will depend on just how quick the hardware
> >> is vs overheads.
> 
> 
> 
> > The differences are better shown with pictures...
> > To compare the two approaches. If we break up the data flow the
> > alternatives are
> > 
> > 1) Mutex causes queuing in AF ALG code
> > 
> > The key thing is the [Build / Map HW Descs] for small packets,
> > up to perhaps 32K, is a significant task we can't avoid.
> > At 8k it looks like it takes perhaps 20-30% of the time
> > (though I only have end to end performance numbers so far)
> > 
> > 
> > [REQUEST 1 from userspace]
> > 
> >|  [REQUEST 2 from userspace]
> > 
> > [AF_ALG/SOCKET]   |
> > 
> >|   [AF_ALG/SOCKET]
> > 
> > NOTHING BLOCKING (lock mut)   |
> > 
> >|Queued on Mutex
> >  
> >  [BUILD / MAP HW DESCS]   |
> >  
> >[Place in HW Queue]|
> >
> >[Process]  |
> >   
> >   [Interrupt] |
> >  
> >  [Respond and update IV] Nothing Blocking (lock mut)
> >  
> >|[BUILD / MAP HW DESCS] * AFTER
> >|SERIALIZATION *
> >  
> >  Don't care beyond here.  |
> >  
> >   [Place in HW Queue]
> >   
> >   [Process]
> >  
> >  [Interrupt]
> > 
> > [Respond and update IV]
> > 
> > Don't care beyond here.
> > 
> > 2) The queuing approach I did in the driver moves that serialization
> > to after the BUILD / MAP HW DESCs stage.
> > 
> > [REQUEST 1 from userspace]
> > 
> >|  [REQUEST 2 from userspace]
> > 
> > [AF_ALG/SOCKET]   |
> > 
> >|   [AF_ALG/SOCKET]
> >  
> >  [BUILD / MAP HW DESCS]   |
> >  
> >| [BUILD / MAP HW DESCS] * BEFORE
> >| SERIALIZATION *
> >  
> >  NOTHING BLOCKING |
> >  
> 

Re: [PATCH] crypto: AF_ALG AIO - lock context IV

2018-02-01 Thread Gilad Ben-Yossef
Hi,

On Wed, Jan 31, 2018 at 2:29 PM, Jonathan Cameron
 wrote:
> On Tue, 30 Jan 2018 15:51:40 +
> Jonathan Cameron  wrote:
>
>> On Tue, 30 Jan 2018 09:27:04 +0100
>> Stephan Müller  wrote:
>
> A few clarifications from me after discussions with Stephan this morning.
>
>>
>> > Hi Harsh,
>> >
>> > may I as you to try the attached patch on your Chelsio driver? Please 
>> > invoke
>> > the following command from libkcapi:
>> >
>> > kcapi  -d 2 -x 9  -e -c "cbc(aes)" -k
>> > 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i
>> > 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910
>> >
>> > The result should now be a fully block-chained result.
>> >
>> > Note, the patch goes on top of the inline IV patch.
>> >
>> > Thanks
>> >
>> > ---8<---
>> >
>> > In case of AIO where multiple IOCBs are sent to the kernel and inline IV
>> > is not used, the ctx->iv is used for each IOCB. The ctx->iv is read for
>> > the crypto operation and written back after the crypto operation. Thus,
>> > processing of multiple IOCBs must be serialized.
>> >
>> > As the AF_ALG interface is used by user space, a mutex provides the
>> > serialization which puts the caller to sleep in case a previous IOCB
>> > processing is not yet finished.
>> >
>> > If multiple IOCBs arrive that all are blocked, the mutex' FIFO operation
>> > of processing arriving requests ensures that the blocked IOCBs are
>> > unblocked in the right order.
>> >
>> > Signed-off-by: Stephan Mueller 
>
> Firstly, as far as I can tell (not tested it yet) the patch does the
> job and is about the best we can easily do in the AF_ALG code.
>
> I'd suggest that this (or v2 anyway) is stable material as well
> (which, as Stephan observed, will require reordering of the two patches).
>
>>
>> As a reference point, holding outside the kernel (in at least
>> the case of our hardware with 8K CBC) drops us down to around 30%
>> of performance with separate IVs.  Putting a queue in kernel
>> (and hence allowing most setup of descriptors / DMA etc) gets us
>> up to 50% of raw non chained performance.
>>
>> So whilst this will work in principle I suspect anyone who
>> cares about the cost will be looking at adding their own
>> software queuing and dependency handling in driver.  How
>> much that matters will depend on just how quick the hardware
>> is vs overheads.
>>



>
> The differences are better shown with pictures...
> To compare the two approaches. If we break up the data flow the
> alternatives are
>
> 1) Mutex causes queuing in AF ALG code
>
> The key thing is the [Build / Map HW Descs] for small packets,
> up to perhaps 32K, is a significant task we can't avoid.
> At 8k it looks like it takes perhaps 20-30% of the time
> (though I only have end to end performance numbers so far)
>
>
> [REQUEST 1 from userspace]
>|  [REQUEST 2 from userspace]
> [AF_ALG/SOCKET]   |
>|   [AF_ALG/SOCKET]
> NOTHING BLOCKING (lock mut)   |
>|Queued on Mutex
>  [BUILD / MAP HW DESCS]   |
>|  |
>[Place in HW Queue]|
>|  |
>[Process]  |
>|  |
>   [Interrupt] |
>|  |
>  [Respond and update IV] Nothing Blocking (lock mut)
>|[BUILD / MAP HW DESCS] * AFTER SERIALIZATION *
>  Don't care beyond here.  |
>   [Place in HW Queue]
>   |
>   [Process]
>   |
>  [Interrupt]
>   |
> [Respond and update IV]
>   |
> Don't care beyond here.
>
>
> 2) The queuing approach I did in the driver moves that serialization
> to after the BUILD / MAP HW DESCs stage.
>
> [REQUEST 1 from userspace]
>|  [REQUEST 2 from userspace]
> [AF_ALG/SOCKET]   |
>|   [AF_ALG/SOCKET]
>  [BUILD / MAP HW DESCS]   |
>| [BUILD / MAP HW DESCS] * BEFORE 
> SERIALIZATION *
>  NOTHING BLOCKING |
>|   IV in flight (enqueue sw queue)
>[Place in HW Queue]|
>|  |
>[Process]  |
>|  |
>   [Interrupt] |
>|   

Re: [PATCH v2 0/4] crypto: aesni - Use zero-copy for gcm(aes) buffers that are partially contiguous

2018-02-01 Thread Steffen Klassert
On Wed, Jan 31, 2018 at 10:54:57AM -0800, Junaid Shahid wrote:
> On Wed, Jan 31, 2018 at 12:13 AM, Steffen Klassert
>  wrote:
> >
> > I wonder which special usecase you have in mind that will be improved
> > by your patches.
> >
> 
> This is not related to IPsec. We have an internal use case where the
> data buffer itself is a single memory page but the authentication tag
> needs to be in a separate buffer. This patch set allows us to have
> zero copy in that case.

We usually don't add code for 'internal use cases' to the mainline
kernel. If you need a SG version of gcm-aesni, then please implement
the generic version so that everybody can benefit from it.

Think about how this will look like if everybody would add
a workaround for its very special use case.