Re: get_random_bytes returns bad randomness before seeding is complete

2017-06-02 Thread Theodore Ts'o
On Sat, Jun 03, 2017 at 01:58:25AM +0200, Jason A. Donenfeld wrote:
> Hi Ted,
> 
> Based on the tone of your last email, before I respond to your
> individual points

May I gently point out that *your* tone that started this whole thread
has been pretty terrible?   Quoting your your first message:

   "Yes, yes, you have arguments for why you're keeping this
   pathological, but you're still wrong, and this api is still a bug."

This kind of "my shit doesn't stink, but yours does", is not
particularly useful if you are trying to have a constructive
discussion.  If you think the /dev/urandom question wasn't appropriate
to discuss in this thread, then why the *hell* did you bring it up
*first*?

The reason why I keep harping on this is because I'm concerned about
an absolutist attitude towards technical design, where the good is the
enemy of the perfect, and where bricking machines in the pursuit of
cryptographic perfection is considered laudable.  Yes, if you apply
thermite to the CPU and the hard drives, the system will be secure.
But it also won't be terribly useful.

And to me, the phrase "you're still wrong" doesn't seem to admit any
concessions towards acknowledging that there might be another side to
the security versus usability debate.  And I'd ***really*** rather not
waste my time trying to work with someone who doesn't care about
usability, because my focus is in practical engineering, which means
if no one uses the system, or can use the system, then I consider it a
failure, even if it is 100% secure.  And this is true no matter
whether we're talking about /dev/urandom or some enhancement to
get_random_bytes().

> Do you have any opinions on the issue of what the most flexible API
> would be? There are a lot of varieties of wait_event. The default one
> is uninterruptable, but as Stephan and Herbert saw when they were
> working on this was that it could be dangerous in certain
> circumstances not to allow interruption. So probably we'd want an
> interruptable variety. But then maybe we want to support the other
> wait_event_* modes too, like killable, timeout, hrtimeout, io, and so
> on. There's a huge list. These are all implemented as macros in
> wait.h, and I don't really want to have a different get_random_bytes_*
> function that corresponds to _each_ of these wait_event_* functions,
> since that'd be overwhelming.

We're going to have to look at a representative sample of the call
sites to figure this out.  The simple case is where the call site is
only run in response to a userspace system call.  There, blocking
makes perfect sense.  I'm just not sure there are many callers of
get_random_ bytes() where this is the case.

There are definitely quite a few callsites of get_random_bytes() where
the caller is in the middle of an interrupt service routine, or is
holding spinlock.  There, we are *not* allowed to block.  So any kind
of blocking interface isn't going to be allowed; the async callback is
the only thing which is guaranteed to work, in fact.  Mercifully, many
of these are cases where prandom_u32() makes sense.

> So I was thinking maybe a simple flag and a timeout param could work:

When would a timeout be useful?  If you are using get_random_bytes()
for security reasons, does the security reason go away after 15
seconds?  Or even 30 seconds?

> > Or maybe we can then help figure out what percentage of the callsites
> > can be fixed with a synchronous interface, and fix some number of them
> > just to demonstrate that the synchronous interface does work well.
> 
> I was planning on doing this to at least a couple callsites in my
> upcoming patch series that adds the synchronous interface. It seems
> like the right way to see if the API is good or not.

This is absolutely the right approach.  See my above comments about
why there may not be that many places where a synchronous interface
will work out.  And I'm going to be rather surprised if there are
places where having a timeout parameter will be helpful.  But I could
be wrong; let's see if we can find cases where we (a) need secure
bytes, (b) are allowed to block, and (c) where a timeout would make
sense.

> Right. In my initial email I proposed a distinction:
> 
> - External module loading is usually in a different process context,
> so multiple modules can be attempted to be loaded at the same time. In
> this case, it is probably safe to block in request_module.
> - Built-in modules are loaded ± linearly, from init/main.c, so these
> really can't block each other. For this, I was thinking of introducing
> an rnginit section, to add to the list of things like lateinit. The
> rnginit drivers would be loaded _after_ the RNG has initialized, so
> that they don't get blocked.

... except that we already have an order in which drivers are added,
so moving a driver to the rnginit section might mean that some kernel
module that depends on that driver being loaded also needs to be moved
after rnginit.  We already have the following 

[PATCH RFC 3/3] random: warn when kernel uses unseeded randomness

2017-06-02 Thread Jason A. Donenfeld
This enables an important dmesg notification about when drivers have
used the crng without it being seeded first. Prior, these errors would
occur silently, and so there hasn't been a great way of diagnosing these
types of bugs for obscure setups. By adding this as a config option, we
can leave it on by default, so that we learn where these issues happen,
in the field, will still allowing some people to turn it off, if they
really know what they're doing and do not want the log entries.

Signed-off-by: Jason A. Donenfeld 
---
 drivers/char/random.c |  3 +--
 lib/Kconfig.debug | 15 +++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index bee7b1349bcb..f64844383d86 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -285,7 +285,6 @@
 #define SEC_XFER_SIZE  512
 #define EXTRACT_SIZE   10
 
-#define DEBUG_RANDOM_BOOT 0
 
 #define LONGS(x) (((x) + sizeof(unsigned long) - 1)/sizeof(unsigned long))
 
@@ -1475,7 +1474,7 @@ void get_random_bytes(void *buf, int nbytes)
 {
__u8 tmp[CHACHA20_BLOCK_SIZE];
 
-#if DEBUG_RANDOM_BOOT > 0
+#ifdef CONFIG_WARN_UNSEEDED_RANDOM
if (!crng_ready())
printk(KERN_NOTICE "random: %pF get_random_bytes called "
   "with crng_init = %d\n", (void *) _RET_IP_, crng_init);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e4587ebe52c7..fd5e67bcd46c 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1209,6 +1209,21 @@ config STACKTRACE
  It is also used by various kernel debugging features that require
  stack trace generation.
 
+config WARN_UNSEEDED_RANDOM
+   bool "Warn when kernel uses unseeded randomness"
+   default y
+   help
+ Some parts of the kernel contain bugs relating to their use of
+ cryptographically secure random numbers before it's actually possible
+ to generate those numbers securely. This setting ensures that these
+ flaws don't go unnoticed, by enabling a message, should this ever
+ occur. This will allow people with obscure setups to know when things
+ are going wrong, so that they might contact developers about fixing
+ it.
+
+ Say Y here, unless you simply do not care about using unseeded
+ randomness and do not want a potential warning message in your logs.
+
 config DEBUG_KOBJECT
bool "kobject debugging"
depends on DEBUG_KERNEL
-- 
2.13.0



[PATCH RFC 1/3] random: add synchronous API for the urandom pool

2017-06-02 Thread Jason A. Donenfeld
This enables users of get_random_{bytes,u32,u64,int,long} to wait until
the pool is ready before using this function, in case they actually want
to have reliable randomness.

Signed-off-by: Jason A. Donenfeld 
---
 drivers/char/random.c  | 46 --
 include/linux/random.h |  1 +
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0ab024918907..bee7b1349bcb 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1466,7 +1466,10 @@ static ssize_t extract_entropy_user(struct entropy_store 
*r, void __user *buf,
  * number of good random numbers, suitable for key generation, seeding
  * TCP sequence numbers, etc.  It does not rely on the hardware random
  * number generator.  For random bytes direct from the hardware RNG
- * (when available), use get_random_bytes_arch().
+ * (when available), use get_random_bytes_arch(). In order to ensure
+ * that the randomness provided by this function is okay, the function
+ * wait_for_random_bytes() should be called and return 0 at least once
+ * at any point prior.
  */
 void get_random_bytes(void *buf, int nbytes)
 {
@@ -1496,6 +1499,42 @@ void get_random_bytes(void *buf, int nbytes)
 EXPORT_SYMBOL(get_random_bytes);
 
 /*
+ * Wait for the urandom pool to be seeded and thus guaranteed to supply
+ * cryptographically secure random numbers. This applies to: the /dev/urandom
+ * device, the get_random_bytes function, and the get_random_{u32,u64,int,long}
+ * family of functions. Using any of these functions without first calling
+ * this function forfeits the guarantee of security.
+ *
+ * If is_interruptable is true, then this uses wait_event_interruptable.
+ * Otherwise the calling process will not respond to signals. If timeout is
+ * 0, then, unless interrupted, this function will wait potentially forever.
+ * Otherwise, timeout is a measure in jiffies.
+ *
+ * Returns: 0 if the urandom pool has been seeded.
+ *  -ERESTARTSYS if the function was interrupted or timed out.
+ *  Other return values of the wait_event_* functions.
+ */
+int wait_for_random_bytes(bool is_interruptable, unsigned long timeout)
+{
+   if (likely(crng_ready()))
+   return 0;
+   if (is_interruptable && timeout)
+   return wait_event_interruptible_timeout(crng_init_wait, 
crng_ready(), timeout);
+   if (is_interruptable && !timeout)
+   return wait_event_interruptible(crng_init_wait, crng_ready());
+   if (!is_interruptable && timeout)
+   return wait_event_timeout(crng_init_wait, crng_ready(), 
timeout);
+   if (!is_interruptable && !timeout) {
+   wait_event(crng_init_wait, crng_ready());
+   return 0;
+   }
+
+   BUG(); /* This BUG() should be compiled out as unreachable code, but 
just in case... */
+   return -EINVAL;
+}
+EXPORT_SYMBOL(wait_for_random_bytes);
+
+/*
  * Add a callback function that will be invoked when the nonblocking
  * pool is initialised.
  *
@@ -2023,7 +2062,10 @@ struct batched_entropy {
 /*
  * Get a random word for internal kernel use only. The quality of the random
  * number is either as good as RDRAND or as good as /dev/urandom, with the
- * goal of being quite fast and not depleting entropy.
+ * goal of being quite fast and not depleting entropy. In order to ensure
+ * that the randomness provided by this function is okay, the function
+ * wait_for_random_bytes() should be called and return 0 at least once
+ * at any point prior.
  */
 static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64);
 u64 get_random_u64(void)
diff --git a/include/linux/random.h b/include/linux/random.h
index ed5c3838780d..20dd73418bd5 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -34,6 +34,7 @@ extern void add_input_randomness(unsigned int type, unsigned 
int code,
 extern void add_interrupt_randomness(int irq, int irq_flags) __latent_entropy;
 
 extern void get_random_bytes(void *buf, int nbytes);
+extern int wait_for_random_bytes(bool is_interruptable, unsigned long timeout);
 extern int add_random_ready_callback(struct random_ready_callback *rdy);
 extern void del_random_ready_callback(struct random_ready_callback *rdy);
 extern void get_random_bytes_arch(void *buf, int nbytes);
-- 
2.13.0



[PATCH RFC 2/3] random: add get_random_{bytes,u32,u64,int,long}_wait family

2017-06-02 Thread Jason A. Donenfeld
These functions are simple convience wrappers that call
wait_for_random_bytes before calling the respective get_random_*
function.

Signed-off-by: Jason A. Donenfeld 
---
 include/linux/random.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/include/linux/random.h b/include/linux/random.h
index 20dd73418bd5..6a19da815ff1 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -58,6 +58,36 @@ static inline unsigned long get_random_long(void)
 #endif
 }
 
+/* Calls wait_for_random_bytes(is_interruptable, timeout) and then
+ * calls get_random_bytes(buf, nbytes). Returns the result of the
+ * call to wait_for_random_bytes.
+ */
+static inline int get_random_bytes_wait(void *buf, int nbytes,
+   bool is_interruptable, unsigned long timeout)
+{
+   int ret = wait_for_random_bytes(is_interruptable, timeout);
+   if (unlikely(ret))
+   return ret;
+   get_random_bytes(buf, nbytes);
+   return 0;
+}
+
+#define declare_get_random_var_wait(var) \
+   static inline int get_random_ ## var ## _wait(var *out, \
+   bool is_interruptable, unsigned long timeout) { \
+   int ret = wait_for_random_bytes(is_interruptable, timeout); \
+   if (unlikely(ret)) \
+   return ret; \
+   *out = get_random_ ## var(); \
+   return 0; \
+   }
+declare_get_random_var_wait(u32)
+declare_get_random_var_wait(u64)
+declare_get_random_var_wait(int)
+declare_get_random_var_wait(long)
+#undef declare_get_random_var
+
+
 unsigned long randomize_page(unsigned long start, unsigned long range);
 
 u32 prandom_u32(void);
-- 
2.13.0



[PATCH RFC 0/3] get_random_bytes seed blocking

2017-06-02 Thread Jason A. Donenfeld
Per the other thread on this mailing list, here's an initial
stab at what we discussed -- adding a blocking API for the RNG,
and adding a default-on dmesg Kconfig value for when things go
wrong.

Let me know what you think of this general implementation strategy,
and if you like it, I'll move forward with polish and with integrating
it into a fix for a few currently buggy get_random_bytes use cases.

Jason A. Donenfeld (3):
  random: add synchronous API for the urandom pool
  random: add get_random_{bytes,u32,u64,int,long}_wait family
  random: warn when kernel uses unseeded randomness

 drivers/char/random.c  | 49 +
 include/linux/random.h | 31 +++
 lib/Kconfig.debug  | 15 +++
 3 files changed, 91 insertions(+), 4 deletions(-)

-- 
2.13.0



Re: [kernel-hardening] Re: get_random_bytes returns bad randomness before seeding is complete

2017-06-02 Thread Sandy Harris
The only sensible & general solution for the initialisation problem
that I have seen is John Denker's.
http://www.av8n.com/computer/htm/secure-random.htm#sec-boot-image

If I read that right, it would require only minor kernel changes &
none to the API Ted & others are worrying about. It would be secure
except against an enemy who can read your kernel image or interfere
with your install process. Assuming permissions are set sensibly, that
means an enemy who already has root & such an enemy has lots of much
easier ways to break things, so we need not worry about that case.

The difficulty is that it would require significant changes to
installation scripts. Still, since it is a general solution to a real
problem, it might be better to implement that rather than work on the
other suggestions in the thread.


Re: get_random_bytes returns bad randomness before seeding is complete

2017-06-02 Thread Jason A. Donenfeld
Hi Ted,

Based on the tone of your last email, before I respond to your
individual points, I think it's worth noting that the intent of this
thread is to get a sampling of opinions of the issue of
get_random_bytes, so that I can write a patch that fixes this issue
(or a series of issues) using some strategy that we agree is a good
one. I've already implemented prototypes of a few different ideas, so
that I can discuss them competently here (a personal thing, I tend to
learn best by "doing"), and so with list feedback, I'll know in which
direction I should go for refinement eventually leading to a [PATCH]
submission. In case it wasn't evident from my last patch series for
drivers/char/random.c, despite not being corporately backed, I really
have a fun time working on this part of the kernel in my freetime.
With that said...

On Fri, Jun 2, 2017 at 9:07 PM, Theodore Ts'o  wrote:
> It's not _my_ API, it's *our* API
> much don't break backwards compatibility,
> broken, and they complain, and you tell them to suck it,
> If you are using the word *you*, and speaking as an outside to the
> community, they you can kvetch all you like.  But you're an outsider,
> take your complaints seriously.

As I wrote earlier, if you want to bike-shed the technical aspects and
political aspects of fixing /dev/urandom, we can do that in another
thread. It's obviously a complex topic worthy of discussion, and I'd
like that discussion to be technically worthy of the issues at hand,
some of which you've raised here. So please, start a new topic for
that, and I'll happily follow up, all with the intent of writing a
patch series and running compatibility tests and whatnot; as I said
before, I like working on this stuff. However, the focus of this
thread is get_random_bytes, so please try to stay focused, lest this
get derailed.

So, you wrote:

> And if your answer is just, "blah blah di blah blah", don't be
> surprised if others respond to you in exactly the same way.
> Specifically, by saying to you (in your words), "I don't care".

I really don't care about bikeshedding this with you here, now, in
this thread. I care about solving the get_random_bytes situation.

>> While we're on the topic of that, you might consider adding a simple
>> synchronous interface.
>
> There's that word "you" again

Yea, whatever, Ted. Can I rephrase? "As the maintainer of
drivers/char/random.c, you might consider adding a synchronous
interface so that the consumers of the file you maintain can benefit
from it; alternatively, and perhaps easier for you, express your
support for such an idea, and I'll gladly write the patch." Better? I
ask you for the benefit of the doubt that this is what I had in mind
when I wrote "you" -- that I'm seeking approval of the idea before
moving my code past prototype stage. You're the boss, so whether I
write the patch or you write the patch or someone else writes the
patch, it really _is_ you whose consideration matters the most.

> This is open source --- want to send patches?  It sounds like it's a
> workable, good idea.

Awesome, happy you like it. Yes, absolutely, I'll start cleaning
things up and will send a series.

Do you have any opinions on the issue of what the most flexible API
would be? There are a lot of varieties of wait_event. The default one
is uninterruptable, but as Stephan and Herbert saw when they were
working on this was that it could be dangerous in certain
circumstances not to allow interruption. So probably we'd want an
interruptable variety. But then maybe we want to support the other
wait_event_* modes too, like killable, timeout, hrtimeout, io, and so
on. There's a huge list. These are all implemented as macros in
wait.h, and I don't really want to have a different get_random_bytes_*
function that corresponds to _each_ of these wait_event_* functions,
since that'd be overwhelming. So I was thinking maybe a simple flag
and a timeout param could work:

int get_random_bytes_seeded(u8 *buf, size_t len, bool
is_interruptable, unsigned long timeout);

If timeout==0, the timeout is infinite. If in_interruptible is true,
we use wait_event_interruptible_*, otherwise we use wait_event_*, and
the ret value is the usual return value from wait_event_*.

Does that seem like a good simplification that will cover most use
cases? We could of course add exceeding complexity and choice, but I
was thinking this would cover most use cases, at least to begin.

What do you think?

> ...or that I disagree with your prior point.  I think you're being
> lazy, and trying to make it someone else's problem and standing on the
> side lines and complaining, as opposed to trying to help solve the
> problem.

Um, no, what an offensive insinuation. I'm trying to solve the
problem. I'd like your honest feedback, since your maintainer, before
I start fixing this.

> No, of course we can't audit all of the code, but it's probably a good
> idea to take a random sample, and to analyze them, so we can get a
> sense 

[PATCH] Staging: ccree: ssi_aead.h: Fixed a pointer declaration error.

2017-06-02 Thread srishti sharma
Fixed a pointer declaration error , the dereferencing operator was misplaced.

Signed-off-by: srishti sharma 
---
 drivers/staging/ccree/ssi_aead.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/ccree/ssi_aead.h b/drivers/staging/ccree/ssi_aead.h
index 654a181..9cf5225 100644
--- a/drivers/staging/ccree/ssi_aead.h
+++ b/drivers/staging/ccree/ssi_aead.h
@@ -97,8 +97,8 @@ struct aead_req_ctx {
struct ssi_mlli assoc;
struct ssi_mlli src;
struct ssi_mlli dst;
-   struct scatterlist* srcSgl;
-   struct scatterlist* dstSgl;
+   struct scatterlist *srcSgl;
+   struct scatterlist *dstSgl;
unsigned int srcOffset;
unsigned int dstOffset;
enum ssi_req_dma_buf_type assoc_buff_type;
--
2.7.4


Re: crypto: Work around deallocated stack frame reference gcc bug on sparc.

2017-06-02 Thread David Miller
From: David Miller 
Date: Fri, 02 Jun 2017 14:39:06 -0400 (EDT)

> From: "Darrick J. Wong" 
> Date: Fri, 2 Jun 2017 11:08:08 -0700
> 
>> ext4/jbd2's crc32c implementations will also need a fix like this for
>> {ext4,jbd2}_chksum.  Note that both of these modules call the crypto api
>> directly to avoid a static dependence on libcrc32c; this was done to
>> reduce kernel footprint for applications that don't need it.  (ext2,
>> ext3, and ext4 before the metadata_csum feature existed).
> 
> Good catch.  I even looked at that code the other day so should
> have spotted it as well.
> 
> I'll integrate fixes for those into my patch when I get a chance,
> thanks!

Actually, ext4 doesn't trigger the problem because the on-stack object
used in ext4 is a fixed size at compile time.  Which is technically an
ill-advised assumption to make.  Even the generic libcrc32c.c doesn't
assume that the context area is 4 bytes for crc32c.

Anyways, back to the main point, the gcc bug only triggers when
alloca() like constructs are used.

That's why I scanned for SHASH_DESC_ON_STACK() to see exactly where
the workaround is necessary.


Re: get_random_bytes returns bad randomness before seeding is complete

2017-06-02 Thread Theodore Ts'o
On Fri, Jun 02, 2017 at 07:44:04PM +0200, Jason A. Donenfeld wrote:
> On Fri, Jun 2, 2017 at 7:26 PM, Theodore Ts'o  wrote:
> > I tried making /dev/urandom block.
> > So if you're a security focused individual who is kvetching
> > And if we're breaking
> 
> Yes yes, bla bla, predictable response. I don't care. Your API is
> still broken. Excuses excuses. Yes, somebody needs to do the work in
> the end, maybe that person can be me, maybe you, maybe somebody else.

It's not _my_ API, it's *our* API --- that is the Linux kernel
community's.  And part of the rules of this community is that we very
much don't break backwards compatibility, unless there is a really
good reason, where Linus gets to decide if it's a really good reason.
So if you care a lot about this issue, then you need to do the work to
make the change, and part of it is showing, to a high degree of
certainty, that it won't break backwards compatibility.

Because if you don't, and you flout community norms, and users get
broken, and they complain, and you tell them to suck it, then Linus
will pull out is patented clue stick, and tell you that you have in
fact flouted community norms, correct you publically, and then revert
your change.

If you are using the word *you*, and speaking as an outside to the
community, they you can kvetch all you like.  But you're an outsider,
and don't have to listen to you.  But if you want to make a positive
difference here, and you're passionate about it --- this is you would
need to do.  That being said, we're all volunteers, so if you don't
want to bother, that's fine.  But then don't be surprised if we don't
take your complaints seriously.

> While we're on the topic of that, you might consider adding a simple
> synchronous interface.

There's that word "you" again

> I realize that the get_blocking_random_bytes
> attempt was aborted as soon as it began, because of issues of
> cancelability, but you could just expose the usual array of wait,
> wait_interruptable, wait_killable, etc, or just make that wait object
> and condition non-static so others can use it as needed. Having to
> wrap the current asynchronous API like this kludge is kind of a
> downer:

This is open source --- want to send patches?  It sounds like it's a
workable, good idea.

> No, what it means is that the particularities of individual examples I
> picked at random don't matter. Are we really going to take the time to
> audit each and every callsite to see "do they need actually random
> data? can this be called pre-userspace?" I mentioned this in my
> initial email. As I said there, I think analyzing all the cases one by
> one is fragile, and more will pop up, and that's not really the right
> way to approach this. And furthermore, as alluded to above, even
> fixing clearly-broken places means using that hard-to-use asynchronous
> API, which adds even more potentially buggy TCB to these drivers and
> all the rest. Not a good strategy.
> 
> Seeing as you took the time to actually respond to the
> _particularities_ of each individual random example I picked could
> indicate that you've missed this point prior.

...or that I disagree with your prior point.  I think you're being
lazy, and trying to make it someone else's problem and standing on the
side lines and complaining, as opposed to trying to help solve the
problem.

No, of course we can't audit all of the code, but it's probably a good
idea to take a random sample, and to analyze them, so we can get a
sense of what the issues are.  And then maybe we can find a way to
quickly find a class of users that can be easily fixed by using
prandom_u32() (for example).

Or maybe we can then help figure out what percentage of the callsites
can be fixed with a synchronous interface, and fix some number of them
just to demonstrate that the synchronous interface does work well.

> Right, it was him and Stephan (CCd). They initially started by adding
> get_blocking_random_bytes, but then replaced this with the
> asynchronous one, because they realized it could block forever. As I
> said above, though, I still think a blocking API would be useful,
> perhaps just with more granularity for the way in which it blocks.

It depends on where it's being used.  If it's part of module load,
especially if it's one that's done automatically, having something
that blocks forever might not be all that useful.  Especially if it
blocks device drivers from being albe to be initialized enough to
actually supply entropy to the whole system.

Or maybe (in the case of stack canaries), the answer is we should
start with crappy random numbers, but then once the random number
generator has been initialized, we can use the callback to get
cryptographically secure random number generators, and then we need to
figure out how to phase out use of the old crappy random numbers and
substitute in the exclusive use of the good random numbers.  Because
saying that we'll just simply not allow any processes to start 

Re: [kernel-hardening] Re: get_random_bytes returns bad randomness before seeding is complete

2017-06-02 Thread Kees Cook
On Fri, Jun 2, 2017 at 10:41 AM, Daniel Micay  wrote:
> On Fri, 2017-06-02 at 17:53 +0200, Jason A. Donenfeld wrote:
>> (Meanwhile...)
>>
>> In my own code, I'm currently playing with a workaround that looks
>> like this:
>>
>> --- a/src/main.c
>> +++ b/src/main.c
>>
>> +#include 
>> +#include 
>>
>> +struct rng_initializer {
>> +   struct completion done;
>> +   struct random_ready_callback cb;
>> +};
>> +static void rng_initialized_callback(struct random_ready_callback
>> *cb)
>> +{
>> +   complete(_of(cb, struct rng_initializer, cb)->done);
>> +}
>> +
>> static int __init mod_init(void)
>> {
>>int ret;
>> +   struct rng_initializer rng = {
>> +   .done = COMPLETION_INITIALIZER(rng.done),
>> +   .cb = { .owner = THIS_MODULE, .func =
>> rng_initialized_callback }
>> +   };
>> +
>> +   ret = add_random_ready_callback();
>> +   if (!ret)
>> +   wait_for_completion();
>> +   else if (ret != -EALREADY)
>> +   return ret;
>>
>>do_things_with_get_random_bytes_maybe();
>>
>> Depending on the situation, however, I could imagine that
>> wait_for_completion never returning, if its blocking activity that
>> contributes to the seed actually being available, if this is called
>> from a compiled-in module, so I find this a bit sub-optimal...
>
> One of the early uses is initializing the stack canary value for SSP in
> very early boot. If that blocks, it's going to be blocking nearly
> anything else from happening.
>
> On x86, that's only the initial canary since the per-task canaries end
> up being used, but elsewhere at least without SMP disabled or changes to
> GCC that's all there is so the entropy matters.

And just to note, building with GCC_PLUGIN_LATENT_ENTROPY, while it
(correctly) doesn't credit entropy to the pool, should at least make
the pool less deterministic between boots.

-Kees

-- 
Kees Cook
Pixel Security


Re: crypto: Work around deallocated stack frame reference gcc bug on sparc.

2017-06-02 Thread David Miller
From: "Darrick J. Wong" 
Date: Fri, 2 Jun 2017 11:08:08 -0700

> ext4/jbd2's crc32c implementations will also need a fix like this for
> {ext4,jbd2}_chksum.  Note that both of these modules call the crypto api
> directly to avoid a static dependence on libcrc32c; this was done to
> reduce kernel footprint for applications that don't need it.  (ext2,
> ext3, and ext4 before the metadata_csum feature existed).

Good catch.  I even looked at that code the other day so should
have spotted it as well.

I'll integrate fixes for those into my patch when I get a chance,
thanks!



Re: crypto: Work around deallocated stack frame reference gcc bug on sparc.

2017-06-02 Thread Darrick J. Wong
[add ext4 list to cc]

On Fri, Jun 02, 2017 at 11:28:54AM -0400, David Miller wrote:
> 
> On sparc, if we have an alloca() like situation, as is the case with
> SHASH_DESC_ON_STACK(), we can end up referencing deallocated stack
> memory.  The result can be that the value is clobbered if a trap
> or interrupt arrives at just the right instruction.
> 
> It only occurs if the function ends returning a value from that
> alloca() area and that value can be placed into the return value
> register using a single instruction.
> 
> For example, in lib/libcrc32c.c:crc32c() we end up with a return
> sequence like:
> 
> return  %i7+8
>  lduw   [%o5+16], %o0   ! MEM[(u32 *)__shash_desc.1_10 + 16B],
> 
> %o5 holds the base of the on-stack area allocated for the shash
> descriptor.  But the return released the stack frame and the
> register window.
> 
> So if an intererupt arrives between 'return' and 'lduw', then
> the value read at %o5+16 can be corrupted.
> 
> Add a data compiler barrier to work around this problem.  This is
> exactly what the gcc fix will end up doing as well, and it absolutely
> should not change the code generated for other cpus (unless gcc
> on them has the same bug :-)
> 
> With crucial insight from Eric Sandeen.
> 
> Reported-by: Anatoly Pugachev 
> Signed-off-by: David S. Miller 
> ---
> 
> See the thread anchored at:
> 
>   http://marc.info/?l=linux-sparc=149623182616944=2
> 
> for discussion, it has a reproducer module.  The problem was
> first noticed as occaisional XFS checksum corruptions.
> 
> Herbert, I don't expect you to like this but it is the best we can do
> I think.  It should not pessimize code on other architectures at all.
> I will work on fixing the gcc bug but it's been around forever and all
> versions are effected.
> 
> I noticed while working on this that at least btrfs duplicates the
> facilities provided by lib/libcrc32c.c and therefore should probably
> be converted over to straight crc32c() calls if possible.

ext4/jbd2's crc32c implementations will also need a fix like this for
{ext4,jbd2}_chksum.  Note that both of these modules call the crypto api
directly to avoid a static dependence on libcrc32c; this was done to
reduce kernel footprint for applications that don't need it.  (ext2,
ext3, and ext4 before the metadata_csum feature existed).

--D

> 
> Thanks!
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h
> index ecdba2f..1ac5b85 100644
> --- a/drivers/infiniband/sw/rxe/rxe.h
> +++ b/drivers/infiniband/sw/rxe/rxe.h
> @@ -68,6 +68,7 @@
>  static inline u32 rxe_crc32(struct rxe_dev *rxe,
>   u32 crc, void *next, size_t len)
>  {
> + u32 retval;
>   int err;
>  
>   SHASH_DESC_ON_STACK(shash, rxe->tfm);
> @@ -81,7 +82,9 @@ static inline u32 rxe_crc32(struct rxe_dev *rxe,
>   return crc32_le(crc, next, len);
>   }
>  
> - return *(u32 *)shash_desc_ctx(shash);
> + retval = *(u32 *)shash_desc_ctx(shash);
> + barrier_data(shash_desc_ctx(shash));
> + return retval;
>  }
>  
>  int rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu);
> diff --git a/fs/btrfs/hash.c b/fs/btrfs/hash.c
> index a97fdc1..baacc18 100644
> --- a/fs/btrfs/hash.c
> +++ b/fs/btrfs/hash.c
> @@ -38,6 +38,7 @@ u32 btrfs_crc32c(u32 crc, const void *address, unsigned int 
> length)
>  {
>   SHASH_DESC_ON_STACK(shash, tfm);
>   u32 *ctx = (u32 *)shash_desc_ctx(shash);
> + u32 retval;
>   int err;
>  
>   shash->tfm = tfm;
> @@ -47,5 +48,7 @@ u32 btrfs_crc32c(u32 crc, const void *address, unsigned int 
> length)
>   err = crypto_shash_update(shash, address, length);
>   BUG_ON(err);
>  
> - return *ctx;
> + retval = *ctx;
> + barrier_data(ctx);
> + return retval;
>  }
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 2185c7a..fd2e651 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1078,6 +1078,7 @@ static inline u32 f2fs_crc32(struct f2fs_sb_info *sbi, 
> const void *address,
>  {
>   SHASH_DESC_ON_STACK(shash, sbi->s_chksum_driver);
>   u32 *ctx = (u32 *)shash_desc_ctx(shash);
> + u32 retval;
>   int err;
>  
>   shash->tfm = sbi->s_chksum_driver;
> @@ -1087,7 +1088,9 @@ static inline u32 f2fs_crc32(struct f2fs_sb_info *sbi, 
> const void *address,
>   err = crypto_shash_update(shash, address, length);
>   BUG_ON(err);
>  
> - return *ctx;
> + retval = *ctx;
> + barrier_data(ctx);
> + return retval;
>  }
>  
>  static inline bool f2fs_crc_valid(struct f2fs_sb_info *sbi, __u32 blk_crc,
> diff --git a/lib/libcrc32c.c b/lib/libcrc32c.c
> index 74a54b7..9f79547 100644
> --- a/lib/libcrc32c.c
> +++ b/lib/libcrc32c.c
> @@ -43,7 +43,7 @@ static struct crypto_shash *tfm;
>  u32 crc32c(u32 crc, const void *address, unsigned int length)
>  {
>   SHASH_DESC_ON_STACK(shash, tfm);
> - u32 *ctx = (u32 *)shash_desc_ctx(shash);
> + u32 ret, *ctx 

Re: [kernel-hardening] Re: get_random_bytes returns bad randomness before seeding is complete

2017-06-02 Thread Jason A. Donenfeld
On Fri, Jun 2, 2017 at 7:41 PM, Daniel Micay  wrote:
> One of the early uses is initializing the stack canary value for SSP in
> very early boot. If that blocks, it's going to be blocking nearly
> anything else from happening.
>
> On x86, that's only the initial canary since the per-task canaries end
> up being used, but elsewhere at least without SMP disabled or changes to
> GCC that's all there is so the entropy matters.

If this is the case, then we simply need a function called
get_random_bytes_but_potentially_crappy_ones_because_we_are_desperate_for_anything(),
which would respond with a weaker guarantee than that
get_random_bytes(), which the documentation says always returns
cryptographically secure numbers.


Re: get_random_bytes returns bad randomness before seeding is complete

2017-06-02 Thread Jason A. Donenfeld
On Fri, Jun 2, 2017 at 7:26 PM, Theodore Ts'o  wrote:
> I tried making /dev/urandom block.
> So if you're a security focused individual who is kvetching
> And if we're breaking

Yes yes, bla bla, predictable response. I don't care. Your API is
still broken. Excuses excuses. Yes, somebody needs to do the work in
the end, maybe that person can be me, maybe you, maybe somebody else.
Regardless, we can take this to a different thread if you'd like to
bikeshed about that particular point for another five millennia. But
that isn't the main topic of this thread, so I'm not going to get
sucked into that blackhole.

> the time, we hadn't yet created the asynchronous interfaces that allow
> kernel code to do the right thing, even if it was massively
> inconvenient.

While we're on the topic of that, you might consider adding a simple
synchronous interface. I realize that the get_blocking_random_bytes
attempt was aborted as soon as it began, because of issues of
cancelability, but you could just expose the usual array of wait,
wait_interruptable, wait_killable, etc, or just make that wait object
and condition non-static so others can use it as needed. Having to
wrap the current asynchronous API like this kludge is kind of a
downer:

https://git.zx2c4.com/WireGuard/diff/src/config.c?id=c77145a8248dfc49e307eae7d7edd5fdca8d5559

> At this point, I think we should be completely open to making it be a
> config option, and if it looks like for common configs we're not
> spamming dmesg, we could even it make it be the default.  We just also
> need to give driver writers some easy-to-understand receipes to fix
> their drivers to do the right thing.  If we do that first, it's much
> more likely they will actually fix their kernel code, instead of just
> turning the warning off.

That's a good point. In my initial email I looked down on the
whack-a-mole approach, because nobody is ever going to analyze all of
those cases. But... if dmesgs are going wild for some people, we'll
have upset users doing the hard work for us. So maybe it's worthwhile
to turn that warning on by default.


> This is basically the exponential backoff which used in ethernet
> networks, and the *real* answer is that they should be using
> prandom_u32().
> I think in practice most IBM customers will be safe because they tend

No, what it means is that the particularities of individual examples I
picked at random don't matter. Are we really going to take the time to
audit each and every callsite to see "do they need actually random
data? can this be called pre-userspace?" I mentioned this in my
initial email. As I said there, I think analyzing all the cases one by
one is fragile, and more will pop up, and that's not really the right
way to approach this. And furthermore, as alluded to above, even
fixing clearly-broken places means using that hard-to-use asynchronous
API, which adds even more potentially buggy TCB to these drivers and
all the rest. Not a good strategy.

Seeing as you took the time to actually respond to the
_particularities_ of each individual random example I picked could
indicate that you've missed this point prior.

>> And so on and so on and so on. If you grep the source as I did, you'll
>> find there's no shortage of head-scratchers. "Hmmm," you ask yourself,
>> "could this be called before userspace has ensured that /dev/urandom
>> is seeded? do we actually _need_ good randomness here, or does it not
>> matter?" I guess you could try to reason about each and every one of
>> them -- you might even have those same questions about the silly
>> examples I pasted above -- but that one-by-one methodology seems
>> excessively fragile and arduous.
>
> This is a fair point.

Glad we're on the same page about that.


> That developer was Herbert Xu, and he added it about two years ago.
> See commit 205a525c3342.

Right, it was him and Stephan (CCd). They initially started by adding
get_blocking_random_bytes, but then replaced this with the
asynchronous one, because they realized it could block forever. As I
said above, though, I still think a blocking API would be useful,
perhaps just with more granularity for the way in which it blocks.

> So the problem with doing this by default, for all modules, is that on
> those platforms which don't have good hardware support for seeding the
> non-blocking pool quickly, if we defer all modules, we will also be
> deferring the means by which we get the entropy needed to seed the
> non-blocking pool.  So for example, if you have a bunch of networking
> drivers that are using get_random_bytes() for exponential backoff,
> when they *should* be using prandom_u32(), if we don't fix *that* bug,
> simply deferring the module load will also defer the entropy input
> from the networking interrupts from the networking card.

"We shouldn't fix legitimate cryptographic bugs, because the rest of
our codebase is too buggy to support anything reasonable."
Sounds like it'd be worthwhile to begin fixing 

Re: [kernel-hardening] Re: get_random_bytes returns bad randomness before seeding is complete

2017-06-02 Thread Daniel Micay
On Fri, 2017-06-02 at 17:53 +0200, Jason A. Donenfeld wrote:
> (Meanwhile...)
> 
> In my own code, I'm currently playing with a workaround that looks
> like this:
> 
> --- a/src/main.c
> +++ b/src/main.c
> 
> +#include 
> +#include 
> 
> +struct rng_initializer {
> +   struct completion done;
> +   struct random_ready_callback cb;
> +};
> +static void rng_initialized_callback(struct random_ready_callback
> *cb)
> +{
> +   complete(_of(cb, struct rng_initializer, cb)->done);
> +}
> +
> static int __init mod_init(void)
> {
>int ret;
> +   struct rng_initializer rng = {
> +   .done = COMPLETION_INITIALIZER(rng.done),
> +   .cb = { .owner = THIS_MODULE, .func =
> rng_initialized_callback }
> +   };
> +
> +   ret = add_random_ready_callback();
> +   if (!ret)
> +   wait_for_completion();
> +   else if (ret != -EALREADY)
> +   return ret;
> 
>do_things_with_get_random_bytes_maybe();
> 
> Depending on the situation, however, I could imagine that
> wait_for_completion never returning, if its blocking activity that
> contributes to the seed actually being available, if this is called
> from a compiled-in module, so I find this a bit sub-optimal...

One of the early uses is initializing the stack canary value for SSP in
very early boot. If that blocks, it's going to be blocking nearly
anything else from happening.

On x86, that's only the initial canary since the per-task canaries end
up being used, but elsewhere at least without SMP disabled or changes to
GCC that's all there is so the entropy matters.


Re: get_random_bytes returns bad randomness before seeding is complete

2017-06-02 Thread Theodore Ts'o
On Fri, Jun 02, 2017 at 04:59:56PM +0200, Jason A. Donenfeld wrote:
> /dev/urandom will return bad randomness before its seeded, rather than
> blocking, and despite years and years of discussion and bike shedding,
> this behavior hasn't changed, and the vast majority of cryptographers
> and security experts remain displeased. It _should_ block until its
> been seeded for the first time, just like the new getrandom() syscall,
> and this important bug fix (which is not a real api break) should be
> backported to all stable kernels. (Userspace doesn't even have a
> reliable way of determining whether or not it's been seeded!) Yes,
> yes, you have arguments for why you're keeping this pathological, but
> you're still wrong, and this api is still a bug. Forcing userspace
> architectures to work around your bug, as your arguments usually go,
> is disquieting.

I tried making /dev/urandom block.  The zero day kernel testing within
a few hours told us that Ubuntu LTS at the time and OpenWrt would fail
to boot.  And when Python made a change to call getrandom(2) with the
default blocking semantic instead of using /dev/urandom, some
distributions using systemd stopped booting.

So if you're a security focused individual who is kvetching that we're
asking to force userspace to change to fix "a bug", you need to
understand that making the change you're suggesting will *also*
require making changes to userspace.  And if you want to go try to
convince Linus Torvalds that it's OK to make a backwards-incompatible
changes that break Ubuntu LTS and OpenWRT by causing them to fail to
boot --- be my guest.

And if we're breaking distributions from them booting when their use
of /dev/urandom is not security critical, I suspect Linus is not going
to be impressed that you are breaking those users for no beneft.  (For
example, Python's use of getrandom(2) was to prevent denial of service
attacks when Python is used as a fast-CGI web server script, and it
was utterly pointless from security perspective to block when the
python script was creating on-demand systemd unit files, where the DOS
threat was completely irrelevant.)

> As the printk implies, it's possible for get_random_bytes() to be
> called before it's seeded. Why was that helpful printk put behind an
> ifdef? I suspect because people would see the lines in their dmesg and
> write emails like this one to the list.

The #ifdef and the printk was there from the very beginning.

commit 392a546dc8368d1745f9891ef3f8f7c380de8650
Author: Theodore Ts'o 
Date:   Sun Nov 3 18:24:08 2013 -0500

random: add debugging code to detect early use of get_random_bytes()

Signed-off-by: "Theodore Ts'o" 

It was four years ago, so I don't remember the exact circumstances,
but if I recall the issue was not wanting to spam the dmesg.  Also, at
the time, we hadn't yet created the asynchronous interfaces that allow
kernel code to do the right thing, even if it was massively
inconvenient.

At this point, I think we should be completely open to making it be a
config option, and if it looks like for common configs we're not
spamming dmesg, we could even it make it be the default.  We just also
need to give driver writers some easy-to-understand receipes to fix
their drivers to do the right thing.  If we do that first, it's much
more likely they will actually fix their kernel code, instead of just
turning the warning off.

> drivers/net/ieee802154/at86rf230.c:
> static int at86rf230_probe(struct spi_device *spi)
> {
> ...
> get_random_bytes(csma_seed, ARRAY_SIZE(csma_seed));
> 
> No clue what this driver is for or if it actually needs good
> randomness, but using get_random_bytes in a hardware probe function
> can happen prior to userspace's ability to seed.

What, you mean as a computer scientist you didn't immediately
understand that csma doesn't refer to CSMA/CD --- or "Carrier-sense
multiple access with collision avoidance"?  For shame!  :-)

This is basically the exponential backoff which used in ethernet
networks, and the *real* answer is that they should be using
prandom_u32().

> arch/s390/crypto/prng.c seems to call get_random_bytes on module load,
> which can happen pre-userspace, for seeding some kind of RNG of its
> own.

It appears to be accessing a hardware cryptographic processor
function, and seems to do its own entropy gathering relying on stack
garbage as well as using get_random_bytes().  By default it is defined
as a module, so I suspect in most situations it's called
post-userspace, but agreed that it is not guaranteed to be the case.
I think in practice most IBM customers will be safe because they tend
to use distro kernels that will almost certainly be building it as a
module, but your point is well taken.

> And so on and so on and so on. If you grep the source as I did, you'll
> find there's no shortage of head-scratchers. "Hmmm," you ask yourself,
> "could this be called before userspace has ensured that /dev/urandom
> is 

Re: get_random_bytes returns bad randomness before seeding is complete

2017-06-02 Thread Jason A. Donenfeld
Further investigations: if the whack-a-mole approach is desirable,
perhaps many of those get_random_bytes calls should be converted to
get_blocking_random_bytes. In that case, this commit, which removed
this helpful API, should be reverted:

commit c2719503f5e1e6213d716bb078bdad01e28ebcbf
Author: Herbert Xu 
Date:   Tue Jun 9 18:19:42 2015 +0800

   random: Remove kernel blocking API

   This patch removes the kernel blocking API as it has been completely
   replaced by the callback API.

   Signed-off-by: Herbert Xu 

diff --git a/drivers/char/random.c b/drivers/char/random.c
index a1576ed1d88e..d0da5d852d41 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1265,18 +1265,6 @@ void get_random_bytes(void *buf, int nbytes)
EXPORT_SYMBOL(get_random_bytes);

/*
- * Equivalent function to get_random_bytes with the difference that this
- * function blocks the request until the nonblocking_pool is initialized.
- */
-void get_blocking_random_bytes(void *buf, int nbytes)
-{
-   if (unlikely(nonblocking_pool.initialized == 0))
-   wait_event(urandom_init_wait, nonblocking_pool.initialized);
-   extract_entropy(_pool, buf, nbytes, 0, 0);
-}
-EXPORT_SYMBOL(get_blocking_random_bytes);
-
-/*


Re: get_random_bytes returns bad randomness before seeding is complete

2017-06-02 Thread Jason A. Donenfeld
(Meanwhile...)

In my own code, I'm currently playing with a workaround that looks like this:

--- a/src/main.c
+++ b/src/main.c

+#include 
+#include 

+struct rng_initializer {
+   struct completion done;
+   struct random_ready_callback cb;
+};
+static void rng_initialized_callback(struct random_ready_callback *cb)
+{
+   complete(_of(cb, struct rng_initializer, cb)->done);
+}
+
static int __init mod_init(void)
{
   int ret;
+   struct rng_initializer rng = {
+   .done = COMPLETION_INITIALIZER(rng.done),
+   .cb = { .owner = THIS_MODULE, .func = rng_initialized_callback }
+   };
+
+   ret = add_random_ready_callback();
+   if (!ret)
+   wait_for_completion();
+   else if (ret != -EALREADY)
+   return ret;

   do_things_with_get_random_bytes_maybe();

Depending on the situation, however, I could imagine that
wait_for_completion never returning, if its blocking activity that
contributes to the seed actually being available, if this is called
from a compiled-in module, so I find this a bit sub-optimal...


crypto: Work around deallocated stack frame reference gcc bug on sparc.

2017-06-02 Thread David Miller

On sparc, if we have an alloca() like situation, as is the case with
SHASH_DESC_ON_STACK(), we can end up referencing deallocated stack
memory.  The result can be that the value is clobbered if a trap
or interrupt arrives at just the right instruction.

It only occurs if the function ends returning a value from that
alloca() area and that value can be placed into the return value
register using a single instruction.

For example, in lib/libcrc32c.c:crc32c() we end up with a return
sequence like:

return  %i7+8
 lduw   [%o5+16], %o0   ! MEM[(u32 *)__shash_desc.1_10 + 16B],

%o5 holds the base of the on-stack area allocated for the shash
descriptor.  But the return released the stack frame and the
register window.

So if an intererupt arrives between 'return' and 'lduw', then
the value read at %o5+16 can be corrupted.

Add a data compiler barrier to work around this problem.  This is
exactly what the gcc fix will end up doing as well, and it absolutely
should not change the code generated for other cpus (unless gcc
on them has the same bug :-)

With crucial insight from Eric Sandeen.

Reported-by: Anatoly Pugachev 
Signed-off-by: David S. Miller 
---

See the thread anchored at:

http://marc.info/?l=linux-sparc=149623182616944=2

for discussion, it has a reproducer module.  The problem was
first noticed as occaisional XFS checksum corruptions.

Herbert, I don't expect you to like this but it is the best we can do
I think.  It should not pessimize code on other architectures at all.
I will work on fixing the gcc bug but it's been around forever and all
versions are effected.

I noticed while working on this that at least btrfs duplicates the
facilities provided by lib/libcrc32c.c and therefore should probably
be converted over to straight crc32c() calls if possible.

Thanks!

diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h
index ecdba2f..1ac5b85 100644
--- a/drivers/infiniband/sw/rxe/rxe.h
+++ b/drivers/infiniband/sw/rxe/rxe.h
@@ -68,6 +68,7 @@
 static inline u32 rxe_crc32(struct rxe_dev *rxe,
u32 crc, void *next, size_t len)
 {
+   u32 retval;
int err;
 
SHASH_DESC_ON_STACK(shash, rxe->tfm);
@@ -81,7 +82,9 @@ static inline u32 rxe_crc32(struct rxe_dev *rxe,
return crc32_le(crc, next, len);
}
 
-   return *(u32 *)shash_desc_ctx(shash);
+   retval = *(u32 *)shash_desc_ctx(shash);
+   barrier_data(shash_desc_ctx(shash));
+   return retval;
 }
 
 int rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu);
diff --git a/fs/btrfs/hash.c b/fs/btrfs/hash.c
index a97fdc1..baacc18 100644
--- a/fs/btrfs/hash.c
+++ b/fs/btrfs/hash.c
@@ -38,6 +38,7 @@ u32 btrfs_crc32c(u32 crc, const void *address, unsigned int 
length)
 {
SHASH_DESC_ON_STACK(shash, tfm);
u32 *ctx = (u32 *)shash_desc_ctx(shash);
+   u32 retval;
int err;
 
shash->tfm = tfm;
@@ -47,5 +48,7 @@ u32 btrfs_crc32c(u32 crc, const void *address, unsigned int 
length)
err = crypto_shash_update(shash, address, length);
BUG_ON(err);
 
-   return *ctx;
+   retval = *ctx;
+   barrier_data(ctx);
+   return retval;
 }
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 2185c7a..fd2e651 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1078,6 +1078,7 @@ static inline u32 f2fs_crc32(struct f2fs_sb_info *sbi, 
const void *address,
 {
SHASH_DESC_ON_STACK(shash, sbi->s_chksum_driver);
u32 *ctx = (u32 *)shash_desc_ctx(shash);
+   u32 retval;
int err;
 
shash->tfm = sbi->s_chksum_driver;
@@ -1087,7 +1088,9 @@ static inline u32 f2fs_crc32(struct f2fs_sb_info *sbi, 
const void *address,
err = crypto_shash_update(shash, address, length);
BUG_ON(err);
 
-   return *ctx;
+   retval = *ctx;
+   barrier_data(ctx);
+   return retval;
 }
 
 static inline bool f2fs_crc_valid(struct f2fs_sb_info *sbi, __u32 blk_crc,
diff --git a/lib/libcrc32c.c b/lib/libcrc32c.c
index 74a54b7..9f79547 100644
--- a/lib/libcrc32c.c
+++ b/lib/libcrc32c.c
@@ -43,7 +43,7 @@ static struct crypto_shash *tfm;
 u32 crc32c(u32 crc, const void *address, unsigned int length)
 {
SHASH_DESC_ON_STACK(shash, tfm);
-   u32 *ctx = (u32 *)shash_desc_ctx(shash);
+   u32 ret, *ctx = (u32 *)shash_desc_ctx(shash);
int err;
 
shash->tfm = tfm;
@@ -53,7 +53,9 @@ u32 crc32c(u32 crc, const void *address, unsigned int length)
err = crypto_shash_update(shash, address, length);
BUG_ON(err);
 
-   return *ctx;
+   ret = *ctx;
+   barrier_data(ctx);
+   return ret;
 }
 
 EXPORT_SYMBOL(crc32c);


get_random_bytes returns bad randomness before seeding is complete

2017-06-02 Thread Jason A. Donenfeld
Hi folks,

This email is about an issue with get_random_bytes(), the CSPRNG used
inside the kernel for generating keys and nonces and whatnot. However,
I will begin with an aside:

/dev/urandom will return bad randomness before its seeded, rather than
blocking, and despite years and years of discussion and bike shedding,
this behavior hasn't changed, and the vast majority of cryptographers
and security experts remain displeased. It _should_ block until its
been seeded for the first time, just like the new getrandom() syscall,
and this important bug fix (which is not a real api break) should be
backported to all stable kernels. (Userspace doesn't even have a
reliable way of determining whether or not it's been seeded!) Yes,
yes, you have arguments for why you're keeping this pathological, but
you're still wrong, and this api is still a bug. Forcing userspace
architectures to work around your bug, as your arguments usually go,
is disquieting. Anyway, that's not what this email is about, but given
that as a backdrop, here's a different-but-related, and perhaps more
petulant, issue...

The problem this email intends to address is this:

void get_random_bytes(void *buf, int nbytes)
{
#if DEBUG_RANDOM_BOOT > 0
   if (!crng_ready())
   printk(KERN_NOTICE "random: %pF get_random_bytes called "
  "with crng_init = %d\n", (void *) _RET_IP_, crng_init);
#endif
   ...
   extract_crng(buf);

Or, on older kernels:

void get_random_bytes(void *buf, int nbytes)
{
#if DEBUG_RANDOM_BOOT > 0
   if (unlikely(nonblocking_pool.initialized == 0))
   printk(KERN_NOTICE "random: %pF get_random_bytes called "
  "with %d bits of entropy available\n",
  (void *) _RET_IP_,
  nonblocking_pool.entropy_total);
#endif
   trace_get_random_bytes(nbytes, _RET_IP_);
   extract_entropy(_pool, buf, nbytes, 0, 0);
}

As the printk implies, it's possible for get_random_bytes() to be
called before it's seeded. Why was that helpful printk put behind an
ifdef? I suspect because people would see the lines in their dmesg and
write emails like this one to the list.

I anticipate an argument coming from the never-block /dev/urandom
cartel that goes something along the lines of, "get_random_bytes() is
only called in paths initiated by a syscall; therefore, userspace must
ensure /dev/urandom, which is the same pool as get_random_bytes, has
been properly seeded, before making any syscalls that might lead to
get_random_bytes() being called."

If you've already given up on the general initiative of trying to urge
them to make /dev/urandom block until seeded, then this might seem
like a reasonable argument. If /dev/urandom is broken, it doesn't
matter if get_random_bytes() is broken too, if the required
work-around for one is the same as for the other.

But the premise is flawed. get_random_bytes() can be called before any
syscalls are made, before userspace even has an opportunity to ensure
/dev/urandom is seeded. That's what that printk in the ifdef is all
about. Bad news bears.

Grepping through the source tree for get_random_bytes, I just opened a
few files at random to see what the deal was. Here's one:

drivers/net/ieee802154/at86rf230.c:
static int at86rf230_probe(struct spi_device *spi)
{
...
get_random_bytes(csma_seed, ARRAY_SIZE(csma_seed));

No clue what this driver is for or if it actually needs good
randomness, but using get_random_bytes in a hardware probe function
can happen prior to userspace's ability to seed.

arch/s390/crypto/prng.c seems to call get_random_bytes on module load,
which can happen pre-userspace, for seeding some kind of RNG of its
own.

net/ipv6/addrconf.c:
static void __ipv6_regen_rndid(struct inet6_dev *idev)
{
regen:
   get_random_bytes(idev->rndid, sizeof(idev->rndid));

This is in the networking stack when bringing up an interface and
assigning randomly assigned v6 addresses. While you might argue that
userspace is required for networking, remember that the kernel
actually has the ability to initiate networking all on its own, before
userspace; the kernel even has its own dhcp client! Yowza. In fact, on
that note:

net/ipv4/ipconfig.c:
static int __init ic_open_devs(void)
{
...
get_random_bytes(>xid, sizeof(__be32));

And so on and so on and so on. If you grep the source as I did, you'll
find there's no shortage of head-scratchers. "Hmmm," you ask yourself,
"could this be called before userspace has ensured that /dev/urandom
is seeded? do we actually _need_ good randomness here, or does it not
matter?" I guess you could try to reason about each and every one of
them -- you might even have those same questions about the silly
examples I pasted above -- but that one-by-one methodology seems
excessively fragile and arduous.

There must have been a developer who thought about this at least once
before, because random.c does have a callback notifier mechanism for
drivers to learn 

Re: [PATCH v3 2/5] crypto : stm32 - Add STM32F4 CRC32 support

2017-06-02 Thread Cosar Dindar
Hi Fabien,

Thanks for your review.

On Mon, May 29, 2017 at 07:56:48AM +, Fabien DESSENNE wrote:
> Hi Cosar,
> 
> Thank you for the patch
> 
> On 22/05/17 16:34, Cosar Dindar wrote:
> > This patch adds CRC (CRC32 Crypto) support for STM32F4 series.
> >
> > As an hardware limitation polynomial and key setting are not supported.
> > They are fixed as 0x4C11DB7 (poly) and 0x (key).
> > CRC32C Castagnoli algorithm is not used.
> >
> > Signed-off-by: Cosar Dindar 
> > ---
> >   drivers/crypto/stm32/stm32_crc32.c | 68 
> > --
> >   1 file changed, 58 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/crypto/stm32/stm32_crc32.c 
> > b/drivers/crypto/stm32/stm32_crc32.c
> > index ec83b1e..12fbd98 100644
> > --- a/drivers/crypto/stm32/stm32_crc32.c
> > +++ b/drivers/crypto/stm32/stm32_crc32.c
> > @@ -7,6 +7,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   
> >   #include 
> > @@ -39,6 +40,9 @@ struct stm32_crc {
> > struct clk   *clk;
> > u8   pending_data[sizeof(u32)];
> > size_t   nb_pending_bytes;
> > +   bool key_support;
> > +   bool poly_support;
> > +   bool reverse_support;
> >   };
> >   
> >   struct stm32_crc_list {
> > @@ -106,13 +110,31 @@ static int stm32_crc_init(struct shash_desc *desc)
> > }
> > spin_unlock_bh(_list.lock);
> >   
> > -   /* Reset, set key, poly and configure in bit reverse mode */
> > -   writel(bitrev32(mctx->key), ctx->crc->regs + CRC_INIT);
> > -   writel(bitrev32(mctx->poly), ctx->crc->regs + CRC_POL);
> > -   writel(CRC_CR_RESET | CRC_CR_REVERSE, ctx->crc->regs + CRC_CR);
> > +   /* set key */
> > +   if (ctx->crc->key_support) {
> > +   writel(bitrev32(mctx->key), ctx->crc->regs + CRC_INIT);
> > +   } else if (mctx->key != CRC_INIT_DEFAULT) {
> > +   dev_err(ctx->crc->dev, "Unsupported key value! Should be: 
> > 0x%x\n",
> > +   CRC_INIT_DEFAULT);
> > +   return -EINVAL;
> > +   }
> > +
> > +   /* set poly */
> > +   if (ctx->crc->poly_support)
> > +   writel(bitrev32(mctx->poly), ctx->crc->regs + CRC_POL);
> > +
> > +   /* reset and configure in bit reverse mode if supported */
> > +   if (ctx->crc->reverse_support)
> > +   writel(CRC_CR_RESET | CRC_CR_REVERSE, ctx->crc->regs + CRC_CR);
> > +   else
> > +   writel(CRC_CR_RESET, ctx->crc->regs + CRC_CR);
> > +
> > +   /* store partial result */
> > +   if (!ctx->crc->reverse_support)
> > +   ctx->partial = bitrev32(readl(crc->regs + CRC_DR));
> > +   else
> > +   ctx->partial = readl(ctx->crc->regs + CRC_DR);
> >   
> > -   /* Store partial result */
> > -   ctx->partial = readl(ctx->crc->regs + CRC_DR);
> > ctx->crc->nb_pending_bytes = 0;
> >   
> > return 0;
> > @@ -135,7 +157,12 @@ static int stm32_crc_update(struct shash_desc *desc, 
> > const u8 *d8,
> >   
> > if (crc->nb_pending_bytes == sizeof(u32)) {
> > /* Process completed pending data */
> > -   writel(*(u32 *)crc->pending_data, crc->regs + CRC_DR);
> > +   if (!ctx->crc->reverse_support)
> > +   writel(bitrev32(*(u32 *)crc->pending_data),
> > +  crc->regs + CRC_DR);
> > +   else
> > +   writel(*(u32 *)crc->pending_data,
> > +  crc->regs + CRC_DR);
> > crc->nb_pending_bytes = 0;
> > }
> > }
> > @@ -143,10 +170,16 @@ static int stm32_crc_update(struct shash_desc *desc, 
> > const u8 *d8,
> > d32 = (u32 *)d8;
> > for (i = 0; i < length >> 2; i++)
> > /* Process 32 bits data */
> > -   writel(*(d32++), crc->regs + CRC_DR);
> > +   if (!ctx->crc->reverse_support)
> > +   writel(bitrev32(*(d32++)), crc->regs + CRC_DR);
> > +   else
> > +   writel(*(d32++), crc->regs + CRC_DR);
> >   
> > /* Store partial result */
> > -   ctx->partial = readl(crc->regs + CRC_DR);
> > +   if (!ctx->crc->reverse_support)
> > +   ctx->partial = bitrev32(readl(crc->regs + CRC_DR));
> > +   else
> > +   ctx->partial = readl(crc->regs + CRC_DR);
> >   
> > /* Check for pending data (non 32 bits) */
> > length &= 3;
> > @@ -243,6 +276,7 @@ static int stm32_crc_probe(struct platform_device *pdev)
> > struct stm32_crc *crc;
> > struct resource *res;
> > int ret;
> > +   int algs_size;
> >   
> > crc = devm_kzalloc(dev, sizeof(*crc), GFP_KERNEL);
> > if (!crc)
> > @@ -269,13 +303,26 @@ static int stm32_crc_probe(struct platform_device 
> > *pdev)
> > return ret;
> > }
> >   
> > +   /* set key, poly and reverse support if device is of F7 series */
> > +   if (of_device_is_compatible(crc->dev->of_node, "st,stm32f7-crc")) {
> > +   

Re: [PATCH v3 13/13] ARM: dts: sunxi: add SoC specific compatibles for the crypto nodes

2017-06-02 Thread Maxime Ripard
On Thu, Jun 01, 2017 at 09:39:05PM +0200, Antoine Tenart wrote:
> Add SoC specific compatibles for all sunXi crypto nodes, in addition to
> the one already used (allwinner,sun4i-a10-crypto).
> 
> Signed-off-by: Antoine Tenart 

Applied, thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


signature.asc
Description: PGP signature


Re: [PATCH v3 12/13] ARM: sun5i: add a cryptographic engine node

2017-06-02 Thread Maxime Ripard
On Thu, Jun 01, 2017 at 09:39:04PM +0200, Antoine Tenart wrote:
> Add a node for the cryptographic engine that can be found on sun5i SoCs.
> This cryptographic engine is compatible with the Allwinner cryptographic
> accelerator driver.
> 
> Signed-off-by: Antoine Tenart 

Applied, thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


signature.asc
Description: PGP signature


[RFC PATCH 1/2] crypto: caam - properly set IV after {en,de}crypt

2017-06-02 Thread David Gstir
Certain cipher modes like CTS expect the IV (req->info) of
ablkcipher_request (or equivalently req->iv of skcipher_request) to
contain the last ciphertext block when the {en,de}crypt operation is done.
This is currently not the case for the CAAM driver which in turn breaks
e.g. cts(cbc(aes)) when the CAAM driver is enabled.

This patch fixes the CAAM driver to properly set the IV after the
{en,de}crypt operation of ablkcipher finishes.

Signed-off-by: David Gstir 
---
 drivers/crypto/caam/caamalg.c | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 398807d1b77e..d13c1aee4427 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -882,10 +882,11 @@ static void ablkcipher_encrypt_done(struct device *jrdev, 
u32 *desc, u32 err,
 {
struct ablkcipher_request *req = context;
struct ablkcipher_edesc *edesc;
-#ifdef DEBUG
struct crypto_ablkcipher *ablkcipher = crypto_ablkcipher_reqtfm(req);
int ivsize = crypto_ablkcipher_ivsize(ablkcipher);
+   int nents;
 
+#ifdef DEBUG
dev_err(jrdev, "%s %d: err 0x%x\n", __func__, __LINE__, err);
 #endif
 
@@ -904,6 +905,19 @@ static void ablkcipher_encrypt_done(struct device *jrdev, 
u32 *desc, u32 err,
 #endif
 
ablkcipher_unmap(jrdev, edesc, req);
+
+   if (req->src == req->dst)
+   nents = edesc->src_nents;
+   else
+   nents = edesc->dst_nents;
+
+   /*
+* The crypto API expects us to set the IV (req->info) to the last
+* ciphertext block. This is used e.g. by the CTS mode.
+*/
+   sg_pcopy_to_buffer(req->dst, nents, req->info, ivsize,
+  req->nbytes - ivsize);
+
kfree(edesc);
 
ablkcipher_request_complete(req, err);
@@ -914,10 +928,10 @@ static void ablkcipher_decrypt_done(struct device *jrdev, 
u32 *desc, u32 err,
 {
struct ablkcipher_request *req = context;
struct ablkcipher_edesc *edesc;
-#ifdef DEBUG
struct crypto_ablkcipher *ablkcipher = crypto_ablkcipher_reqtfm(req);
int ivsize = crypto_ablkcipher_ivsize(ablkcipher);
 
+#ifdef DEBUG
dev_err(jrdev, "%s %d: err 0x%x\n", __func__, __LINE__, err);
 #endif
 
@@ -935,6 +949,14 @@ static void ablkcipher_decrypt_done(struct device *jrdev, 
u32 *desc, u32 err,
 #endif
 
ablkcipher_unmap(jrdev, edesc, req);
+
+   /*
+* The crypto API expects us to set the IV (req->info) to the last
+* ciphertext block.
+*/
+   sg_pcopy_to_buffer(req->src, edesc->src_nents, req->info, ivsize,
+  req->nbytes - ivsize);
+
kfree(edesc);
 
ablkcipher_request_complete(req, err);
-- 
2.12.0



[RFC PATCH 2/2] crypto: caam - fix k*alloc if called from own cipher callback

2017-06-02 Thread David Gstir
There are cases (e.g. the cts mode) where a cipher can be called again
from its own callback. In our case this callback is executed from within
a tasklet in the jobring, we must not sleep when allocating memory.

This patch detects such cases by using in_interrupt() to properly set the
k*alloc flags. In most cases we will still use GFP_KERNEL if the flags
CRYPTO_TFM_REQ_MAY_SLEEP or CRYPTO_TFM_REQ_MAY_BACKLOG are set for the
cipher request.

Signed-off-by: David Gstir 
---
 drivers/crypto/caam/caamalg.c | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index d13c1aee4427..4c4a5d1ad875 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -1209,13 +1209,18 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct caam_ctx *ctx = crypto_aead_ctx(aead);
struct device *jrdev = ctx->jrdev;
-   gfp_t flags = (req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
-  CRYPTO_TFM_REQ_MAY_SLEEP)) ? GFP_KERNEL : GFP_ATOMIC;
+   gfp_t flags;
int src_nents, mapped_src_nents, dst_nents = 0, mapped_dst_nents = 0;
struct aead_edesc *edesc;
int sec4_sg_index, sec4_sg_len, sec4_sg_bytes;
unsigned int authsize = ctx->authsize;
 
+   if (!in_interrupt() && req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
+ CRYPTO_TFM_REQ_MAY_SLEEP))
+   flags = GFP_KERNEL;
+   else
+   flags = GFP_ATOMIC;
+
if (unlikely(req->dst != req->src)) {
src_nents = sg_nents_for_len(req->src, req->assoclen +
 req->cryptlen);
@@ -1497,9 +1502,7 @@ static struct ablkcipher_edesc 
*ablkcipher_edesc_alloc(struct ablkcipher_request
struct crypto_ablkcipher *ablkcipher = crypto_ablkcipher_reqtfm(req);
struct caam_ctx *ctx = crypto_ablkcipher_ctx(ablkcipher);
struct device *jrdev = ctx->jrdev;
-   gfp_t flags = (req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
- CRYPTO_TFM_REQ_MAY_SLEEP)) ?
-  GFP_KERNEL : GFP_ATOMIC;
+   gfp_t flags;
int src_nents, mapped_src_nents, dst_nents = 0, mapped_dst_nents = 0;
struct ablkcipher_edesc *edesc;
dma_addr_t iv_dma = 0;
@@ -1507,6 +1510,12 @@ static struct ablkcipher_edesc 
*ablkcipher_edesc_alloc(struct ablkcipher_request
int ivsize = crypto_ablkcipher_ivsize(ablkcipher);
int dst_sg_idx, sec4_sg_ents, sec4_sg_bytes;
 
+   if (!in_interrupt() && req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
+ CRYPTO_TFM_REQ_MAY_SLEEP))
+   flags = GFP_KERNEL;
+   else
+   flags = GFP_ATOMIC;
+
src_nents = sg_nents_for_len(req->src, req->nbytes);
if (unlikely(src_nents < 0)) {
dev_err(jrdev, "Insufficient bytes (%d) in src S/G\n",
@@ -1703,9 +1712,7 @@ static struct ablkcipher_edesc 
*ablkcipher_giv_edesc_alloc(
struct crypto_ablkcipher *ablkcipher = crypto_ablkcipher_reqtfm(req);
struct caam_ctx *ctx = crypto_ablkcipher_ctx(ablkcipher);
struct device *jrdev = ctx->jrdev;
-   gfp_t flags = (req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
- CRYPTO_TFM_REQ_MAY_SLEEP)) ?
-  GFP_KERNEL : GFP_ATOMIC;
+   gfp_t flags;
int src_nents, mapped_src_nents, dst_nents, mapped_dst_nents;
struct ablkcipher_edesc *edesc;
dma_addr_t iv_dma = 0;
@@ -1713,6 +1720,12 @@ static struct ablkcipher_edesc 
*ablkcipher_giv_edesc_alloc(
int ivsize = crypto_ablkcipher_ivsize(ablkcipher);
int dst_sg_idx, sec4_sg_ents, sec4_sg_bytes;
 
+   if (!in_interrupt() && req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
+ CRYPTO_TFM_REQ_MAY_SLEEP))
+   flags = GFP_KERNEL;
+   else
+   flags = GFP_ATOMIC;
+
src_nents = sg_nents_for_len(req->src, req->nbytes);
if (unlikely(src_nents < 0)) {
dev_err(jrdev, "Insufficient bytes (%d) in src S/G\n",
-- 
2.12.0



[RFC PATCH 0/2] crypto: caam - fix cts(cbc(aes)) with CAAM driver

2017-06-02 Thread David Gstir
Hi!

While testing fscrypt's filename encryption, I noticed that the implementation
of cts(cbc(aes)) is broken when the CAAM hardware crypto driver is enabled.
Some digging showed that the refactoring of crypto/cts.c in v4.8 
(commit 0605c41cc53ca) exposed some problems with CAAM's aes-cbc
implementation. This can be tested quite easily by loading the tcrypt
module with mode=38. Looks like the cts mode is not used very often
and this went unnoticed since 4.8...

This patch series is an attempt to fix that and get cts(cbc(aes)) working
properly again when the CAAM driver is enabled.

Specifically, the issues are:

1) The cts implementation assumes ->info of ablkcipher_request (or ->iv of
   skcipher_request respectively) to be set to the last ciphertext block when
   the aes-cbc encryption is finished. Meaning that ->info is changed by the
   aes-cbc code. The CAAM driver does not do that and leaves ->info as-is.

   It is not fully clear to me yet if this is a requirement of the crypto API,
   or if this is just a side effect that is exploited by the cts implementation?

   For now, I assumed it is a requirement of the crypto API since I've seen
   other crypto drivers (e.g. AMD's CCP) do that. So the patch fixes the CAAM
   crypto driver accordingly.

   Also, the aead code in the CAAM driver, more specifically the gcm mode code
   seems to have the same flaw, so it'll need a similar fix in case.


2) The cts implementation uses aes-cbc twice to perform its job. The second
   time, it is called from within the callback of the first call to aes-cbc.
   This usage is not properly handled in the CAAM driver which triggers the
   BUG below.

   My current proposal is to use in_interrupt() to detect such cases and set
   the k*alloc flags accordingly. However, since using in_interrupt() is not
   really nice, I'm wondering if there is a better way to handle this?


Thanks,
David


[  126.252543] BUG: sleeping function called from invalid context at 
mm/slab.h:432
[  126.260099] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[  126.266837] no locks held by swapper/0/0.
[  126.270969] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.12.0-rc3-00052-g0ddec680d395 #287
[  126.279226] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  126.285821] Backtrace:
[  126.288406] [] (dump_backtrace) from [] 
(show_stack+0x18/0x1c)
[  126.296057]  r7: r6:6113 r5: r4:c102ab48
[  126.301798] [] (show_stack) from [] 
(dump_stack+0xb4/0xe8)
[  126.309106] [] (dump_stack) from [] 
(___might_sleep+0x17c/0x2a0)
[  126.316929]  r9: r8:c0a016dc r7:c101ee3e r6:01b0 r5:c0c12fac 
r4:e000
[  126.324751] [] (___might_sleep) from [] 
(__might_sleep+0x64/0xa4)
[  126.332655]  r7:014080c1 r6: r5:01b0 r4:c0c12fac
[  126.338397] [] (__might_sleep) from [] 
(__kmalloc+0x130/0x1b8)
[  126.346039]  r6:c071a8ec r5:014080c1 r4:eec01e00
[  126.350742] [] (__kmalloc) from [] 
(ablkcipher_edesc_alloc.constprop.1+0x200/0x900)
[  126.360213]  r10: r9: r8:c0a016dc r7:c0a016dc r6:ee05ac10 
r5:edfdaec0
[  126.368109]  r4:0001 r3:0020
[  126.371765] [] (ablkcipher_edesc_alloc.constprop.1) from 
[] (ablkcipher_encrypt+0x24/0x9c)
[  126.381843]  r10: r9:0020 r8:0001 r7:ee05ac10 r6:ed597000 
r5:edfdaec0
[  126.389738]  r4:ed475d08
[  126.392354] [] (ablkcipher_encrypt) from [] 
(skcipher_encrypt_ablkcipher+0x68/0x6c)
[  126.401822]  r7:ed475d08 r6:ed597000 r5:0400 r4:ed475d08
[  126.407566] [] (skcipher_encrypt_ablkcipher) from [] 
(cts_cbc_encrypt+0x118/0x124)
[  126.416947]  r7:ed475d08 r6:c1001cc0 r5:0010 r4:edfdae00
[  126.422686] [] (cts_cbc_encrypt) from [] 
(crypto_cts_encrypt_done+0x20/0x54)
[  126.431548]  r10: r9:ee05ac10 r8: r7:0010 r6:edc8e6c0 
r5:edc8e6d8
[  126.439443]  r4:edfdae00
[  126.442056] [] (crypto_cts_encrypt_done) from [] 
(ablkcipher_encrypt_done+0x88/0x9c)
[  126.445180] fec 2188000.ethernet eth0: MDIO read timeout
[  126.456948]  r5:edc8e6d8 r4:edfdaec0
[  126.460604] [] (ablkcipher_encrypt_done) from [] 
(caam_jr_dequeue+0x214/0x2d4)
[  126.469639]  r9:0001 r8:ee088010 r7:01ff r6:0001 r5: 
r4:edfdaec0
[  126.477467] [] (caam_jr_dequeue) from [] 
(tasklet_action+0x98/0x154)
[  126.485160] fec 2188000.ethernet eth0: MDIO read timeout
[  126.490975]  r10:c1080b80 r9:c1008b84 r8: r7:c100 r6: 
r5:ee088028
[  126.498870]  r4:ee088024
[  126.501484] [] (tasklet_action) from [] 
(__do_softirq+0xf0/0x2a4)
[  126.509390]  r10:4006 r9:c1002080 r8:0101 r7:c1002098 r6:c100 
r5:0006
[  126.517285]  r4:
[  126.519896] [] (__do_softirq) from [] 
(irq_exit+0xec/0x168)
[  126.525127] fec 2188000.ethernet eth0: MDIO write timeout
[  126.532709]  r10:c1008cf0 r9:eec08400 r8:0001 r7: r6:c1008b84 
r5:
[  126.540603]  r4:c0fd940c
[  126.543222] [] (irq_exit) from [] 
(__handle_domain_irq+0x74/0xe8)
[  126.551135] [] (__handle_domain_irq)