Re: [PATCH v2 0/5] crypto: Speck support

2018-04-25 Thread Samuel Neves
On Wed, Apr 25, 2018 at 8:49 PM, Eric Biggers  wrote:
> I agree that my explanation should have been better, and should have
> considered more crypto algorithms.  The main difficulty is that we have
> extreme performance requirements -- it needs to be 50 MB/s at the very least
> on even low-end ARM devices like smartwatches.  And even with the
> NEON-accelerated Speck128-XTS performance exceeding that after much
> optimization, we've been getting a lot of pushback as people want closer to
> 100 MB/s.
>

I couldn't find any NEON-capable ARMv7 chip below 800 MHz, so this
would put the performance upper bound around 15 cycles per byte, with
the comfortable number being ~7. That's indeed tough, though not
impossible.
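
(Back of the envelope: 800 MHz / 50 MB/s = 16 cycles per byte, and
800 MHz / 100 MB/s = 8; the ~15 and ~7 above just leave a bit of headroom,
presumably eaten by XTS and memory overhead.)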

>
> That's why I also included Speck64-XTS in the patches, since it was
> straightforward to include, and some devices may really need that last
> 20-30% of performance for encryption to be feasible at all.  (And when the
> choice is between unencrypted and a 64-bit block cipher, used in a context
> where the weakest points in the cryptosystem are actually elsewhere such as
> the user's low-entropy PIN and the flash storage doing wear-leveling, I'd
> certainly take the 64-bit block cipher.)  So far we haven't had to use
> Speck64 though, and if that continues to be the case I'd be fine with
> Speck64 being removed, leaving just Speck128.
>

I would very much prefer that to be the case. As many of us know,
"it's better than nothing" has often been used to justify other bad
choices, like RC4, that end up preventing better ones from being
adopted. At a time when we're trying to get rid of 64-bit ciphers in
TLS, where data volumes per session are comparatively low, it would be
unfortunate if the opposite started happening for encryption at rest.

>
> Note that in practice, to have any chance at meeting the performance
> requirement the cipher needed to be NEON accelerated.  That made benchmarking
> really hard and time-consuming, since to definitively know how an algorithm
> performs it can take upwards of a week to implement a NEON version.  It needs
> to be very well optimized too, to compare the algorithms fairly -- e.g. with
> Speck I got a 20% performance improvement on some CPUs just by changing the
> NEON instructions used to implement the 8-bit rotates, an optimization that is
> not possible with ciphers that don't use rotate amounts that are multiples of
> 8.  (This was an intentional design choice by the Speck designers; they do
> know what they're doing, actually.)
>
> Thus, we had to be pretty aggressive about dropping algorithms from
> consideration if there were preliminary indications that they wouldn't perform
> well, or had too little cryptanalysis, or had other issues such as an unclear
> patent situation.  Threefish for example I did test the C implementation at
> https://github.com/wernerd/Skein3Fish, but on ARM32 it was over 4 times slower
> than my NEON implementation of Speck128/256-XTS.  And I did not see a clear
> way that it could be improved over 4x with NEON, if at all, so I did not take
> the long time it would have taken to write an optimized NEON implementation
> to benchmark it properly.  Perhaps that was a mistake.  But, time is not
> unlimited.
>

In my limited experience with NEON and 64-bit ARX, there's usually a
~2x speedup solely from NEON's native 64-bit operations on ARMv7-A.
The extra speedup from encrypting 2 blocks in parallel is then
somewhere between 1x and 2x, depending on various details. Getting
near 4x might be feasible, but it is indeed time-consuming to get
there.
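
For concreteness, one add-rotate-xor step on two blocks at once looks
roughly like this with NEON intrinsics (just an illustrative sketch, not
taken from any real implementation; the rotation amount is arbitrary, and
register pressure and interleaving decide how much of the 2x you keep):

    #include <arm_neon.h>

    /* Rotate each 64-bit lane left by a compile-time constant amount. */
    #define ROTL64(v, n) \
            vorrq_u64(vshlq_n_u64((v), (n)), vshrq_n_u64((v), 64 - (n)))

    /* One ARX step of a 64-bit-word cipher, two blocks in parallel in
     * the two 64-bit lanes of a q register. */
    static inline void arx_step(uint64x2_t *x, uint64x2_t *y)
    {
            *x = vaddq_u64(*x, *y);      /* add                     */
            *y = ROTL64(*y, 13);         /* rotate (example amount) */
            *y = veorq_u64(*y, *x);      /* xor                     */
    }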

>
> As for the wide-block mode using ChaCha20 and Poly1305, you'd have to ask
> Paul Crowley to explain it properly, but briefly it's actually a pseudorandom
> permutation over an arbitrarily-sized message.  So with dm-crypt for example,
> it would operate on a whole 512-byte sector, and if any bit of the 512-byte
> plaintext is changed, then every bit in the 512-byte ciphertext would change
> with 50% probability.  To make this possible, the construction uses a
> polynomial evaluation in GF(2^130-5) as a universal hash function, similar to
> the Poly1305 mode.
>

Oh, OK, that sounds like something resembling Naor-Reingold or its
relatives. That would work, but with 3 or 4 passes I guess it wouldn't
be very fast.
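
(Concretely, I'd guess at a hash-encrypt-hash shape, something like

    h1      = PolyHash_r(tweak, bulk(P))
    M       = lastblock(P) xor h1
    C0      = E_k(M)                      # one narrow block cipher call
    bulk(C) = bulk(P) xor Stream_k'(C0)   # e.g. a ChaCha20 pass
    h2      = PolyHash_r(tweak, bulk(C))
    lastblock(C) = C0 xor h2

but that's only my guess at the flavor; Paul's actual construction may well
differ.)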

>
> Using ChaCha20's underlying 512-bit permutation to build a tweakable block
> cipher is an interesting idea.  But maybe in my crypto-naivety, it is not
> obvious to me how to do so.  Do you have references to any relevant papers?
> Remember that we strongly prefer a published cipher to a custom one -- even if
> the core is reused, a mistake may be made in the way it is used.  Thus,
> similarly to Paul's wide-block mode, I'd be concerned that we'd have to
> self-publish a new construction, then use it with no outside crypto review.
> *Maybe* it would be straightforward enough to be okay, but to know I'd need to
> see the details of how 

Re: [PATCH v2 0/5] crypto: Speck support

2018-04-25 Thread Eric Biggers
Hi Samuel,

On Wed, Apr 25, 2018 at 03:33:16PM +0100, Samuel Neves wrote:
> Let's put the provenance of Speck aside for a moment, and suppose that
> it is an ideal block cipher. There are still some issues with this
> patch as it stands.
> 
>  - The rationale seems off. Consider this bit from the commit message:
> 
> > Other AES alternatives such as Twofish, Threefish, Camellia, CAST6, and
> > Serpent aren't fast enough either; it seems that only a modern ARX cipher
> > can provide sufficient performance on these devices.
> 
> One of these things is very much not like the others. Threefish _is_ a
> modern ARX cipher---a tweakable block cipher in fact, precluding the
> need for XEX-style masking. Is it too slow? Does it not have the
> correct block size?
> 
> > We've also considered a novel length-preserving encryption mode based on
> > ChaCha20 and Poly1305.
> 
> I'm very curious about this, namely as to what the role of Poly1305
> would be here. ChaCha20's underlying permutation could, of course, be
> transformed into a 512-bit tweakable block cipher relatively
> painlessly, retaining the performance of regular ChaCha20 with
> marginal additional overhead. This would not be a standard
> construction, but clearly that is not an issue.
> 
> But the biggest problem here, in my mind, is that for all the talk of
> using 128-bit block Speck, this patch tacks on the 64-bit block
> variant of Speck into the kernel, and speck64-xts as well! As far as I
> can tell, this is the _only_ instance of a 64-bit XTS instance in the
> entire codebase. Now, if you wanted a fast 64-bit ARX block cipher,
> the kernel already had XTEA. Instead, this is adding yet another
> 64-bit block cipher into the crypto API, in a disk-encryption mode no
> less, so that it can be misused later. In the disk encryption setting,
> it's particularly concerning to be using such a small block size, as
> data volumes can quickly add up to the birthday bound.
> 
> > It's easy to say that, but do you have an actual suggestion?
> 
> I don't know how seriously you are actually asking this, but some
> 128-bit software-friendly block ciphers could be SPARX, LEA, RC5, or
> RC6. SPARX, in particular, has similarities to Speck but has some
> further AES-like design guarantees that other prior ARX block ciphers
> did not. Some other bitsliced designs, such as Noekeon or SKINNY, may
> also work well with NEON, but I don't know much about their
> performance there.
> 

I agree that my explanation should have been better, and should have considered
more crypto algorithms.  The main difficulty is that we have extreme performance
requirements -- it needs to be 50 MB/s at the very least on even low-end ARM
devices like smartwatches.  And even with the NEON-accelerated Speck128-XTS
performance exceeding that after much optimization, we've been getting a lot of
pushback as people want closer to 100 MB/s.

That's why I also included Speck64-XTS in the patches, since it was
straightforward to include, and some devices may really need that last 20-30% of
performance for encryption to be feasible at all.  (And when the choice is
between unencrypted and a 64-bit block cipher, used in a context where the
weakest points in the cryptosystem are actually elsewhere such as the user's
low-entropy PIN and the flash storage doing wear-leveling, I'd certainly take
the 64-bit block cipher.)  So far we haven't had to use Speck64 though, and if
that continues to be the case I'd be fine with Speck64 being removed, leaving
just Speck128.

Note that in practice, to have any chance at meeting the performance requirement
the cipher needed to be NEON accelerated.  That made benchmarking really hard
and time-consuming, since to definitively know how an algorithm performs it can
take upwards of a week to implement a NEON version.  It needs to be very well
optimized too, to compare the algorithms fairly -- e.g. with Speck I got a 20%
performance improvement on some CPUs just by changing the NEON instructions used
to implement the 8-bit rotates, an optimization that is not possible with
ciphers that don't use rotate amounts that are multiples of 8.  (This was an
intentional design choice by the Speck designers; they do know what they're
doing, actually.)
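
To illustrate the kind of difference (a rough sketch with AArch64 intrinsics
for brevity, not the actual patch code -- on 32-bit ARM the same shuffle is
done with vtbl.8/vext.8 on d-registers): a rotate by 8 is a pure byte
permutation, so it can be a single table-lookup instruction instead of two
shifts plus an OR:

    #include <arm_neon.h>

    /* Rotate each 64-bit lane right by 8 the generic way: 2 shifts + OR. */
    static inline uint64x2_t ror64_8_shifts(uint64x2_t x)
    {
            return vorrq_u64(vshrq_n_u64(x, 8), vshlq_n_u64(x, 56));
    }

    /* The same rotate expressed as a byte permutation: one TBL (in real
     * code the index vector is loaded once, outside the loop). */
    static inline uint64x2_t ror64_8_tbl(uint64x2_t x)
    {
            static const uint8_t idx[16] = {  1,  2,  3,  4,  5,  6,  7,  0,
                                              9, 10, 11, 12, 13, 14, 15,  8 };
            return vreinterpretq_u64_u8(
                    vqtbl1q_u8(vreinterpretq_u8_u64(x), vld1q_u8(idx)));
    }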

Thus, we had to be pretty aggressive about dropping algorithms from
consideration if there were preliminary indications that they wouldn't perform
well, or had too little cryptanalysis, or had other issues such as an unclear
patent situation.  Threefish for example I did test the C implementation at
https://github.com/wernerd/Skein3Fish, but on ARM32 it was over 4 times slower
than my NEON implementation of Speck128/256-XTS.  And I did not see a clear way
that it could be improved over 4x with NEON, if at all, so I did not take the
long time it would have taken to write an optimized NEON implementation to
benchmark it properly.  Perhaps that was a mistake.  But, time is not unlimited.

RC5 and RC6 use data-dependent rotates which 

Re: [PATCH v2 1/2] crypto: ccree: enable support for hardware keys

2018-04-25 Thread Tudor Ambarus

Hi, Gilad,

On 04/23/2018 10:25 AM, Gilad Ben-Yossef wrote:

Enable CryptoCell support for hardware keys.

Hardware keys are regular AES keys loaded into CryptoCell internal memory
via firmware, often from secure boot ROM or hardware fuses at boot time.

As such, they can be used for enc/dec purposes like any other key but
cannot (read: extremely hard to) be extracted since they are not
available anywhere in RAM during runtime.

The mechanism has some similarities to s390 secure keys although the keys
are not wrapped or sealed, but simply loaded offline. The interface was
therefore modeled based on the s390 secure keys support.


I'm interested in hardware keys, ecc508 supports them too. In your
proposal you expect that the user will provide a specific key token that
is meaningful only for the ccree driver. If another driver that supports
"cbc(paes)" shows up, you will force the user to select a specific
driver implementation and to know what kind of key token to provide.
Shouldn't we have a common API that can address other drivers too?

Best,
ta


RE: [dm-devel] [PATCH 2/2] md: dm-verity: allow parallel processing of bio blocks

2018-04-25 Thread yael.chemla


> -Original Message-
> From: Eric Biggers 
> Sent: Tuesday, 27 March 2018 9:55
> To: Yael Chemla 
> Cc: Alasdair Kergon ; Mike Snitzer ;
> dm-de...@redhat.com; linux-ker...@vger.kernel.org; ofir.dr...@gmail.com;
> Yael Chemla ; linux-crypto@vger.kernel.org;
> gi...@benyossef.com
> Subject: Re: [dm-devel] [PATCH 2/2] md: dm-verity: allow parallel processing
> of bio blocks
> 
> [+Cc linux-crypto]
> 
> Hi Yael,
> 
> On Sun, Mar 25, 2018 at 07:41:30PM +0100, Yael Chemla wrote:
> >  Allow parallel processing of bio blocks by moving to async. completion
> >  handling. This allows for better resource utilization of both HW and
> >  software based hash tfm and therefore better performance in many cases,
> >  depending on the specific tfm in use.
> >
> >  Tested on ARM32 (zynq board) and ARM64 (Juno board).
> >  Time of cat command was measured on a filesystem with various file sizes.
> >  12% performance improvement when HW based hash was used (ccree driver).
> >  SW based hash showed less than 1% improvement.
> >  CPU utilization when HW based hash was used presented 10% less context
> >  switch, 4% less cycles and 7% less instructions. No difference in CPU
> >  utilization noticed with SW based hash.
> >
> > Signed-off-by: Yael Chemla 
> 
> Okay, I definitely would like to see dm-verity better support hardware crypto
> accelerators, but these patches were painful to read.
> 
> There are lots of smaller bugs, but the high-level problem which you need to
> address first is that on every bio you are always allocating all the extra
> memory to hold a hash request and scatterlist for every data block.  This will

I have a question regarding scatterlist memory:
I noticed that all blocks in dm-verity end up using two buffers: one for the
data and the other for the salt.
I'm using a function similar to verity_for_io_block to iterate and find the
number of buffers; in my case data_dev_block_bits = 12 and todo = 4096, so the
do-while loop iterates only once.
I assume that since the loop is there, there are cases where it will iterate
more than once, and I'm trying to figure out which cases would require more
than one buffer of data per block.
In dm-crypt there is a limit of 4 static scatterlist elements per in/out
(see struct dm_crypt_request).
Is there an upper bound on the number of buffers per block in dm-verity?
I need this for the implementation of a mempool for the scatterlist buffers.
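
Also, to check that I've understood the dm-crypt-style flow you describe
further down, the rough shape I have in mind is the following (pseudo-code
only, with made-up names like embedded_req, io->restart and
verity_finish_block -- not actual dm-crypt or dm-verity code):

	/* one request lives in the per-bio data; a mempool is used only
	 * for the extra requests of blocks still in flight */
	req = io->embedded_req;
	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				   verity_async_done, io);
	ret = crypto_ahash_digest(req);
	switch (ret) {
	case 0:				/* completed synchronously */
		verity_finish_block(io);
		break;
	case -EINPROGRESS:		/* truly async: callback finishes it */
		break;
	case -EBUSY:			/* HW queue full: wait, then go on */
		wait_for_completion(&io->restart);
		break;
	default:
		io->error = ret;
	}

Is that roughly the idea?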
Thanks,
Yael

> not only hurt performance when the hashing is done in software (I'm
> skeptical that your performance numbers are representative of that case), but
> it will also fall apart under memory pressure.  We are trying to get low-end
> Android devices to start using dm-verity, and such devices often have only 1
> GB or even only 512 MB of RAM, so memory allocations are at increased risk
> of failing.  In fact I'm pretty sure you didn't do any proper stress testing
> of these patches, since the first thing they do for every bio is try to
> allocate a physically contiguous array that is nearly as long as the full
> bio data itself
> (n_blocks * sizeof(struct dm_verity_req_data) = n_blocks * 3264, at least on a
> 64-bit platform, mostly due to the 'struct dm_verity_fec_io'), so potentially
> up to about 1 MB; that's going to fail a lot even on systems with gigabytes of
> RAM...
> 
> (You also need to verify that your new code is compatible with the forward
> error correction feature, with the "ignore_zero_blocks" option, and with the
> new "check_at_most_once" option.  From my reading of the code, all of
> those seemed broken; the dm_verity_fec_io structures, for example, weren't
> even being initialized...)
> 
> I think you need to take a close look at how dm-crypt handles async crypto
> implementations, since it seems to do it properly without hurting the
> common case where the crypto happens synchronously.  What it does, is it
> reserves space in the per-bio data for a single cipher request.  Then,
> *only* if
> the cipher implementation actually processes the request asynchronously (as
> indicated by -EINPROGRESS being returned) is a new cipher request allocated
> dynamically, using a mempool (not kmalloc, which is prone to fail).  Note that
> unlike your patches it also properly handles the case where the hardware
> crypto queue is full, as indicated by the cipher implementation returning -
> EBUSY; in that case, dm-crypt waits to start another request until there is
> space in the queue.
> 
> I think it would be possible to adapt dm-crypt's solution to dm-verity.
> 
> Thanks,
> 
> Eric
> 
> > ---
> >  drivers/md/dm-verity-fec.c|  10 +-
> >  drivers/md/dm-verity-fec.h|   7 +-
> >  drivers/md/dm-verity-target.c | 215 +++--
> -
> >  drivers/md/dm-verity.h|   4 +-
> >  4 files changed, 173 insertions(+), 63 deletions(-)
> >
> > diff --git 

[PATCH 7/7] chtls: handling HW supported sockopt

2018-04-25 Thread Atul Gupta
Some of the supported socket options are sent to the HW, while the rest
are handled by SW.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls.h  |  10 ++
 drivers/crypto/chelsio/chtls/chtls_cm.h   |  12 ++
 drivers/crypto/chelsio/chtls/chtls_hw.c   |   2 +-
 drivers/crypto/chelsio/chtls/chtls_main.c | 191 +-
 4 files changed, 211 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/chelsio/chtls/chtls.h 
b/drivers/crypto/chelsio/chtls/chtls.h
index a53a0e6..3e46d28 100644
--- a/drivers/crypto/chelsio/chtls/chtls.h
+++ b/drivers/crypto/chelsio/chtls/chtls.h
@@ -353,6 +353,15 @@ enum {
 #define TCP_PAGE(sk)   (sk->sk_frag.page)
 #define TCP_OFF(sk)(sk->sk_frag.offset)
 
+struct tcp_cong_ops {
+   struct tcp_congestion_ops   ops;
+   int key;
+};
+
+#define CONG_OPS(__s, __k) \
+   { { .name = __s, .owner = THIS_MODULE }, .key = CONG_ALG_##__k, }
+#define CONG_ALG_NONE (-1)
+
 static inline struct chtls_dev *to_chtls_dev(struct tls_device *tlsdev)
 {
return container_of(tlsdev, struct chtls_dev, tlsdev);
@@ -472,6 +481,7 @@ int send_tx_flowc_wr(struct sock *sk, int compl,
 void chtls_tcp_push(struct sock *sk, int flags);
 int chtls_push_frames(struct chtls_sock *csk, int comp);
 int chtls_set_tcb_tflag(struct sock *sk, unsigned int bit_pos, int val);
+int chtls_set_tcb_field(struct sock *sk, u16 word, u64 mask, u64 val);
 int chtls_setkey(struct chtls_sock *csk, u32 keylen, u32 mode);
 void skb_entail(struct sock *sk, struct sk_buff *skb, int flags);
 unsigned int keyid_to_addr(int start_addr, int keyid);
diff --git a/drivers/crypto/chelsio/chtls/chtls_cm.h 
b/drivers/crypto/chelsio/chtls/chtls_cm.h
index 78eb3af..569b723 100644
--- a/drivers/crypto/chelsio/chtls/chtls_cm.h
+++ b/drivers/crypto/chelsio/chtls/chtls_cm.h
@@ -36,9 +36,21 @@
 #define TF_TLS_ENABLE_S  0
 #define TF_TLS_ENABLE_V(x) ((x) << TF_TLS_ENABLE_S)
 
+#define TF_NAGLE_S  7
+#define TF_NAGLE_V(x) ((x) << TF_NAGLE_S)
+
 #define TF_RX_QUIESCE_S15
 #define TF_RX_QUIESCE_V(x) ((x) << TF_RX_QUIESCE_S)
 
+#define TF_TURBO_S 21
+#define TF_TURBO_V(x) ((x) << TF_TURBO_S)
+
+#define TF_CCTRL_SEL0_S22
+#define TF_CCTRL_SEL0_V(x) ((x) << TF_CCTRL_SEL0_S)
+
+#define TCB_TOS_S  10
+#define TCB_TOS_V(x)   ((x) << TCB_TOS_S)
+
 /*
  * Max receive window supported by HW in bytes.  Only a small part of it can
  * be set through option0, the rest needs to be set through RX_DATA_ACK.
diff --git a/drivers/crypto/chelsio/chtls/chtls_hw.c 
b/drivers/crypto/chelsio/chtls/chtls_hw.c
index 55d5014..1b7ee6b 100644
--- a/drivers/crypto/chelsio/chtls/chtls_hw.c
+++ b/drivers/crypto/chelsio/chtls/chtls_hw.c
@@ -61,7 +61,7 @@ static void __set_tcb_field(struct sock *sk, struct sk_buff 
*skb, u16 word,
  * Send control message to HW, message go as immediate data and packet
  * is freed immediately.
  */
-static int chtls_set_tcb_field(struct sock *sk, u16 word, u64 mask, u64 val)
+int chtls_set_tcb_field(struct sock *sk, u16 word, u64 mask, u64 val)
 {
struct cpl_set_tcb_field *req;
unsigned int credits_needed;
diff --git a/drivers/crypto/chelsio/chtls/chtls_main.c 
b/drivers/crypto/chelsio/chtls/chtls_main.c
index 4dc3d0e..9c3255d 100644
--- a/drivers/crypto/chelsio/chtls/chtls_main.c
+++ b/drivers/crypto/chelsio/chtls/chtls_main.c
@@ -512,15 +512,200 @@ static int do_chtls_setsockopt(struct sock *sk, int 
optname,
return rc;
 }
 
-static int chtls_setsockopt(struct sock *sk, int level, int optname,
+void chtls_set_tos(struct sock *sk)
+{
+   u64 mask, val;
+
+   mask = 0x3FULL;
+   val = (inet_sk(sk)->tos >> 2) & 0x3F;
+   chtls_set_tcb_field(sk, 3, TCB_TOS_V(mask), TCB_TOS_V(val));
+}
+
+#define UNSUP_IP_SOCK_OPT ((1 << IP_OPTIONS))
+
+/*
+ *  Socket option code for IP.
+ */
+static int do_ip_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, unsigned int optlen)
 {
+   if (level != SOL_IP)
+   return -ENOPROTOOPT;
+
+   /* unsupported options */
+   if ((1 << optname) & UNSUP_IP_SOCK_OPT)
+   return -ENOPROTOOPT;
+
+   /* specially handled options */
+   if (optname == IP_TOS) {
+   struct inet_sock *inet = inet_sk(sk);
+   int val = 0, err = 0;
+
+   if (optlen >= sizeof(int)) {
+   if (get_user(val, (int __user *)optval))
+   return -EFAULT;
+   } else if (optlen >= sizeof(char)) {
+   unsigned char ucval;
+
+   if (get_user(ucval, (unsigned char __user *)optval))
+   return -EFAULT;
+   val = (int)ucval;
+   }
+   lock_sock(sk);
+   val &= ~3;
+   val |= inet->tos & 3;
+   if (IPTOS_PREC(val) >= IPTOS_PREC_CRITIC_ECP &&
+   

[PATCH 6/7] chtls: generic handling for data and header

2018-04-25 Thread Atul Gupta
Removed a redundant check and unified the receive handling of TLS PDUs
and headers, as received from HW.

Signed-off-by: Atul Gupta 
Signed-off-by: Harsh Jain 
---
 drivers/crypto/chelsio/chtls/chtls.h| 10 ++
 drivers/crypto/chelsio/chtls/chtls_cm.c | 12 +---
 drivers/crypto/chelsio/chtls/chtls_io.c | 54 -
 3 files changed, 23 insertions(+), 53 deletions(-)

diff --git a/drivers/crypto/chelsio/chtls/chtls.h 
b/drivers/crypto/chelsio/chtls/chtls.h
index 778c194..a53a0e6 100644
--- a/drivers/crypto/chelsio/chtls/chtls.h
+++ b/drivers/crypto/chelsio/chtls/chtls.h
@@ -67,11 +67,6 @@ enum {
CPL_RET_UNKNOWN_TID = 4/* unexpected unknown TID */
 };
 
-#define TLS_RCV_ST_READ_HEADER 0xF0
-#define TLS_RCV_ST_READ_BODY   0xF1
-#define TLS_RCV_ST_READ_DONE   0xF2
-#define TLS_RCV_ST_READ_NB 0xF3
-
 #define LISTEN_INFO_HASH_SIZE 32
 #define RSPQ_HASH_BITS 5
 struct listen_info {
@@ -279,6 +274,7 @@ struct tlsrx_cmp_hdr {
 #define TLSRX_HDR_PKT_MAC_ERROR_FTLSRX_HDR_PKT_MAC_ERROR_V(1U)
 
 #define TLSRX_HDR_PKT_ERROR_M   0x1F
+#define CONTENT_TYPE_ERROR 0x7F
 
 struct ulp_mem_rw {
__be32 cmd;
@@ -348,8 +344,8 @@ enum {
ULPCB_FLAG_HOLD  = 1 << 3,  /* skb not ready for Tx yet */
ULPCB_FLAG_COMPL = 1 << 4,  /* request WR completion */
ULPCB_FLAG_URG   = 1 << 5,  /* urgent data */
-   ULPCB_FLAG_TLS_ND= 1 << 6, /* payload of zero length */
-   ULPCB_FLAG_NO_HDR= 1 << 7, /* not a ofld wr */
+   ULPCB_FLAG_TLS_HDR   = 1 << 6,  /* payload with tls hdr */
+   ULPCB_FLAG_NO_HDR= 1 << 7,  /* not a ofld wr */
 };
 
 /* The ULP mode/submode of an skbuff */
diff --git a/drivers/crypto/chelsio/chtls/chtls_cm.c 
b/drivers/crypto/chelsio/chtls/chtls_cm.c
index 23c43b8..2bb6f03 100644
--- a/drivers/crypto/chelsio/chtls/chtls_cm.c
+++ b/drivers/crypto/chelsio/chtls/chtls_cm.c
@@ -1608,12 +1608,14 @@ static void chtls_set_hdrlen(struct sk_buff *skb, 
unsigned int nlen)
 
 static void chtls_rx_hdr(struct sock *sk, struct sk_buff *skb)
 {
-   struct cpl_rx_tls_cmp *cmp_cpl = cplhdr(skb);
+   struct tlsrx_cmp_hdr *tls_hdr_pkt;
+   struct cpl_rx_tls_cmp *cmp_cpl;
struct sk_buff *skb_rec;
struct chtls_sock *csk;
struct chtls_hws *tlsk;
struct tcp_sock *tp;
 
+   cmp_cpl = cplhdr(skb);
csk = rcu_dereference_sk_user_data(sk);
   tlsk = &csk->tlshws;
tp = tcp_sk(sk);
@@ -1623,16 +1625,18 @@ static void chtls_rx_hdr(struct sock *sk, struct 
sk_buff *skb)
 
skb_reset_transport_header(skb);
__skb_pull(skb, sizeof(*cmp_cpl));
+   tls_hdr_pkt = (struct tlsrx_cmp_hdr *)skb->data;
+   if (tls_hdr_pkt->res_to_mac_error & TLSRX_HDR_PKT_ERROR_M)
+   tls_hdr_pkt->type = CONTENT_TYPE_ERROR;
if (!skb->data_len)
-   __skb_trim(skb, CPL_RX_TLS_CMP_LENGTH_G
-   (ntohl(cmp_cpl->pdulength_length)));
+   __skb_trim(skb, TLS_HEADER_LENGTH);
 
tp->rcv_nxt +=
CPL_RX_TLS_CMP_PDULENGTH_G(ntohl(cmp_cpl->pdulength_length));
 
+   ULP_SKB_CB(skb)->flags |= ULPCB_FLAG_TLS_HDR;
   skb_rec = __skb_dequeue(&tlsk->sk_recv_queue);
if (!skb_rec) {
-   ULP_SKB_CB(skb)->flags |= ULPCB_FLAG_TLS_ND;
   __skb_queue_tail(&sk->sk_receive_queue, skb);
} else {
chtls_set_hdrlen(skb, tlsk->pldlen);
diff --git a/drivers/crypto/chelsio/chtls/chtls_io.c 
b/drivers/crypto/chelsio/chtls/chtls_io.c
index ce90ba1..6957292 100644
--- a/drivers/crypto/chelsio/chtls/chtls_io.c
+++ b/drivers/crypto/chelsio/chtls/chtls_io.c
@@ -1535,31 +1535,13 @@ static int chtls_pt_recvmsg(struct sock *sk, struct 
msghdr *msg, size_t len,
}
}
}
-   if (hws->rstate == TLS_RCV_ST_READ_BODY) {
-   if (skb_copy_datagram_msg(skb, offset,
- msg, avail)) {
-   if (!copied) {
-   copied = -EFAULT;
-   break;
-   }
-   }
-   } else {
-   struct tlsrx_cmp_hdr *tls_hdr_pkt =
-   (struct tlsrx_cmp_hdr *)skb->data;
-
-   if ((tls_hdr_pkt->res_to_mac_error &
-   TLSRX_HDR_PKT_ERROR_M))
-   tls_hdr_pkt->type = 0x7F;
-
-   /* CMP pld len is for recv seq */
-   hws->rcvpld = skb->hdr_len;
-   if (skb_copy_datagram_msg(skb, offset, msg, avail)) {
-   if (!copied) {
-   copied = -EFAULT;
-   break;
-  

[PATCH 5/7] chtls: free beyond end of array rspq_skb_cache

2018-04-25 Thread Atul Gupta
Reported-by: Dan Carpenter 
Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/chelsio/chtls/chtls_main.c 
b/drivers/crypto/chelsio/chtls/chtls_main.c
index e9ffc3d..4dc3d0e 100644
--- a/drivers/crypto/chelsio/chtls/chtls_main.c
+++ b/drivers/crypto/chelsio/chtls/chtls_main.c
@@ -198,7 +198,7 @@ static void *chtls_uld_add(const struct cxgb4_lld_info 
*info)
 {
struct cxgb4_lld_info *lldi;
struct chtls_dev *cdev;
-   int i, j;
+   int i;
 
cdev = kzalloc(sizeof(*cdev) + info->nports *
  (sizeof(struct net_device *)), GFP_KERNEL);
@@ -250,8 +250,8 @@ static void *chtls_uld_add(const struct cxgb4_lld_info 
*info)
 
return cdev;
 out_rspq_skb:
-   for (j = 0; j <= i; j++)
-   kfree_skb(cdev->rspq_skb_cache[j]);
+   for (; i > 0; --i)
+   kfree_skb(cdev->rspq_skb_cache[i]);
kfree_skb(cdev->askb);
 out_skb:
kfree(lldi);
-- 
1.8.3.1



[PATCH 4/7] chtls: kbuild warnings

2018-04-25 Thread Atul Gupta
- unindented continue
- check for null page
- signed return

Reported-by: Dan Carpenter 
Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_io.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/chelsio/chtls/chtls_io.c 
b/drivers/crypto/chelsio/chtls/chtls_io.c
index 85ddc07..ce90ba1 100644
--- a/drivers/crypto/chelsio/chtls/chtls_io.c
+++ b/drivers/crypto/chelsio/chtls/chtls_io.c
@@ -907,11 +907,11 @@ static int chtls_skb_copy_to_page_nocache(struct sock *sk,
 }
 
 /* Read TLS header to find content type and data length */
-static u16 tls_header_read(struct tls_hdr *thdr, struct iov_iter *from)
+static int tls_header_read(struct tls_hdr *thdr, struct iov_iter *from)
 {
if (copy_from_iter(thdr, sizeof(*thdr), from) != sizeof(*thdr))
return -EFAULT;
-   return (__force u16)cpu_to_be16(thdr->length);
+   return (__force int)cpu_to_be16(thdr->length);
 }
 
 static int csk_mem_free(struct chtls_dev *cdev, struct sock *sk)
@@ -1083,6 +1083,9 @@ int chtls_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
int off = TCP_OFF(sk);
bool merge;
 
+   if (!page)
+   goto wait_for_memory;
+
if (page)
pg_size <<= compound_order(page);
 
@@ -1492,7 +1495,7 @@ static int chtls_pt_recvmsg(struct sock *sk, struct 
msghdr *msg, size_t len,
break;
chtls_cleanup_rbuf(sk, copied);
   sk_wait_data(sk, &timeo, NULL);
-   continue;
+   continue;
 found_ok_skb:
if (!skb->len) {
skb_dst_set(skb, NULL);
-- 
1.8.3.1



[PATCH 2/7] chtls: support only 128b key length

2018-04-25 Thread Atul Gupta
Corrected the key length to copy a 128-bit key. Removed the 192-bit and
256-bit keys, as the user input supports a key of size 128 bits in gcm_ctx.

Reported-by: Dan Carpenter 
Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_hw.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/crypto/chelsio/chtls/chtls_hw.c 
b/drivers/crypto/chelsio/chtls/chtls_hw.c
index 54a13aa9..55d5014 100644
--- a/drivers/crypto/chelsio/chtls/chtls_hw.c
+++ b/drivers/crypto/chelsio/chtls/chtls_hw.c
@@ -213,7 +213,7 @@ static int chtls_key_info(struct chtls_sock *csk,
  struct _key_ctx *kctx,
  u32 keylen, u32 optname)
 {
-   unsigned char key[CHCR_KEYCTX_CIPHER_KEY_SIZE_256];
+   unsigned char key[AES_KEYSIZE_128];
struct tls12_crypto_info_aes_gcm_128 *gcm_ctx;
unsigned char ghash_h[AEAD_H_SIZE];
struct crypto_cipher *cipher;
@@ -228,10 +228,6 @@ static int chtls_key_info(struct chtls_sock *csk,
 
if (keylen == AES_KEYSIZE_128) {
ck_size = CHCR_KEYCTX_CIPHER_KEY_SIZE_128;
-   } else if (keylen == AES_KEYSIZE_192) {
-   ck_size = CHCR_KEYCTX_CIPHER_KEY_SIZE_192;
-   } else if (keylen == AES_KEYSIZE_256) {
-   ck_size = CHCR_KEYCTX_CIPHER_KEY_SIZE_256;
} else {
pr_err("GCM: Invalid key length %d\n", keylen);
return -EINVAL;
-- 
1.8.3.1



[PATCH 3/7] chtls: variable dereferenced before null check

2018-04-25 Thread Atul Gupta
skb dereferenced before check in sendpage

Reported-by: Dan Carpenter 
Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_io.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/crypto/chelsio/chtls/chtls_io.c 
b/drivers/crypto/chelsio/chtls/chtls_io.c
index a4c7d2d..85ddc07 100644
--- a/drivers/crypto/chelsio/chtls/chtls_io.c
+++ b/drivers/crypto/chelsio/chtls/chtls_io.c
@@ -1230,9 +1230,8 @@ int chtls_sendpage(struct sock *sk, struct page *page,
   struct sk_buff *skb = skb_peek_tail(&csk->txq);
int copy, i;
 
-   copy = mss - skb->len;
if (!skb || (ULP_SKB_CB(skb)->flags & ULPCB_FLAG_NO_APPEND) ||
-   copy <= 0) {
+   (copy = mss - skb->len) <= 0) {
 new_buf:
if (!csk_mem_free(cdev, sk))
goto wait_for_sndbuf;
-- 
1.8.3.1



[PATCH 1/7] chtls: wait for memory in Tx path

2018-04-25 Thread Atul Gupta
wait for memory in sendmsg and sendpage

Reported-by: Gustavo A. R. Silva 
Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls.h  |  1 +
 drivers/crypto/chelsio/chtls/chtls_io.c   | 90 +--
 drivers/crypto/chelsio/chtls/chtls_main.c |  1 +
 3 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/chelsio/chtls/chtls.h 
b/drivers/crypto/chelsio/chtls/chtls.h
index f4b8f1e..778c194 100644
--- a/drivers/crypto/chelsio/chtls/chtls.h
+++ b/drivers/crypto/chelsio/chtls/chtls.h
@@ -149,6 +149,7 @@ struct chtls_dev {
struct list_head rcu_node;
struct list_head na_node;
unsigned int send_page_order;
+   int max_host_sndbuf;
struct key_map kmap;
 };
 
diff --git a/drivers/crypto/chelsio/chtls/chtls_io.c 
b/drivers/crypto/chelsio/chtls/chtls_io.c
index 5a75be4..a4c7d2d 100644
--- a/drivers/crypto/chelsio/chtls/chtls_io.c
+++ b/drivers/crypto/chelsio/chtls/chtls_io.c
@@ -914,6 +914,78 @@ static u16 tls_header_read(struct tls_hdr *thdr, struct 
iov_iter *from)
return (__force u16)cpu_to_be16(thdr->length);
 }
 
+static int csk_mem_free(struct chtls_dev *cdev, struct sock *sk)
+{
+   return (cdev->max_host_sndbuf - sk->sk_wmem_queued) > 0;
+}
+
+static int csk_wait_memory(struct chtls_dev *cdev,
+  struct sock *sk, long *timeo_p)
+{
+   DEFINE_WAIT_FUNC(wait, woken_wake_function);
+   int sndbuf, err = 0;
+   long current_timeo;
+   long vm_wait = 0;
+   bool noblock;
+
+   current_timeo = *timeo_p;
+   noblock = (*timeo_p ? false : true);
+   sndbuf = cdev->max_host_sndbuf;
+   if (sndbuf > sk->sk_wmem_queued) {
+   current_timeo = (prandom_u32() % (HZ / 5)) + 2;
+   vm_wait = (prandom_u32() % (HZ / 5)) + 2;
+   }
+
+   add_wait_queue(sk_sleep(sk), &wait);
+   while (1) {
+   sk_set_bit(SOCKWQ_ASYNC_NOSPACE, sk);
+
+   if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
+   goto do_error;
+   if (!*timeo_p) {
+   if (noblock)
+   set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
+   goto do_nonblock;
+   }
+   if (signal_pending(current))
+   goto do_interrupted;
+   sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
+   if (sndbuf > sk->sk_wmem_queued && !vm_wait)
+   break;
+
+   set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
+   sk->sk_write_pending++;
+   sk_wait_event(sk, &current_timeo, sk->sk_err ||
+ (sk->sk_shutdown & SEND_SHUTDOWN) ||
+ (sndbuf > sk->sk_wmem_queued && !vm_wait), &wait);
+   sk->sk_write_pending--;
+
+   if (vm_wait) {
+   vm_wait -= current_timeo;
+   current_timeo = *timeo_p;
+   if (current_timeo != MAX_SCHEDULE_TIMEOUT) {
+   current_timeo -= vm_wait;
+   if (current_timeo < 0)
+   current_timeo = 0;
+   }
+   vm_wait = 0;
+   }
+   *timeo_p = current_timeo;
+   }
+out:
+   remove_wait_queue(sk_sleep(sk), &wait);
+   return err;
+do_error:
+   err = -EPIPE;
+   goto out;
+do_nonblock:
+   err = -EAGAIN;
+   goto out;
+do_interrupted:
+   err = sock_intr_errno(*timeo_p);
+   goto out;
+}
+
 int chtls_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 {
struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
@@ -952,6 +1024,8 @@ int chtls_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
copy = mss - skb->len;
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
+   if (!csk_mem_free(cdev, sk))
+   goto wait_for_sndbuf;
 
if (is_tls_tx(csk) && !csk->tlshws.txleft) {
struct tls_hdr hdr;
@@ -1099,8 +1173,10 @@ int chtls_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
if (ULP_SKB_CB(skb)->flags & ULPCB_FLAG_NO_APPEND)
push_frames_if_head(sk);
continue;
+wait_for_sndbuf:
+   set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 wait_for_memory:
-   err = sk_stream_wait_memory(sk, &timeo);
+   err = csk_wait_memory(cdev, sk, &timeo);
if (err)
goto do_error;
}
@@ -1131,6 +1207,7 @@ int chtls_sendpage(struct sock *sk, struct page *page,
   int offset, size_t size, int flags)
 {
struct chtls_sock *csk;
+   struct chtls_dev *cdev;
int mss, err, copied;
struct tcp_sock *tp;
long timeo;
@@ 

Re: [PATCH v2 0/5] crypto: Speck support

2018-04-25 Thread Samuel Neves
Let's put the provenance of Speck aside for a moment, and suppose that
it is an ideal block cipher. There are still some issues with this
patch as it stands.

 - The rationale seems off. Consider this bit from the commit message:

> Other AES alternatives such as Twofish, Threefish, Camellia, CAST6, and
> Serpent aren't fast enough either; it seems that only a modern ARX cipher
> can provide sufficient performance on these devices.

One of these things is very much not like the others. Threefish _is_ a
modern ARX cipher---a tweakable block cipher in fact, precluding the
need for XEX-style masking. Is it too slow? Does it not have the
correct block size?

> We've also considered a novel length-preserving encryption mode based on
> ChaCha20 and Poly1305.

I'm very curious about this, namely as to what the role of Poly1305
would be here. ChaCha20's underlying permutation could, of course, be
transformed into a 512-bit tweakable block cipher relatively
painlessly, retaining the performance of regular ChaCha20 with
marginal additional overhead. This would not be a standard
construction, but clearly that is not an issue.
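
(Sketch of the flavor I mean, along the lines of a tweakable Even-Mansour
construction: derive a 512-bit mask delta = H_k(tweak) with a keyed hash,
then compute

    E_k(tweak, P) = pi(P xor delta) xor delta

with pi the ChaCha permutation. Whether that particular masking is sound
with this particular permutation would of course need real analysis; this
is only meant to show the rough shape, not to propose a design.)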

But the biggest problem here, in my mind, is that for all the talk of
using 128-bit block Speck, this patch tacks on the 64-bit block
variant of Speck into the kernel, and speck64-xts as well! As far as I
can tell, this is the _only_ instance of a 64-bit XTS instance in the
entire codebase. Now, if you wanted a fast 64-bit ARX block cipher,
the kernel already had XTEA. Instead, this is adding yet another
64-bit block cipher into the crypto API, in a disk-encryption mode no
less, so that it can be misused later. In the disk encryption setting,
it's particularly concerning to be using such a small block size, as
data volumes can quickly add up to the birthday bound.
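
(To put a number on it: with a 64-bit block the birthday bound sits at 2^32
blocks, i.e. only 32 GiB under a single key, which a busy device can
plausibly write through in a matter of hours; a 128-bit block pushes that
out to 2^64 blocks, far beyond any realistic data volume.)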

> It's easy to say that, but do you have an actual suggestion?

I don't know how seriously you are actually asking this, but some
128-bit software-friendly block ciphers could be SPARX, LEA, RC5, or
RC6. SPARX, in particular, has similarities to Speck but has some
further AES-like design guarantees that other prior ARX block ciphers
did not. Some other bitsliced designs, such as Noekeon or SKINNY, may
also work well with NEON, but I don't know much about their
performance there.

Best regards,
Samuel Neves


Re: [PATCH 1/2] crypto: sm4 - export encrypt/decrypt routines to other drivers

2018-04-25 Thread Gilad Ben-Yossef
On Wed, Apr 25, 2018 at 3:20 PM, Ard Biesheuvel
 wrote:
> In preparation of adding support for the SIMD based arm64 implementation
> of arm64, which requires a fallback to non-SIMD code when invoked in
> certain contexts, expose the generic SM4 encrypt and decrypt routines
> to other drivers.
>
> Signed-off-by: Ard Biesheuvel 

Acked-by: Gilad Ben-Yossef 

> ---
>  crypto/sm4_generic.c | 10 ++
>  include/crypto/sm4.h |  3 +++
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/crypto/sm4_generic.c b/crypto/sm4_generic.c
> index f537a2766c55..c18eebfd5edd 100644
> --- a/crypto/sm4_generic.c
> +++ b/crypto/sm4_generic.c
> @@ -190,21 +190,23 @@ static void sm4_do_crypt(const u32 *rk, u32 *out, const 
> u32 *in)
>
>  /* encrypt a block of text */
>
> -static void sm4_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
> +void crypto_sm4_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
>  {
> const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
>
> sm4_do_crypt(ctx->rkey_enc, (u32 *)out, (u32 *)in);
>  }
> +EXPORT_SYMBOL_GPL(crypto_sm4_encrypt);
>
>  /* decrypt a block of text */
>
> -static void sm4_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
> +void crypto_sm4_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
>  {
> const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
>
> sm4_do_crypt(ctx->rkey_dec, (u32 *)out, (u32 *)in);
>  }
> +EXPORT_SYMBOL_GPL(crypto_sm4_decrypt);
>
>  static struct crypto_alg sm4_alg = {
> .cra_name   =   "sm4",
> @@ -219,8 +221,8 @@ static struct crypto_alg sm4_alg = {
> .cia_min_keysize=   SM4_KEY_SIZE,
> .cia_max_keysize=   SM4_KEY_SIZE,
> .cia_setkey =   crypto_sm4_set_key,
> -   .cia_encrypt=   sm4_encrypt,
> -   .cia_decrypt=   sm4_decrypt
> +   .cia_encrypt=   crypto_sm4_encrypt,
> +   .cia_decrypt=   crypto_sm4_decrypt
> }
> }
>  };
> diff --git a/include/crypto/sm4.h b/include/crypto/sm4.h
> index b64e64d20b28..7afd730d16ff 100644
> --- a/include/crypto/sm4.h
> +++ b/include/crypto/sm4.h
> @@ -25,4 +25,7 @@ int crypto_sm4_set_key(struct crypto_tfm *tfm, const u8 
> *in_key,
>  int crypto_sm4_expand_key(struct crypto_sm4_ctx *ctx, const u8 *in_key,
>   unsigned int key_len);
>
> +void crypto_sm4_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in);
> +void crypto_sm4_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in);
> +
>  #endif
> --
> 2.17.0
>



-- 
Gilad Ben-Yossef
Chief Coffee Drinker

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru


Re: [PATCH 1/2] crypto: sm4 - export encrypt/decrypt routines to other drivers

2018-04-25 Thread Ard Biesheuvel
On 25 April 2018 at 14:20, Ard Biesheuvel  wrote:
> In preparation of adding support for the SIMD based arm64 implementation
> of arm64,

SM4 ^^^

> which requires a fallback to non-SIMD code when invoked in
> certain contexts, expose the generic SM4 encrypt and decrypt routines
> to other drivers.
>
> Signed-off-by: Ard Biesheuvel 
> ---
>  crypto/sm4_generic.c | 10 ++
>  include/crypto/sm4.h |  3 +++
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/crypto/sm4_generic.c b/crypto/sm4_generic.c
> index f537a2766c55..c18eebfd5edd 100644
> --- a/crypto/sm4_generic.c
> +++ b/crypto/sm4_generic.c
> @@ -190,21 +190,23 @@ static void sm4_do_crypt(const u32 *rk, u32 *out, const 
> u32 *in)
>
>  /* encrypt a block of text */
>
> -static void sm4_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
> +void crypto_sm4_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
>  {
> const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
>
> sm4_do_crypt(ctx->rkey_enc, (u32 *)out, (u32 *)in);
>  }
> +EXPORT_SYMBOL_GPL(crypto_sm4_encrypt);
>
>  /* decrypt a block of text */
>
> -static void sm4_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
> +void crypto_sm4_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
>  {
> const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
>
> sm4_do_crypt(ctx->rkey_dec, (u32 *)out, (u32 *)in);
>  }
> +EXPORT_SYMBOL_GPL(crypto_sm4_decrypt);
>
>  static struct crypto_alg sm4_alg = {
> .cra_name   =   "sm4",
> @@ -219,8 +221,8 @@ static struct crypto_alg sm4_alg = {
> .cia_min_keysize=   SM4_KEY_SIZE,
> .cia_max_keysize=   SM4_KEY_SIZE,
> .cia_setkey =   crypto_sm4_set_key,
> -   .cia_encrypt=   sm4_encrypt,
> -   .cia_decrypt=   sm4_decrypt
> +   .cia_encrypt=   crypto_sm4_encrypt,
> +   .cia_decrypt=   crypto_sm4_decrypt
> }
> }
>  };
> diff --git a/include/crypto/sm4.h b/include/crypto/sm4.h
> index b64e64d20b28..7afd730d16ff 100644
> --- a/include/crypto/sm4.h
> +++ b/include/crypto/sm4.h
> @@ -25,4 +25,7 @@ int crypto_sm4_set_key(struct crypto_tfm *tfm, const u8 
> *in_key,
>  int crypto_sm4_expand_key(struct crypto_sm4_ctx *ctx, const u8 *in_key,
>   unsigned int key_len);
>
> +void crypto_sm4_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in);
> +void crypto_sm4_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in);
> +
>  #endif
> --
> 2.17.0
>


[PATCH 1/2] crypto: sm4 - export encrypt/decrypt routines to other drivers

2018-04-25 Thread Ard Biesheuvel
In preparation of adding support for the SIMD based arm64 implementation
of arm64, which requires a fallback to non-SIMD code when invoked in
certain contexts, expose the generic SM4 encrypt and decrypt routines
to other drivers.

Signed-off-by: Ard Biesheuvel 
---
 crypto/sm4_generic.c | 10 ++
 include/crypto/sm4.h |  3 +++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/crypto/sm4_generic.c b/crypto/sm4_generic.c
index f537a2766c55..c18eebfd5edd 100644
--- a/crypto/sm4_generic.c
+++ b/crypto/sm4_generic.c
@@ -190,21 +190,23 @@ static void sm4_do_crypt(const u32 *rk, u32 *out, const 
u32 *in)
 
 /* encrypt a block of text */
 
-static void sm4_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+void crypto_sm4_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 {
const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
 
sm4_do_crypt(ctx->rkey_enc, (u32 *)out, (u32 *)in);
 }
+EXPORT_SYMBOL_GPL(crypto_sm4_encrypt);
 
 /* decrypt a block of text */
 
-static void sm4_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+void crypto_sm4_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 {
const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
 
sm4_do_crypt(ctx->rkey_dec, (u32 *)out, (u32 *)in);
 }
+EXPORT_SYMBOL_GPL(crypto_sm4_decrypt);
 
 static struct crypto_alg sm4_alg = {
.cra_name   =   "sm4",
@@ -219,8 +221,8 @@ static struct crypto_alg sm4_alg = {
.cia_min_keysize=   SM4_KEY_SIZE,
.cia_max_keysize=   SM4_KEY_SIZE,
.cia_setkey =   crypto_sm4_set_key,
-   .cia_encrypt=   sm4_encrypt,
-   .cia_decrypt=   sm4_decrypt
+   .cia_encrypt=   crypto_sm4_encrypt,
+   .cia_decrypt=   crypto_sm4_decrypt
}
}
 };
diff --git a/include/crypto/sm4.h b/include/crypto/sm4.h
index b64e64d20b28..7afd730d16ff 100644
--- a/include/crypto/sm4.h
+++ b/include/crypto/sm4.h
@@ -25,4 +25,7 @@ int crypto_sm4_set_key(struct crypto_tfm *tfm, const u8 
*in_key,
 int crypto_sm4_expand_key(struct crypto_sm4_ctx *ctx, const u8 *in_key,
  unsigned int key_len);
 
+void crypto_sm4_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in);
+void crypto_sm4_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in);
+
 #endif
-- 
2.17.0



[PATCH 2/2] crypto: arm64 - add support for SM4 encryption using special instructions

2018-04-25 Thread Ard Biesheuvel
Add support for the SM4 symmetric cipher implemented using the special
SM4 instructions introduced in ARM architecture revision 8.2.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/Kconfig   |  6 ++
 arch/arm64/crypto/Makefile  |  3 +
 arch/arm64/crypto/sm4-ce-core.S | 36 ++
 arch/arm64/crypto/sm4-ce-glue.c | 73 
 4 files changed, 118 insertions(+)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index cb5a243110c4..e3fdb0fd6f70 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -47,6 +47,12 @@ config CRYPTO_SM3_ARM64_CE
select CRYPTO_HASH
select CRYPTO_SM3
 
+config CRYPTO_SM4_ARM64_CE
+   tristate "SM4 symmetric cipher (ARMv8.2 Crypto Extensions)"
+   depends on KERNEL_MODE_NEON
+   select CRYPTO_ALGAPI
+   select CRYPTO_SM4
+
 config CRYPTO_GHASH_ARM64_CE
tristate "GHASH/AES-GCM using ARMv8 Crypto Extensions"
depends on KERNEL_MODE_NEON
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index f35ac684b1c0..bcafd016618e 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -23,6 +23,9 @@ sha3-ce-y := sha3-ce-glue.o sha3-ce-core.o
 obj-$(CONFIG_CRYPTO_SM3_ARM64_CE) += sm3-ce.o
 sm3-ce-y := sm3-ce-glue.o sm3-ce-core.o
 
+obj-$(CONFIG_CRYPTO_SM4_ARM64_CE) += sm4-ce.o
+sm4-ce-y := sm4-ce-glue.o sm4-ce-core.o
+
 obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
 ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
 
diff --git a/arch/arm64/crypto/sm4-ce-core.S b/arch/arm64/crypto/sm4-ce-core.S
new file mode 100644
index ..af3bfbc3f4d4
--- /dev/null
+++ b/arch/arm64/crypto/sm4-ce-core.S
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+
+   .irp    b, 0, 1, 2, 3, 4, 5, 6, 7, 8
+   .set    .Lv\b\().4s, \b
+   .endr
+
+   .macro  sm4e, rd, rn
+   .inst   0xcec08400 | .L\rd | (.L\rn << 5)
+   .endm
+
+   /*
+* void sm4_ce_do_crypt(const u32 *rk, u32 *out, const u32 *in);
+*/
+   .text
+ENTRY(sm4_ce_do_crypt)
+   ld1 {v8.4s}, [x2]
+   ld1 {v0.4s-v3.4s}, [x0], #64
+CPU_LE(rev32   v8.16b, v8.16b  )
+   ld1 {v4.4s-v7.4s}, [x0]
+   sm4e    v8.4s, v0.4s
+   sm4e    v8.4s, v1.4s
+   sm4e    v8.4s, v2.4s
+   sm4e    v8.4s, v3.4s
+   sm4e    v8.4s, v4.4s
+   sm4e    v8.4s, v5.4s
+   sm4e    v8.4s, v6.4s
+   sm4e    v8.4s, v7.4s
+   rev64   v8.4s, v8.4s
+   ext v8.16b, v8.16b, v8.16b, #8
+CPU_LE(rev32   v8.16b, v8.16b  )
+   st1 {v8.4s}, [x1]
+   ret
+ENDPROC(sm4_ce_do_crypt)
diff --git a/arch/arm64/crypto/sm4-ce-glue.c b/arch/arm64/crypto/sm4-ce-glue.c
new file mode 100644
index ..b7fb5274b250
--- /dev/null
+++ b/arch/arm64/crypto/sm4-ce-glue.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_ALIAS_CRYPTO("sm4");
+MODULE_ALIAS_CRYPTO("sm4-ce");
+MODULE_DESCRIPTION("SM4 symmetric cipher using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel ");
+MODULE_LICENSE("GPL v2");
+
+asmlinkage void sm4_ce_do_crypt(const u32 *rk, void *out, const void *in);
+
+static void sm4_ce_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+   const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+   if (!may_use_simd()) {
+   crypto_sm4_encrypt(tfm, out, in);
+   } else {
+   kernel_neon_begin();
+   sm4_ce_do_crypt(ctx->rkey_enc, out, in);
+   kernel_neon_end();
+   }
+}
+
+static void sm4_ce_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+   const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+   if (!may_use_simd()) {
+   crypto_sm4_decrypt(tfm, out, in);
+   } else {
+   kernel_neon_begin();
+   sm4_ce_do_crypt(ctx->rkey_dec, out, in);
+   kernel_neon_end();
+   }
+}
+
+static struct crypto_alg sm4_ce_alg = {
+   .cra_name   = "sm4",
+   .cra_driver_name= "sm4-ce",
+   .cra_priority   = 200,
+   .cra_flags  = CRYPTO_ALG_TYPE_CIPHER,
+   .cra_blocksize  = SM4_BLOCK_SIZE,
+   .cra_ctxsize= sizeof(struct crypto_sm4_ctx),
+   .cra_module = THIS_MODULE,
+   .cra_u.cipher = {
+   .cia_min_keysize= SM4_KEY_SIZE,
+   .cia_max_keysize= SM4_KEY_SIZE,
+   .cia_setkey = crypto_sm4_set_key,
+   .cia_encrypt= sm4_ce_encrypt,
+   .cia_decrypt= 

[PATCH 0/2] crypto: implement SM4 for arm64 using special instructions

2018-04-25 Thread Ard Biesheuvel
Patch #1 makes some preparatory changes so the C routines can be used as
a fallback by other drivers.

Patch #2 implements the SM4 core cipher using the special instructions
introduced as an optional extension by revision 8.2 of the ARM architecture.

Note that this does not implement cipher+chaining mode combinations as we
do for AES. This can be added later if desired.

Ard Biesheuvel (2):
  crypto: sm4 - export encrypt/decrypt routines to other drivers
  crypto: arm64 - add support for SM4 encryption using special
instructions

 arch/arm64/crypto/Kconfig   |  6 ++
 arch/arm64/crypto/Makefile  |  3 +
 arch/arm64/crypto/sm4-ce-core.S | 36 ++
 arch/arm64/crypto/sm4-ce-glue.c | 73 
 crypto/sm4_generic.c| 10 +--
 include/crypto/sm4.h|  3 +
 6 files changed, 127 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/crypto/sm4-ce-core.S
 create mode 100644 arch/arm64/crypto/sm4-ce-glue.c

-- 
2.17.0