Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-19 Thread Kim Phillips
On Thu, 19 Mar 2015 17:56:57 +0200
Horia Geantă horia.gea...@freescale.com wrote:

 On 3/18/2015 12:03 AM, Kim Phillips wrote:
  On Tue, 17 Mar 2015 19:58:55 +0200
  Horia Geantă horia.gea...@freescale.com wrote:
  
  On 3/17/2015 2:19 AM, Kim Phillips wrote:
  On Mon, 16 Mar 2015 12:02:51 +0200
  Horia Geantă horia.gea...@freescale.com wrote:
 
  On 3/4/2015 2:23 AM, Kim Phillips wrote:
  Only potential problem is getting the crypto API to set the GFP_DMA
  flag in the allocation request, but presumably a
  CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.
 
  Seems there are quite a few places that do not use the
  {aead,ablkcipher,ahash}_request_alloc() API to allocate crypto requests.
  Among them, IPsec and dm-crypt.
  I've looked at the code and I don't think it can be converted to use
  crypto API.
 
  why not?
 
  It would imply having 2 memory allocations, one for crypto request and
  the other for the rest of the data bundled with the request (for IPsec
  that would be ESN + space for IV + sg entries for authenticated-only
  data and sk_buff extension, if needed).
 
  Trying to have a single allocation by making ESN, IV etc. part of the
  request private context requires modifying tfm.reqsize on the fly.
  This won't work without adding some kind of locking for the tfm.
  
  can't a common minimum tfm.reqsize be co-established up front, at
  least for the fast path?
 
 Indeed, for IPsec at tfm allocation time - esp_init_state() -
 tfm.reqsize could be increased to account for what is known for a given
 flow: ESN, IV and asg (S/G entries for authenticated-only data).
 The layout would be:
 aead request (fixed part)
 private ctx of backend algorithm
 seq_no_hi (if ESN)
 IV
 asg
 sg -- S/G table for skb_to_sgvec; how many entries is the question
 
 Do you have a suggestion for how many S/G entries to preallocate for
 representing the sk_buff data to be encrypted?
 An ancient esp4.c used ESP_NUM_FAST_SG, set to 4.
 Btw, currently maximum number of fragments supported by the net stack
 (MAX_SKB_FRAGS) is 16 or more.
 
  This means that the CRYPTO_TFM_REQ_DMA would be visible to all of these
  places. Some of the maintainers do not agree, as you've seen.
 
  would modifying the crypto API to either have a different
  *_request_alloc() API, and/or adding calls to negotiate the GFP mask
  between crypto users and drivers, e.g., get/set_gfp_mask, work?
 
  I think what DaveM asked for was the change to be transparent.
 
  Besides converting to *_request_alloc(), seems that all other options
  require some extra awareness from the user.
  Could you elaborate on the idea above?
  
  was merely suggesting communicating GFP flags anonymously across the
  API, i.e., GFP_DMA wouldn't appear in user code.
 
 Meaning user would have to get_gfp_mask before allocating a crypto
 request - i.e. instead of kmalloc(..., GFP_ATOMIC) to have
 kmalloc(GFP_ATOMIC | get_gfp_mask(aead))?
 
  An alternative would be for talitos to use the page allocator to get 1 /
  2 pages at probe time (4 channels x 32 entries/channel x 64B/descriptor
  = 8 kB), dma_map_page the area and manage it internally for talitos_desc
  hw descriptors.
  What do you think?
 
  There's a comment in esp_alloc_tmp(): "Use spare space in skb for
  this where possible", which is ideally where we'd want to be (esp.
 
  Ok, I'll check that. But note the "where possible" - finding room in the
  skb to avoid the allocation won't always be the case, and then we're
  back to square one.
 
 So the skb cb is out of the question, being too small (48B).
 Any idea what was the intention of the TODO - maybe to use the
 tailroom in the skb data area?
 
  because that memory could already be DMA-able).  Your above
  suggestion would be in the opposite direction of that.
 
  The proposal:
  -removes dma (un)mapping on the fast path
  
  sure, but at the expense of additional complexity.
 
 Right, there's no free lunch. But it's cheaper.
 
  -avoids requesting dma mappable memory for more than it's actually
  needed (CRYPTO_TFM_REQ_DMA forces entire request to be mappable, not
  only its private context)
  
  compared to the payload?  Plus, we have plenty of DMA space these
  days.
  
  -for caam it has the added benefit of speeding the below search for the
  offending descriptor in the SW ring from O(n) to O(1):
  for (i = 0; CIRC_CNT(head, tail + i, JOBR_DEPTH) >= 1; i++) {
 	sw_idx = (tail + i) & (JOBR_DEPTH - 1);
 
 	if (jrp->outring[hw_idx].desc ==
 	    jrp->entinfo[sw_idx].desc_addr_dma)
 		break; /* found */
  }
  (drivers/crypto/caam/jr.c - caam_dequeue)
  
  how?  The job ring h/w will still be spitting things out
  out-of-order.
 
 jrp->outring[hw_idx].desc bus address can be used to find the sw_idx in
 O(1):
 
 dma_addr_t desc_base = dma_map_page(alloc_page(GFP_DMA),...);
 [...]
 sw_idx = (jrp->outring[hw_idx].desc - desc_base) / JD_SIZE;
 
 JD_SIZE would be 16 words (64B) - 13 words used for the h/w job
 descriptor, 3 words can be used for smth. else.

Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-19 Thread Horia Geantă
On 3/18/2015 12:03 AM, Kim Phillips wrote:
 On Tue, 17 Mar 2015 19:58:55 +0200
 Horia Geantă horia.gea...@freescale.com wrote:
 
 On 3/17/2015 2:19 AM, Kim Phillips wrote:
 On Mon, 16 Mar 2015 12:02:51 +0200
 Horia Geantă horia.gea...@freescale.com wrote:

 On 3/4/2015 2:23 AM, Kim Phillips wrote:
 Only potential problem is getting the crypto API to set the GFP_DMA
 flag in the allocation request, but presumably a
 CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.

 Seems there are quite a few places that do not use the
 {aead,ablkcipher,ahash}_request_alloc() API to allocate crypto requests.
 Among them, IPsec and dm-crypt.
 I've looked at the code and I don't think it can be converted to use
 crypto API.

 why not?

 It would imply having 2 memory allocations, one for crypto request and
 the other for the rest of the data bundled with the request (for IPsec
 that would be ESN + space for IV + sg entries for authenticated-only
 data and sk_buff extension, if needed).

 Trying to have a single allocation by making ESN, IV etc. part of the
 request private context requires modifying tfm.reqsize on the fly.
 This won't work without adding some kind of locking for the tfm.
 
 can't a common minimum tfm.reqsize be co-established up front, at
 least for the fast path?

Indeed, for IPsec at tfm allocation time - esp_init_state() -
tfm.reqsize could be increased to account for what is known for a given
flow: ESN, IV and asg (S/G entries for authenticated-only data).
The layout would be:
aead request (fixed part)
private ctx of backend algorithm
seq_no_hi (if ESN)
IV
asg
sg -- S/G table for skb_to_sgvec; how many entries is the question

Do you have a suggestion for how many S/G entries to preallocate for
representing the sk_buff data to be encrypted?
An ancient esp4.c used ESP_NUM_FAST_SG, set to 4.
Btw, currently maximum number of fragments supported by the net stack
(MAX_SKB_FRAGS) is 16 or more.
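
For illustration, a rough compile-time sketch of that layout; the names
and the fast-path S/G count below are placeholders, not existing kernel
definitions:

#include <linux/scatterlist.h>
#include <linux/types.h>

/* Placeholders -- the right count is exactly the open question above
 * (4 as in the ancient esp4.c, vs. up to MAX_SKB_FRAGS).
 */
#define ESP_FAST_SG	4
#define ESP_MAX_IV_LEN	16

/* What esp_init_state() would reserve via tfm.reqsize, laid out after
 * the fixed aead request and the backend algorithm's private ctx:
 */
struct esp_req_priv {
	__be32			seq_hi;		/* only if ESN */
	u8			iv[ESP_MAX_IV_LEN];
	struct scatterlist	asg[1];		/* authenticated-only data; size illustrative */
	struct scatterlist	sg[ESP_FAST_SG + 1];	/* skb_to_sgvec target */
};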

 This means that the CRYPTO_TFM_REQ_DMA would be visible to all of these
 places. Some of the maintainers do not agree, as you've seen.

 would modifying the crypto API to either have a different
 *_request_alloc() API, and/or adding calls to negotiate the GFP mask
 between crypto users and drivers, e.g., get/set_gfp_mask, work?

 I think what DaveM asked for was the change to be transparent.

 Besides converting to *_request_alloc(), seems that all other options
 require some extra awareness from the user.
 Could you elaborate on the idea above?
 
 was merely suggesting communicating GFP flags anonymously across the
 API, i.e., GFP_DMA wouldn't appear in user code.

Meaning user would have to get_gfp_mask before allocating a crypto
request - i.e. instead of kmalloc(..., GFP_ATOMIC) to have
kmalloc(GFP_ATOMIC | get_gfp_mask(aead))?

 An alternative would be for talitos to use the page allocator to get 1 /
 2 pages at probe time (4 channels x 32 entries/channel x 64B/descriptor
 = 8 kB), dma_map_page the area and manage it internally for talitos_desc
 hw descriptors.
 What do you think?

 There's a comment in esp_alloc_tmp(): "Use spare space in skb for
 this where possible", which is ideally where we'd want to be (esp.

 Ok, I'll check that. But note the "where possible" - finding room in the
 skb to avoid the allocation won't always be the case, and then we're
 back to square one.

So the skb cb is out of the question, being too small (48B).
Any idea what was the intention of the TODO - maybe to use the
tailroom in the skb data area?

 because that memory could already be DMA-able).  Your above
 suggestion would be in the opposite direction of that.

 The proposal:
 -removes dma (un)mapping on the fast path
 
 sure, but at the expense of additional complexity.

Right, there's no free lunch. But it's cheaper.

 -avoids requesting dma mappable memory for more than it's actually
 needed (CRYPTO_TFM_REQ_DMA forces entire request to be mappable, not
 only its private context)
 
 compared to the payload?  Plus, we have plenty of DMA space these
 days.
 
 -for caam it has the added benefit of speeding the below search for the
 offending descriptor in the SW ring from O(n) to O(1):
 for (i = 0; CIRC_CNT(head, tail + i, JOBR_DEPTH) >= 1; i++) {
 	sw_idx = (tail + i) & (JOBR_DEPTH - 1);

 	if (jrp->outring[hw_idx].desc ==
 	    jrp->entinfo[sw_idx].desc_addr_dma)
 		break; /* found */
 }
 (drivers/crypto/caam/jr.c - caam_dequeue)
 
 how?  The job ring h/w will still be spitting things out
 out-of-order.

jrp->outring[hw_idx].desc bus address can be used to find the sw_idx in
O(1):

dma_addr_t desc_base = dma_map_page(alloc_page(GFP_DMA),...);
[...]
sw_idx = (jrp->outring[hw_idx].desc - desc_base) / JD_SIZE;

JD_SIZE would be 16 words (64B) - 13 words used for the h/w job
descriptor, 3 words can be used for smth. else.
Basically all JDs would be filled at a 64B-aligned offset in the memory
page.
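
As a sketch (the helper and JD_SIZE constant below are illustrative
only; desc_base is the bus address of the mapped page, as above):

#define JD_SIZE	64	/* 16 words: 13 for the h/w JD, 3 spare */

/* Every JD sits in its own 64B-aligned slot inside the one dma-mapped
 * area, so the completed descriptor's bus address maps straight to its
 * software ring index -- no CIRC_CNT() walk needed:
 *
 *	sw_idx = caam_jr_sw_idx(desc_base, jrp->outring[hw_idx].desc);
 */
static inline int caam_jr_sw_idx(dma_addr_t desc_base, dma_addr_t desc_dma)
{
	return (desc_dma - desc_base) / JD_SIZE;
}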

 Plus, like I said, it's taking the problem in the wrong direction:
 we need to strive to merge the allocation and mapping with the upper
 layers as much as possible.

Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-17 Thread Kim Phillips
On Tue, 17 Mar 2015 19:58:55 +0200
Horia Geantă horia.gea...@freescale.com wrote:

 On 3/17/2015 2:19 AM, Kim Phillips wrote:
  On Mon, 16 Mar 2015 12:02:51 +0200
  Horia Geantă horia.gea...@freescale.com wrote:
  
  On 3/4/2015 2:23 AM, Kim Phillips wrote:
  Only potential problem is getting the crypto API to set the GFP_DMA
  flag in the allocation request, but presumably a
  CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.
 
  Seems there are quite a few places that do not use the
  {aead,ablkcipher,ahash}_request_alloc() API to allocate crypto requests.
  Among them, IPsec and dm-crypt.
  I've looked at the code and I don't think it can be converted to use
  crypto API.
  
  why not?
 
 It would imply having 2 memory allocations, one for crypto request and
 the other for the rest of the data bundled with the request (for IPsec
 that would be ESN + space for IV + sg entries for authenticated-only
 data and sk_buff extension, if needed).
 
 Trying to have a single allocation by making ESN, IV etc. part of the
 request private context requires modifying tfm.reqsize on the fly.
 This won't work without adding some kind of locking for the tfm.

can't a common minimum tfm.reqsize be co-established up front, at
least for the fast path?

  This means that the CRYPTO_TFM_REQ_DMA would be visible to all of these
  places. Some of the maintainers do not agree, as you've seen.
  
  would modifying the crypto API to either have a different
  *_request_alloc() API, and/or adding calls to negotiate the GFP mask
  between crypto users and drivers, e.g., get/set_gfp_mask, work?
 
 I think what DaveM asked for was the change to be transparent.
 
 Besides converting to *_request_alloc(), seems that all other options
 require some extra awareness from the user.
 Could you elaborate on the idea above?

was merely suggesting communicating GFP flags anonymously across the
API, i.e., GFP_DMA wouldn't appear in user code.

  An alternative would be for talitos to use the page allocator to get 1 /
  2 pages at probe time (4 channels x 32 entries/channel x 64B/descriptor
  = 8 kB), dma_map_page the area and manage it internally for talitos_desc
  hw descriptors.
  What do you think?
  
  There's a comment in esp_alloc_tmp(): "Use spare space in skb for
  this where possible", which is ideally where we'd want to be (esp.
 
 Ok, I'll check that. But note the "where possible" - finding room in the
 skb to avoid the allocation won't always be the case, and then we're
 back to square one.
 
  because that memory could already be DMA-able).  Your above
  suggestion would be in the opposite direction of that.
 
 The proposal:
 -removes dma (un)mapping on the fast path

sure, but at the expense of additional complexity.

 -avoids requesting dma mappable memory for more than it's actually
 needed (CRYPTO_TFM_REQ_DMA forces entire request to be mappable, not
 only its private context)

compared to the payload?  Plus, we have plenty of DMA space these
days.

 -for caam it has the added benefit of speeding the below search for the
 offending descriptor in the SW ring from O(n) to O(1):
 for (i = 0; CIRC_CNT(head, tail + i, JOBR_DEPTH) >= 1; i++) {
 	sw_idx = (tail + i) & (JOBR_DEPTH - 1);
 
 	if (jrp->outring[hw_idx].desc ==
 	    jrp->entinfo[sw_idx].desc_addr_dma)
 		break; /* found */
 }
 (drivers/crypto/caam/jr.c - caam_dequeue)

how?  The job ring h/w will still be spitting things out
out-of-order.

Plus, like I said, it's taking the problem in the wrong direction:
we need to strive to merge the allocation and mapping with the upper
layers as much as possible.

Kim


Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-17 Thread Horia Geantă
On 3/17/2015 2:19 AM, Kim Phillips wrote:
 On Mon, 16 Mar 2015 12:02:51 +0200
 Horia Geantă horia.gea...@freescale.com wrote:
 
 On 3/4/2015 2:23 AM, Kim Phillips wrote:
 Only potential problem is getting the crypto API to set the GFP_DMA
 flag in the allocation request, but presumably a
 CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.

 Seems there are quite a few places that do not use the
 {aead,ablkcipher,ahash}_request_alloc() API to allocate crypto requests.
 Among them, IPsec and dm-crypt.
 I've looked at the code and I don't think it can be converted to use
 crypto API.
 
 why not?

It would imply having 2 memory allocations, one for crypto request and
the other for the rest of the data bundled with the request (for IPsec
that would be ESN + space for IV + sg entries for authenticated-only
data and sk_buff extension, if needed).

Trying to have a single allocation by making ESN, IV etc. part of the
request private context requires modifying tfm.reqsize on the fly.
This won't work without adding some kind of locking for the tfm.

 
 This means that the CRYPTO_TFM_REQ_DMA would be visible to all of these
 places. Some of the maintainers do not agree, as you've seen.
 
 would modifying the crypto API to either have a different
 *_request_alloc() API, and/or adding calls to negotiate the GFP mask
 between crypto users and drivers, e.g., get/set_gfp_mask, work?

I think what DaveM asked for was the change to be transparent.

Besides converting to *_request_alloc(), seems that all other options
require some extra awareness from the user.
Could you elaborate on the idea above?

 
 An alternative would be for talitos to use the page allocator to get 1 /
 2 pages at probe time (4 channels x 32 entries/channel x 64B/descriptor
 = 8 kB), dma_map_page the area and manage it internally for talitos_desc
 hw descriptors.
 What do you think?
 
 There's a comment in esp_alloc_tmp(): "Use spare space in skb for
 this where possible", which is ideally where we'd want to be (esp.

Ok, I'll check that. But note the "where possible" - finding room in the
skb to avoid the allocation won't always be the case, and then we're
back to square one.

 because that memory could already be DMA-able).  Your above
 suggestion would be in the opposite direction of that.

The proposal:
-removes dma (un)mapping on the fast path
-avoids requesting dma mappable memory for more than it's actually
needed (CRYPTO_TFM_REQ_DMA forces entire request to be mappable, not
only its private context)
-for caam it has the added benefit of speeding the below search for the
offending descriptor in the SW ring from O(n) to O(1):
for (i = 0; CIRC_CNT(head, tail + i, JOBR_DEPTH) >= 1; i++) {
	sw_idx = (tail + i) & (JOBR_DEPTH - 1);

	if (jrp->outring[hw_idx].desc ==
	    jrp->entinfo[sw_idx].desc_addr_dma)
		break; /* found */
}
(drivers/crypto/caam/jr.c - caam_dequeue)

Horia




Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-16 Thread Kim Phillips
On Mon, 16 Mar 2015 12:02:51 +0200
Horia Geantă horia.gea...@freescale.com wrote:

 On 3/4/2015 2:23 AM, Kim Phillips wrote:
  Only potential problem is getting the crypto API to set the GFP_DMA
  flag in the allocation request, but presumably a
  CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.
 
 Seems there are quite a few places that do not use the
 {aead,ablkcipher,ahash}_request_alloc() API to allocate crypto requests.
 Among them, IPsec and dm-crypt.
 I've looked at the code and I don't think it can be converted to use
 crypto API.

why not?

 This means that the CRYPTO_TFM_REQ_DMA would be visible to all of these
 places. Some of the maintainers do not agree, as you've seen.

would modifying the crypto API to either have a different
*_request_alloc() API, and/or adding calls to negotiate the GFP mask
between crypto users and drivers, e.g., get/set_gfp_mask, work?
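
Something along these lines, perhaps -- entirely hypothetical, since
neither the helper nor CRYPTO_TFM_REQ_DMA exists in the API today:

/* driver advertises the GFP bits its requests need; users OR them in
 * blindly, so GFP_DMA itself never appears in user code:
 */
static inline gfp_t crypto_tfm_get_gfp_mask(struct crypto_tfm *tfm)
{
	return (tfm->crt_flags & CRYPTO_TFM_REQ_DMA) ? GFP_DMA : 0;
}

/* user side, e.g. in esp_alloc_tmp():
 *	tmp = kmalloc(len, GFP_ATOMIC |
 *		      crypto_tfm_get_gfp_mask(crypto_aead_tfm(aead)));
 */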

 An alternative would be for talitos to use the page allocator to get 1 /
 2 pages at probe time (4 channels x 32 entries/channel x 64B/descriptor
 = 8 kB), dma_map_page the area and manage it internally for talitos_desc
 hw descriptors.
 What do you think?

There's a comment in esp_alloc_tmp(): "Use spare space in skb for
this where possible", which is ideally where we'd want to be (esp.
because that memory could already be DMA-able).  Your above
suggestion would be in the opposite direction of that.

Kim


Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-16 Thread Horia Geantă
On 3/4/2015 2:23 AM, Kim Phillips wrote:
 Only potential problem is getting the crypto API to set the GFP_DMA
 flag in the allocation request, but presumably a
 CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.

Seems there are quite a few places that do not use the
{aead,ablkcipher,ahash}_request_alloc() API to allocate crypto requests.
Among them, IPsec and dm-crypt.
I've looked at the code and I don't think it can be converted to use
crypto API.

This means that the CRYPTO_TFM_REQ_DMA would be visible to all of these
places. Some of the maintainers do not agree, as you've seen.

An alternative would be for talitos to use the page allocator to get 1 /
2 pages at probe time (4 channels x 32 entries/channel x 64B/descriptor
= 8 kB), dma_map_page the area and manage it internally for talitos_desc
hw descriptors.
What do you think?
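
As a sketch, assuming driver-private fields like the ones named below
(they don't exist in talitos_private today):

#define TALITOS_DESC_POOL_BYTES	(4 * 32 * 64)	/* 8 kB -> 1-2 pages */

/* probe time: grab the pages once, map them once; afterwards hand out
 * 64B talitos_desc slots internally, with no per-request dma_map.
 */
static int talitos_alloc_desc_pool(struct device *dev,
				   struct talitos_private *priv)
{
	int order = get_order(TALITOS_DESC_POOL_BYTES);

	priv->desc_pages = alloc_pages(GFP_KERNEL | GFP_DMA, order);
	if (!priv->desc_pages)
		return -ENOMEM;

	priv->desc_pool_dma = dma_map_page(dev, priv->desc_pages, 0,
					   TALITOS_DESC_POOL_BYTES,
					   DMA_BIDIRECTIONAL);
	if (dma_mapping_error(dev, priv->desc_pool_dma)) {
		__free_pages(priv->desc_pages, order);
		return -ENOMEM;
	}

	return 0;
}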

Thanks,
Horia




Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-05 Thread Kim Phillips
On Thu, 5 Mar 2015 11:35:23 +0200
Horia Geantă horia.gea...@freescale.com wrote:

 On 3/4/2015 2:23 AM, Kim Phillips wrote:
  On Tue,  3 Mar 2015 08:21:37 -0500
  Martin Hicks m...@bork.org wrote:
  
  @@ -1170,6 +1237,8 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev,
  						      edesc->dma_len,
  						      DMA_BIDIRECTIONAL);
   	edesc->req.desc = &edesc->desc;
  +	/* A copy of the crypto_async_request to use the crypto_queue backlog */
  +	memcpy(&edesc->req.base, areq, sizeof(struct crypto_async_request));
  
  this seems backward, or, at least can be done more efficiently IMO:
  talitos_cra_init should set the tfm's reqsize so the rest of
  the driver can wholly embed its talitos_edesc (which should also
  wholly encapsulate its talitos_request (i.e., not via a pointer))
  into the crypto API's request handle allocation.  This
  would absorb and eliminate the talitos_edesc kmalloc and frees, the
  above memcpy, and replace the container_of after the
  crypto_dequeue_request with an offsetof, right?
  
  When scatter-gather buffers are needed, we can assume a slower-path
  and make them do their own allocations, since their sizes vary
  depending on each request.  Of course, a pointer to those
  allocations would need to be retained somewhere in the request
  handle.
 
 Unfortunately the talitos_edesc structure size is variable most of the
 time. Its exact size can only be established at request time, and
 not at tfm init time.

yes, I was suggesting a common minimum should be set in cra_init.

 Fixed size would be sizeof(talitos_edesc).
 Below are factors that influence the variable part, i.e. link_tbl in
 talitos_edesc:
 - whether any assoc / src / dst data is scattered
 - icv_stashing (case when ICV checking is done in SW)

both being slow(er) paths, IMO.

 Still we'd be better off with:
 -crypto API allocates request + request context (i.e.
 sizeof(talitos_edesc) + any alignment required)
 -talitos driver allocates variable part of talitos_edesc (if needed)
 
 instead of:
 -crypto API allocates request
 -talitos driver allocates talitos_edesc (fixed + variable)
 -memcopy of the req.base (crypto_async_request) into talitos_edesc
 
 both in terms of performance and readability.

indeed.

 At first look, the driver wouldn't change that much:
 -talitos_cra_init() callback would have to set tfm.reqsize to
 sizeof(talitos_edesc) + padding and also add the CRYPTO_TFM_REQ_DMA
 indication in tfm.crt_flags
 -talitos_edesc_alloc() logic would be pretty much the same, but would
 allocate memory only for the link_tbl
 
 I'm willing to do these changes if needed.

Please coordinate with Martin.

fwiw, caam could use this, too.

Kim


Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-05 Thread Horia Geantă
On 3/4/2015 2:23 AM, Kim Phillips wrote:
 On Tue,  3 Mar 2015 08:21:37 -0500
 Martin Hicks m...@bork.org wrote:
 
 @@ -1170,6 +1237,8 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev,
 						      edesc->dma_len,
 						      DMA_BIDIRECTIONAL);
  	edesc->req.desc = &edesc->desc;
 +	/* A copy of the crypto_async_request to use the crypto_queue backlog */
 +	memcpy(&edesc->req.base, areq, sizeof(struct crypto_async_request));
 
 this seems backward, or, at least can be done more efficiently IMO:
 talitos_cra_init should set the tfm's reqsize so the rest of
 the driver can wholly embed its talitos_edesc (which should also
 wholly encapsulate its talitos_request (i.e., not via a pointer))
 into the crypto API's request handle allocation.  This
 would absorb and eliminate the talitos_edesc kmalloc and frees, the
 above memcpy, and replace the container_of after the
 crypto_dequeue_request with an offsetof, right?
 
 When scatter-gather buffers are needed, we can assume a slower-path
 and make them do their own allocations, since their sizes vary
 depending on each request.  Of course, a pointer to those
 allocations would need to be retained somewhere in the request
 handle.

Unfortunately the talitos_edesc structure size is variable most of the
time. Its exact size can only be established at request time, and
not at tfm init time.

Fixed size would be sizeof(talitos_edesc).
Below are factors that influence the variable part, i.e. link_tbl in
talitos_edesc:
- whether any assoc / src / dst data is scattered
- icv_stashing (case when ICV checking is done in SW)

Still we'd be better off with:
-crypto API allocates request + request context (i.e.
sizeof(talitos_edesc) + any alignment required)
-talitos driver allocates variable part of talitos_edesc (if needed)

instead of:
-crypto API allocates request
-talitos driver allocates talitos_edesc (fixed + variable)
-memcopy of the req.base (crypto_async_request) into talitos_edesc

both in terms of performance and readability.

At first look, the driver wouldn't change that much:
-talitos_cra_init() callback would have to set tfm.reqsize to
sizeof(talitos_edesc) + padding and also add the CRYPTO_TFM_REQ_DMA
indication in tfm.crt_flags
-talitos_edesc_alloc() logic would be pretty much the same, but would
allocate memory only for the link_tbl
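
Roughly, for the aead case (a sketch only: CRYPTO_TFM_REQ_DMA is the
proposed flag, and the exact reqsize setter depends on kernel version):

static int talitos_cra_init_aead(struct crypto_tfm *tfm)
{
	/* existing talitos_cra_init() setup would stay as-is ... */

	/* reserve the fixed part of talitos_edesc in every request */
	crypto_aead_set_reqsize(__crypto_aead_cast(tfm),
				sizeof(struct talitos_edesc));

	/* proposed: mark this tfm's requests as needing DMA-able memory */
	tfm->crt_flags |= CRYPTO_TFM_REQ_DMA;

	return 0;
}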

I'm willing to do these changes if needed.

 
 Only potential problem is getting the crypto API to set the GFP_DMA
 flag in the allocation request, but presumably a
 CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.

Right. And this flag would apply only to request __ctx[].

Herbert, would this be an acceptable addition to crypto API?

Thanks,
Horia




Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-05 Thread Herbert Xu
On Thu, Mar 05, 2015 at 11:35:23AM +0200, Horia Geantă wrote:
 
  Only potential problem is getting the crypto API to set the GFP_DMA
  flag in the allocation request, but presumably a
  CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.
 
 Right. And this flag would apply only to request __ctx[].
 
 Herbert, would this be an acceptable addition to crypto API?

How would such a flag work?

Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v2 5/5] crypto: talitos: Add software backlog queue handling

2015-03-03 Thread Kim Phillips
On Tue,  3 Mar 2015 08:21:37 -0500
Martin Hicks m...@bork.org wrote:

 @@ -1170,6 +1237,8 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev,
 						      edesc->dma_len,
 						      DMA_BIDIRECTIONAL);
  	edesc->req.desc = &edesc->desc;
 +	/* A copy of the crypto_async_request to use the crypto_queue backlog */
 +	memcpy(&edesc->req.base, areq, sizeof(struct crypto_async_request));

this seems backward, or, at least can be done more efficiently IMO:
talitos_cra_init should set the tfm's reqsize so the rest of
the driver can wholly embed its talitos_edesc (which should also
wholly encapsulate its talitos_request (i.e., not via a pointer))
into the crypto API's request handle allocation.  This
would absorb and eliminate the talitos_edesc kmalloc and frees, the
above memcpy, and replace the container_of after the
crypto_dequeue_request with an offsetof, right?

When scatter-gather buffers are needed, we can assume a slower-path
and make them do their own allocations, since their sizes vary
depending on each request.  Of course, a pointer to those
allocations would need to be retained somewhere in the request
handle.

Only potential problem is getting the crypto API to set the GFP_DMA
flag in the allocation request, but presumably a
CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.
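
In other words -- a sketch of the suggestion, not the driver's current
code -- with tfm.reqsize covering the fixed talitos_edesc:

/* the edesc lives inside the request the crypto API already
 * allocated; no kmalloc, no free, no memcpy of req.base:
 */
static inline struct talitos_edesc *to_edesc(struct aead_request *areq)
{
	return aead_request_ctx(areq);
}

/* and the crypto_async_request returned by crypto_dequeue_request()
 * converts back with pointer arithmetic (offsetof) instead of being
 * copied into a separately allocated edesc.
 */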

Kim