[jira] [Commented] (DISPATCH-1352) qd_buffer_list_clone cost is dominated by cache misses

Francesco Nigro (JIRA) Mon, 03 Jun 2019 08:22:35 -0700


    [ 
https://issues.apache.org/jira/browse/DISPATCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854704#comment-16854704
 ]


Francesco Nigro commented on DISPATCH-1352:
-------------------------------------------

[~kgiusti] 

{quote}Instead of allocating a qd_message_pvt_t structure we allocate a single 
block of memory large enough to hold the qd_message_pvt_t structure and N 
qd_buffer_t structures and lay them down "cheek to jowl" in the buffer, linking 
the qd_buffer_ts as normal, but incrementing the refcount to prevent freeing 
them individually.{quote}

Allocating the qd_buffer_t structures upfront (3*N of them, with N >= 1) would 
help: in the contexts where these lists get cloned, they seem to always contain 
*at least* 1 qd_buffer_t, so allocating them upfront makes total sense.
The lifecycle of such a qd_buffer_t is also already bound to that of its 
qd_message_pvt_t.
The code impact is a whole different story: we need to recognize the 
"embedded" qd_buffer_t structures while freeing a qd_buffer_list_t, to avoid 
deallocating them individually.
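
To make the idea concrete, here is a minimal sketch of the "embedded buffers" 
layout and of a free path that recognizes them. It is not the actual Dispatch 
code: sketch_buffer_t, sketch_message_t, EMBEDDED_BUFFERS, BUFFER_CAPACITY and 
all the functions are hypothetical stand-ins for qd_buffer_t and 
qd_message_pvt_t, and the address-range check is just one possible way to tell 
embedded buffers apart (the refcount trick quoted above would be another).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sizes, chosen only for the sketch. */
enum { EMBEDDED_BUFFERS = 3, BUFFER_CAPACITY = 64 };

/* Hypothetical stand-in for qd_buffer_t: list link plus payload. */
typedef struct sketch_buffer {
    struct sketch_buffer *next;
    size_t                size;
    unsigned char         data[BUFFER_CAPACITY];
} sketch_buffer_t;

/* Hypothetical stand-in for qd_message_pvt_t: the message header and its
 * first buffers live in ONE allocation, laid out back to back. */
typedef struct sketch_message {
    sketch_buffer_t *head;
    sketch_buffer_t *tail;
    size_t           embedded_used;
    sketch_buffer_t  embedded[EMBEDDED_BUFFERS];
} sketch_message_t;

/* True if buf lives inside msg's own allocation. Comparing addresses as
 * integers is implementation-defined, but fine on common flat-memory targets. */
static bool is_embedded(const sketch_message_t *msg, const sketch_buffer_t *buf)
{
    uintptr_t p  = (uintptr_t) buf;
    uintptr_t lo = (uintptr_t) msg->embedded;
    uintptr_t hi = (uintptr_t) (msg->embedded + EMBEDDED_BUFFERS);
    return p >= lo && p < hi;
}

/* One allocation covers the message and its embedded buffers. */
static sketch_message_t *message_new(void)
{
    return calloc(1, sizeof(sketch_message_t));
}

/* Append a copy of src: use an embedded slot if one is left, otherwise fall
 * back to a separate heap buffer. Links the buffer into the list as normal. */
static sketch_buffer_t *message_append(sketch_message_t *msg,
                                       const void *src, size_t len)
{
    assert(msg != NULL);
    if (len > BUFFER_CAPACITY)
        return NULL;

    sketch_buffer_t *buf;
    if (msg->embedded_used < EMBEDDED_BUFFERS) {
        buf = &msg->embedded[msg->embedded_used++];
    } else {
        buf = malloc(sizeof(*buf));
        if (buf == NULL)
            return NULL;
    }

    buf->next = NULL;
    buf->size = len;
    memcpy(buf->data, src, len);

    if (msg->tail)
        msg->tail->next = buf;
    else
        msg->head = buf;
    msg->tail = buf;
    return buf;
}

/* Free the list, skipping embedded buffers: they go away with the message.
 * Returns how many buffers were freed individually. */
static size_t message_free(sketch_message_t *msg)
{
    assert(msg != NULL);
    size_t heap_freed = 0;
    for (sketch_buffer_t *buf = msg->head; buf != NULL; ) {
        sketch_buffer_t *next = buf->next;
        if (!is_embedded(msg, buf)) {
            free(buf);
            heap_freed++;
        }
        buf = next;
    }
    free(msg);
    return heap_freed;
}
```

Code walking the list through head/next does not need to know which buffers 
are embedded; only the free path does, which is where I would expect most of 
the code impact to be.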

{quote}That would avoid the extra calls to qd_buffer_t allocate, make better 
use of the cache (fingers crossed) all without having to touch the iterator 
code (which is everywhere and expects qd_buffer_t based data).{quote}

That's my bet too :) 
Fingers crossed!

> qd_buffer_list_clone cost is dominated by cache misses
> ------------------------------------------------------
>
>                 Key: DISPATCH-1352
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1352
>             Project: Qpid Dispatch
>          Issue Type: Improvement
>          Components: Routing Engine
>    Affects Versions: 1.7.0
>            Reporter: Francesco Nigro
>            Priority: Major
>         Attachments: screenshot-1.png, screenshot-2.png
>
>
> qd_buffer_list_clone on qd_message_copy for 
> qd_message_pvt_t.ma_to_override/ma_trace/ma_ingress is dominated by the cost 
> of cache misses:
> * to "allocate" new qd_buffer_t
> * to reference any qd_buffer_t from the source qd_buffer_list_t
> This cost is the main reason why the core thread has a very low IPC (< 
> 1 instr/cycle). Given the single-threaded nature of the router while 
> dealing with it, solving it would bring a huge performance improvement and 
> let the router scale better.



