Hi Grant,

>>>
>>
>> I've made a POC of a soft async engine. Based on the dasync engine, it 
>> launches a thread in priv_rsa_enc to spread the load over multiple cores.
>>
>> Regarding openssl s_server, it is efficient and scales very well with the 
>> number of cores (1700 rsa2048/s on one core, 7400 on 4 cores).
>>
>> But using haproxy I'm still facing the same issue: it scales poorly, as 
>> with qat. If I check top, I see that haproxy uses 100% of one core
>> and sometimes 80% of another, but only for a very short period.
> Does this occur with a single haproxy process (nproc=1), where you see the 
> haproxy process bounce between cores? Or do you see haproxy occupying one 
> core and occasionally using 80% of another core?
> 
>>
>> In my opinion, very few jobs are parallelized. I hope a clean rebase 
>> will fix the issue.
> Sorry for the delay. Attached are the rebased patches, on top of the latest 
> git head:
> http://git.haproxy.org/?p=haproxy.git;a=commit;h=013a84fe939cf393fbcf8deb9b4504941d382777
> 
> Wrt scaling, my use case is mainly offloading the cpu load during the 
> handshake, and as such spreading over multiple cores is not desirable. I've 
> tried increasing nproc for scale-out purposes. nproc (6 to 8) gives me the max 
> connection rate until it hits the hw limit of the qat card. The connection 
> rate increases (close to) linearly with the number of processes.
> 
>>
>> On haproxy conn/s side:
>> native ssl: 1200 conn/s
> Which cipher do you use for testing? I use ECDHE-RSA-AES128-GCM-SHA256 
> and I see about 500 conn/s. For some other ciphers I see a slightly higher 
> number (e.g. AES128-SHA256 at 550 conn/s), but I haven't seen 1200 conn/s. Not 
> sure whether the cpu/hw makes such a big difference.
> 
>> qat+async: 1700 conn/s
> I see about 2000 conn/s per core, so not much different than your number.
> 
> Thanks,
> 
> Grant
> 
> 
> 

I'm using an Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz.

I've done some tests on the rebased version.

I finally reached interesting results by disabling perfect forward secrecy, and 
I'm now close to the announced performance of the hardware (4100 conn/s against 
an announced 5.5K rsa/s on the 8920).

I also get good performance using my patched software engine, and it scales 
well.

I didn't notice any unexplained CPU usage or crashes with this version, so we 
are close to merging your patches.

I have a few comments about your patches:

- For the first patch (engine):

I think it would be better to load the engines directly during parsing; that 
way, an engine misconfiguration would be reported on the right line.

In other words, call 'ssl_init_single_engine' directly from 
'ssl_parse_global_ssl_engine' (see the sketch below). This way you wouldn't 
have to handle a list of engines to free. In addition, by postponing the init 
to the 'crt/crt-list' handling, you missed initializing the engines in the case 
where ssl is used only on the server side.
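
A rough sketch of what I mean, assuming 'ssl_init_single_engine' takes the 
engine id plus an optional algorithm list, and using the usual global keyword 
parser signature (adjust to the actual prototypes in your patch):

static int ssl_parse_global_ssl_engine(char **args, int section_type,
                                       struct proxy *curpx, struct proxy *defpx,
                                       const char *file, int line, char **err)
{
	if (!*args[1]) {
		memprintf(err, "'%s' expects an engine name.", args[0]);
		return -1;
	}
	/* init the engine right away instead of queuing it for a later
	 * init in the crt/crt-list handling: errors are reported with
	 * the offending file/line and there is no list left to free */
	if (ssl_init_single_engine(args[1], args[2]) != 0) {
		memprintf(err, "'%s': failed to initialize engine '%s'.",
		          args[0], args[1]);
		return -1;
	}
	return 0;
}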

- For the async patch:

Haproxy does not compile with OpenSSL < 1.1 due to these lines:

OSSL_ASYNC_FD async_fd; in types/connection.h (and all the lines referencing 
async_fd in ssl_sock.c, if you only fix the header),

and

#include <openssl/async.h> in src/ssl_sock.c
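
A minimal sketch of the guards I have in mind, reusing the version check you 
already use elsewhere in the patch (the exact placement of the field is only 
illustrative):

/* types/connection.h: only declare the async fd when OpenSSL provides it */
#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
	OSSL_ASYNC_FD async_fd;     /* fd reported by the async engine */
#endif

/* src/ssl_sock.c: guard the include the same way */
#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
#include <openssl/async.h>
#endif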


It would also be preferable to move your test so that it follows the 
SSL_read/write/handshake error checks, this way:

Instead of:

#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
if (ret == SSL_ERROR_WANT_ASYNC) {
        ...
}
#endif

if (ret == SSL_ERROR_WANT_WRITE) {
        ...
}
else if (ret == SSL_ERROR_WANT_READ) {
        ...
}
else if (ret == SSL_ERROR_SYSCALL) {
        ...
}

like this:


if (ret == SSL_ERROR_WANT_WRITE) {
        ...
}
else if (ret == SSL_ERROR_WANT_READ) {
        ...
}
#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
else if (ret == SSL_ERROR_WANT_ASYNC) {
        ...
}
#endif
else if (ret == SSL_ERROR_SYSCALL) {
        ...
}

About 'ssl_async_process_fds' and 'SSL_get_changed_async_fds':

The doc says you have to call SSL_get_changed_async_fds first with NULL fd 
arrays, to make sure your fd buffers are large enough before retrieving the fds.

So as written you have a potential buffer overflow.
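
The documented pattern looks roughly like this (a sketch only; the fixed-size 
buffers and the early returns are mine, not a drop-in for 
'ssl_async_process_fds'):

	OSSL_ASYNC_FD add_fds[32], del_fds[32];
	size_t num_add_fds = 0, num_del_fds = 0;

	/* first call with NULL arrays: only retrieve the counts */
	if (!SSL_get_changed_async_fds(ssl, NULL, &num_add_fds,
	                               NULL, &num_del_fds))
		return;

	/* make sure the buffers are large enough before the second call */
	if (num_add_fds > 32 || num_del_fds > 32)
		return;

	if (!SSL_get_changed_async_fds(ssl, add_fds, &num_add_fds,
	                               del_fds, &num_del_fds))
		return;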

The doc also says 'However if multiple asynchronous capable engines are in 
use then more than one is possible.'

I think you should allow ssl-async only if a single engine is configured, 
because with more than one engine:

        /* we don't support more than 1 async fds */
        if (num_add_fds > 1 || num_del_fds > 1)
                return;

will result in unpredictable behavior.
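
Just to illustrate the kind of check I have in mind at configuration time; 
'global_ssl.async' and 'nb_ssl_engines' are hypothetical names, not taken from 
your patch:

	/* hypothetical post-parse check: SSL_get_changed_async_fds() may
	 * report several fds when several async engines are loaded, which
	 * ssl_async_process_fds does not handle, so refuse it up front */
	if (global_ssl.async && nb_ssl_engines > 1) {
		memprintf(err, "ssl-async requires exactly one async-capable engine.");
		return -1;
	}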

Finally, I think it would be preferable to rename 'ssl-async' to 
'ssl-engine-async' or 'ssl-mode-async'.

Thanks a lot for your work, Grant!

R,
Emeric





