Hi Grant,
>>
>> I've made a POC of a soft async engine. Based on the dasync engine, it
>> launches a thread in priv_rsa_enc to spread the load over multiple cores.
>>
>> With openssl s_server it is efficient and scales very well with the
>> number of cores (1700 rsa2048/s on one core, 7400 on 4 cores).
>>
>> But using haproxy I'm still facing the same issue: it scales as poorly as
>> with qat. Looking at top, I see that haproxy uses 100% of one core and,
>> some of the time, 80% of another, but only for a very short period.
> Does this occur with a single haproxy process (nproc=1), where you see the
> haproxy process bounce between cores? Or do you see haproxy occupying one
> core and occasionally using 80% of another core?
>
>>
>> In my opinion, very few jobs are parallelized. I hope a clean rebase
>> will fix the issue.
> Sorry for the delay. Attached are the rebased patches, on top of the latest
> git head:
> http://git.haproxy.org/?p=haproxy.git;a=commit;h=013a84fe939cf393fbcf8deb9b4504941d382777
>
> Wrt scaling, my use case is mainly offloading the CPU load during the
> handshake, and as such spreading over multiple cores is not desirable. I've
> tried increasing nproc for scale-out purposes. nproc (6 to 8) gives me the
> max connection rate until it hits the hw limit of the qat card. The
> connection rate increases (close to) linearly with the number of processes.
>
>>
>> On haproxy conn/s side:
>> native ssl: 1200 conn/s
> Which cipher do you use for testing? I use ECDHE-RSA-AES128-GCM-SHA256 and
> I see about 500 conn/s. For some other ciphers I see slightly higher numbers
> (e.g. AES128-SHA256: 550 conn/s), but I haven't seen 1200 conn/s. Not sure
> whether the cpu/hw makes such a big difference.
>
>> qat+async: 1700 conn/s
> I see about 2000 conn/s per core, so not much different than your number.
>
> Thanks,
>
> Grant
>
I'm using an Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz.
I've done some tests on the rebased version.
I finally reached interesting results by disabling perfect forward secrecy,
and I'm now close to the announced performance for the hardware (4100 conn/s
against an announced 5.5K rsa/s on the 8920).
I also get good performance using my patched software engine, and it scales
well.
I didn't notice any unexplained CPU usage or crashes with this version, so we
are close to merging your patches.
I have a few comments about your patches:
- For the first patch (engine):
I think it would be better to load engines directly during parsing; that way
an engine misconfiguration would be detected on the right line.
In other words, call 'ssl_init_single_engine' directly from
'ssl_parse_global_ssl_engine'. This way you wouldn't have to handle a list of
engines to free. In addition, by postponing the init to 'crt/crt-list', you
miss initializing engines when SSL is used only on the server side. See the
sketch below.
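Here is a minimal sketch of what I mean; the prototypes of
'ssl_parse_global_ssl_engine' and 'ssl_init_single_engine' below are my
assumptions based on the usual haproxy keyword-parser shape, not taken from
your patch:

static int ssl_parse_global_ssl_engine(char **args, int section_type,
                                       struct proxy *curpx, struct proxy *defpx,
                                       const char *file, int line, char **err)
{
        if (!*args[1]) {
                memprintf(err, "'%s' expects an engine id.", args[0]);
                return -1;
        }
        /* init the engine right now instead of queueing it in a list;
         * args[2] would be an optional algorithm list, if the keyword
         * takes one (assumption).
         */
        if (ssl_init_single_engine(args[1], args[2]) != 0) {
                memprintf(err, "'%s': failed to initialize engine '%s'.",
                          args[0], args[1]);
                return -1; /* error reported on the offending file:line */
        }
        return 0;
}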
- For the async patch:
Haproxy does not compile with OpenSSL < 1.1 because of these lines:
OSSL_ASYNC_FD async_fd; in types/connection.h (and all the lines referencing
async_fd in ssl_sock.c if you only fix the header),
and
#include <openssl/async.h> in src/ssl_sock.c.
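Something along these lines should keep older OpenSSL versions building (the
guards are the point here; the exact placement is only illustrative):

/* src/ssl_sock.c */
#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
#include <openssl/async.h>
#endif

/* types/connection.h */
struct connection {
        /* ... existing members ... */
#if defined(USE_OPENSSL) && (OPENSSL_VERSION_NUMBER >= 0x1010000fL)
        OSSL_ASYNC_FD async_fd;    /* only present with OpenSSL >= 1.1.0 */
#endif
};

(plus the same #if around every place referencing async_fd in ssl_sock.c)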
It would also be better to reorder your check following the
SSL_read/write/handshake calls. Instead of:
#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
if (ret == SSL_ERROR_WANT_ASYNC) {
        ...
}
#endif
if (ret == SSL_ERROR_WANT_WRITE) {
        ...
}
else if (ret == SSL_ERROR_WANT_READ) {
        ...
}
else if (ret == SSL_ERROR_SYSCALL) {
        ...
}
like this:
if (ret == SSL_ERROR_WANT_WRITE) {
        ...
}
else if (ret == SSL_ERROR_WANT_READ) {
        ...
}
#if OPENSSL_VERSION_NUMBER >= 0x1010000fL
else if (ret == SSL_ERROR_WANT_ASYNC) {
        ...
}
#endif
else if (ret == SSL_ERROR_SYSCALL) {
        ...
}
About 'ssl_async_process_fds' and 'SSL_get_changed_async_fds':
the doc says you have to call SSL_get_changed_async_fds with NULL first to
learn how much room your fd buffers need before retrieving the fds.
So as it stands there is a potential buffer overflow. The documented pattern
is sketched below.
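For reference, this is the two-step pattern the doc describes (error handling
and the actual poller registration are left out, and the buffer sizes are
only illustrative):

        size_t num_add_fds = 0, num_del_fds = 0;
        OSSL_ASYNC_FD add_fd[32], del_fd[32];

        /* first call: only retrieve the counts */
        if (!SSL_get_changed_async_fds(ssl, NULL, &num_add_fds,
                                       NULL, &num_del_fds))
                return;

        /* refuse (or grow the buffers) instead of overflowing them */
        if (num_add_fds > 32 || num_del_fds > 32)
                return;

        /* second call: actually retrieve the fds */
        if (!SSL_get_changed_async_fds(ssl, add_fd, &num_add_fds,
                                       del_fd, &num_del_fds))
                return;

        /* ... register add_fd[], unregister del_fd[] ... */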
The doc also says: 'However if multiple asynchronous capable engines are in
use then more than one is possible.'
I think you should allow ssl-async only when a single engine is configured,
because:
/* we don't support more than 1 async fds */
if (num_add_fds > 1 || num_del_fds > 1)
        return;
would otherwise result in unpredictable behavior. A possible startup check is
sketched below.
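To illustrate, a startup check of this kind could enforce it (the names
'global_ssl_async' and 'nb_ssl_engines' are placeholders of mine, not taken
from your patch, and error reporting is simplified):

/* refuse async mode when more than one engine is configured, since only
 * one async fd per SSL connection is handled above.
 */
static int ssl_check_async_engine_count(void)
{
        if (global_ssl_async && nb_ssl_engines > 1) {
                fprintf(stderr, "ssl-async requires a single ssl-engine.\n");
                return -1;
        }
        return 0;
}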
Finally, I think it would be preferable to rename 'ssl-async' to
'ssl-engine-async' or 'ssl-mode-async'.
Thanks a lot for your work, Grant!
R,
Emeric