[ 
https://issues.apache.org/jira/browse/PROTON-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264603#comment-15264603
 ] 

michael goulish commented on PROTON-992:
----------------------------------------

Dispatch is not yet immune to this issue.
Also, I think Proton needs to let the application handle initialization and 
shutdown of Cyrus SASL.

I made a test that brings up a 6-router network, and randomly kills and 
restarts routers.
Because of this issue, a router dumps core, usually within 5 iterations.

Here is how I fixed it:

  1. Let Dispatch call sasl_client_init() and sasl_server_init() at the 
top of qd_server_run(), and remove these calls from Proton.  As long as Proton 
keeps these calls to itself, it cannot prevent two threads from getting into 
sasl_*_init() simultaneously.  SegV City.

  2. Stop Proton from calling sasl_{client,server}_done() in 
pni_sasl_impl_free().  Being thread-agnostic, Proton cannot know when it is 
safe to dispose of the SASL state, which is in use by many threads.  
Both of those Cyrus calls affect global state by NULLing out a global pointer 
that stores the mechanisms string.

With these changes, my test has now run to 400 iterations with no crash.
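
The two changes above amount to "initialize once, shut down only after every 
SASL-using thread is gone".  Here is a minimal sketch of that pattern.  It is 
not the actual Dispatch patch: stub_sasl_init() and stub_sasl_done() are 
hypothetical stand-ins for the real sasl_client_init()/sasl_server_init() and 
sasl_done() calls, so the example builds without libsasl, and it uses 
pthread_once() as a guard instead of a single call at the top of 
qd_server_run() before any worker thread starts.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins: a real application would call
 * sasl_client_init()/sasl_server_init() and sasl_done() here. */
static int init_calls = 0;
static int done_calls = 0;
static void stub_sasl_init(void) { init_calls++; }
static void stub_sasl_done(void) { done_calls++; }

static pthread_once_t sasl_once = PTHREAD_ONCE_INIT;

/* Safe to call from any thread; the init body runs exactly once. */
static void app_sasl_init(void) { pthread_once(&sasl_once, stub_sasl_init); }

/* Must be called only after every thread that uses SASL has been joined. */
static void app_sasl_shutdown(void) { stub_sasl_done(); }

static void *worker(void *arg) {
    (void)arg;
    app_sasl_init();
    /* ... per-connection SASL work would happen here ... */
    return NULL;
}

/* Starts n workers (1..64), joins them all, then shuts down once.
 * Returns how many times the initialization body actually ran. */
int run_workers(int n) {
    pthread_t threads[64];
    assert(n > 0 && n <= 64);
    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);
    app_sasl_shutdown();
    return init_calls;
}
```

Either way, the point is the same: the application, which knows its own 
threading model, owns both the init and the shutdown, and the library does 
neither behind its back.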



> Proton's use of Cyrus SASL is not thread-safe.
> ----------------------------------------------
>
>                 Key: PROTON-992
>                 URL: https://issues.apache.org/jira/browse/PROTON-992
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: 0.10
>            Reporter: michael goulish
>            Assignee: Andrew Stitcher
>            Priority: Critical
>
> Documentation for the Cyrus SASL library says that the library is believed to 
> be thread-safe only if the code that uses it meets several requirements.
> The requirements are:
>     * you supply mutex functions (see sasl_set_mutex())
>     * you make no libsasl calls until sasl_client/server_init() completes
>     * no libsasl calls are made after sasl_done() is begun
>     * when using GSSAPI, you use a thread-safe GSS / Kerberos 5 library.
> It says explicitly that sasl_set* calls are not thread-safe, since they 
> set global state.
> The proton library makes calls to sasl_set* functions in:
>           pni_init_client()
>           pni_init_server(), and
>           pni_process_init()
> Since those are internal functions, there is no way for code that uses Proton 
> to lock around those calls.
> I think proton needs a new API call to let applications call 
> sasl_set_mutex().  Or something.
> We probably also need other protections to meet the other requirements 
> specified in the Cyrus documentation (and quoted above).



