[
https://issues.apache.org/jira/browse/PROTON-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joshua Seagrave updated PROTON-2933:
------------------------------------
Attachment: main-1.cpp
> Windows/SChannel: AMQP 1.0 SASL auth-failure lost in handshake
> --------------------------------------------------------------
>
> Key: PROTON-2933
> URL: https://issues.apache.org/jira/browse/PROTON-2933
> Project: Qpid Proton
> Issue Type: Bug
> Components: cpp-binding
> Affects Versions: proton-c-0.40.0
> Environment: OS: Windows 11 (x64)
> Proton: qpid-proton 0.40 from vcpkg
> Compiler: MSVC
> Broker: RabbitMQ 4.3.0 with native AMQP 1.0
> Reporter: Joshua Seagrave
> Priority: Major
> Attachments: CMakeLists.txt, docker-compose.yml, main.cpp, vcpkg.json
>
>
> On Windows, when Proton C++ is built against the SChannel TLS backend and
> connects to an AMQP 1.0 broker over {{amqps://}} with invalid PLAIN
> credentials, the broker's auth-failure response is silently lost. No
> {{messaging_handler}} callbacks are dispatched while the container is
> running. The events only flush when {{container::stop()}} is forced, and even
> then {{on_transport_close}} arrives with an empty {{error_condition}} — the
> {{amqp:unauthorized-access}} was discarded somewhere in the teardown path.
> The same scenario behaves correctly under the OpenSSL backend (Proton-Python
> on Windows), strongly suggesting the bug is in the SChannel binding's
> handling of the close-immediately-after-{{sasl-outcome}} race.
> The application-visible consequence is severe: a plugin or service cannot tell
> that authentication failed. It hangs with no events to act on, no way to
> surface the error to the user, and no way to trigger a credential refresh.
>
> *Reproducer*
> A minimal standalone reproducer is attached. It traces every
> {{messaging_handler}} callback at top-of-function (so swallowed exceptions in
> error-condition accessors can't be confused for "callback didn't fire") and
> includes a 30-second watchdog that forces {{container::stop()}} if no
> terminal callback has arrived.
> *Steps to reproduce*
> # Build qpid-proton-cpp on Windows with the SChannel backend (the vcpkg
> default).
> # Use the attached {{docker-compose.yml}} to stand up a RabbitMQ 4.3 instance with TLS
> enabled and default credentials.
> # Build and run {{main.cpp}}.
> *Expected behaviour*
> {{on_transport_error}} (and/or {{on_transport_close}}) fires with
> {{error_condition.name() == "amqp:unauthorized-access"}} and a description
> along the lines of {{Authentication failed [mech=PLAIN]}}. The
> application can read the condition synchronously inside the callback. The
> container returns from {{run()}} shortly afterwards. This is the behaviour
> observed under Proton-Python on Windows.
>
> *Sample output:*
> {code:java}
> [..] connect() issued: amqps://...:5671
> [..] on_transport_error: amqp:unauthorized-access - Authentication failed
> [mech=PLAIN]
> [..] [transport_error] SASL outcome=1 user='valid-user' mech='PLAIN'
> [..] on_disconnected
> Total elapsed: ~3 seconds.
> {code}
>
> *Observed behaviour (the bug)*
> Against the same broker with the same credentials, on Windows + SChannel:
> {code:java}
> [..] === Proton C++ SChannel SASL race reproducer ===
> [..] connecting: amqps://...:5671
> [..] on_container_start
> <-- 30 seconds of total silence: no callbacks fire -->
> [..] WATCHDOG: 30s elapsed with no terminal callback; forcing container.stop()
> [..] on_transport_open
> [..] on_transport_close
> [..] cond.name='' cond.desc=''
> [..] sasl.outcome=1 sasl.user='valid-user' sasl.mech='PLAIN'
> [..] === container.run() returned ===
> {code}
> Note specifically:
> - {{on_transport_error}} is never fired.
> - {{on_transport_open}} and {{on_transport_close}} only arrive in the flush
> triggered by the watchdog's {{container::stop()}} call. Without an external
> mechanism forcing the stop, an application sits silently forever.
> - {{on_transport_close}}'s {{error_condition}} is empty. The
> {{amqp:unauthorized-access}} condition the broker sent was not propagated
> onto the transport before the close-handling path tore it down.
> - The SASL accessor ({{transport.sasl()}}) does still report the outcome
> correctly when queried in {{on_transport_close}}. So the SASL state is
> preserved internally; it just isn't surfaced as an {{error_condition}} on the
> transport, and no error event is dispatched.
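Since the last point says the SASL outcome survives even when the condition is empty, an application could in principle fall back to it once a close callback does arrive. The sketch below shows only that decision logic on plain stand-in types. `close_snapshot`, `close_reason` and `classify` are hypothetical and not part of Proton. The numeric codes follow the AMQP 1.0 `sasl-code` values (0 = ok, 1 = auth), and the use of a negative value for "no outcome yet" is an assumption. This does not address the main problem that no callback fires until the container is stopped.

```cpp
#include <string>

// Hypothetical snapshot of what a handler might read in on_transport_close.
// These are plain stand-ins, not Proton types.
struct close_snapshot {
    std::string condition_name;  // error condition name; empty in the buggy case
    int sasl_outcome;            // AMQP sasl-code: 0 = ok, 1 = auth failure
};

enum class close_reason { clean, auth_failed, other_error };

inline close_reason classify(const close_snapshot& s) {
    if (s.condition_name == "amqp:unauthorized-access") {
        return close_reason::auth_failed;
    }
    if (!s.condition_name.empty()) {
        return close_reason::other_error;
    }
    // Empty condition: fall back to the SASL outcome the transport retained.
    if (s.sasl_outcome == 1) {
        return close_reason::auth_failed;
    }
    if (s.sasl_outcome > 1) {
        return close_reason::other_error;  // sys, sys-perm, sys-temp
    }
    return close_reason::clean;  // ok, or no outcome recorded (assumed negative)
}
```

With the values from the observed output (`cond.name=''`, `sasl.outcome=1`), `classify` returns `close_reason::auth_failed`.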
>
> Broker-side log (RabbitMQ):
> {code:java}
> [error] closing AMQP connection (...) (duration: '6s'):
> [error] {handshake_error,waiting_sasl_init,
> [error] {'v1_0.error',{symbol,<<"amqp:unauthorized-access">>},
> [error] {utf8,<<"PLAIN login refused: user 'valid-user' - invalid credentials">>},
> [error] undefined}}
> {code}
> So the auth failure is unambiguously reaching the client over the wire. The
> bytes are processed by Proton (the SASL accessor reflects them after the
> eventual flush). They simply do not produce a {{messaging_handler}} event in
> the SChannel-backed configuration.
> *Probable mechanism*
> AMQP 1.0 lets a server send {{sasl-outcome}} and immediately close the TCP
> connection without waiting for a client acknowledgement, because the spec does
> not require one. RabbitMQ does this. The {{sasl-outcome}} frame and the TCP FIN
> can therefore arrive at the client in the same OS-level read. Clients have to
> handle this gracefully: the outcome event must be dispatched even though the
> transport is simultaneously transitioning to closed.
> The OpenSSL-backed code path appears to handle this race correctly. The
> SChannel-backed path appears to lose the outcome event during the
> near-simultaneous teardown - the transport's {{error_condition}} is never
> populated with {{amqp:unauthorized-access}}, no {{transport_error}} event
> is queued, and the eventual {{transport_close}} carries an empty condition.
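
The ordering argued for above can be illustrated with a toy model. It is not Proton code and makes no claim about how either TLS backend is implemented. `read_result`, `toy_transport` and the `"sasl-outcome:1"` payload string are hypothetical stand-ins. The only point is that input delivered in the same read as end-of-stream is consumed before close handling runs.

```cpp
#include <string>
#include <vector>

// Toy model: one OS-level read that may carry both payload and end-of-stream.
struct read_result {
    std::string payload;  // stand-in for decoded frames (hypothetical encoding)
    bool eof;             // true when the peer's FIN arrived with this read
};

struct toy_transport {
    std::string condition;            // empty means no error recorded
    std::vector<std::string> events;  // events in dispatch order

    void on_read(const read_result& r) {
        // 1. Consume buffered input first, even if EOF came with it.
        if (r.payload == "sasl-outcome:1") {
            condition = "amqp:unauthorized-access";
            events.push_back("transport_error");
        }
        // 2. Only then run close handling; the condition is already set.
        if (r.eof) {
            events.push_back("transport_close");
        }
    }
};
```

Reversing the two steps in `on_read` would give the symptom reported here: a close event with an empty condition and no error event.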
--
This message was sent by Atlassian Jira
(v8.20.10#820010)