[ 
https://issues.apache.org/jira/browse/PROTON-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Seagrave updated PROTON-2933:
------------------------------------
    Description: 
On Windows, when Proton C++ is built against the SChannel TLS backend and 
connects to an AMQP 1.0 broker over {{amqps://}} with invalid PLAIN 
credentials, the broker's auth-failure response is silently lost. No 
{{messaging_handler}} callbacks are dispatched while the container is running. 
The events are flushed only when {{container::stop()}} is forced, and even then 
{{on_transport_close}} arrives with an empty {{error_condition}}: the 
{{amqp:unauthorized-access}} condition was discarded somewhere in the teardown path.

The same scenario behaves correctly under the OpenSSL backend (Proton-Python on 
Windows), which strongly suggests that the bug lies in the SChannel binding's 
handling of the close-immediately-after-{{sasl-outcome}} race.

The application-visible consequence is severe: a plugin/service can't tell that 
authentication failed. It just hangs, with no events to act on, no way to 
surface the error to the user, no way to trigger a credential refresh.

 

*Reproducer*

A minimal standalone reproducer is attached. It traces every 
{{messaging_handler}} callback at top-of-function (so swallowed exceptions in 
error-condition accessors can't be confused for "callback didn't fire") and 
includes a 30-second watchdog that forces {{container::stop()}} if no terminal 
callback has arrived.
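
For readers without the attachment, the watchdog idea can be sketched roughly as follows. This is a minimal standalone illustration, not the attached reproducer: the {{watchdog}} class and its {{disarm()}} method are hypothetical names, and the callback stands in for a call to {{container::stop()}}, which is assumed here to be safe to invoke from another thread.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper (not part of Proton): invokes on_timeout once unless
// disarm() is called before the timeout elapses.
class watchdog {
public:
    watchdog(std::chrono::milliseconds timeout, std::function<void()> on_timeout)
        : on_timeout_(std::move(on_timeout)),
          worker_([this, timeout] { run(timeout); }) {}

    // Disarm and join the worker so the thread never outlives the object.
    ~watchdog() {
        disarm();
        worker_.join();
    }

    watchdog(const watchdog&) = delete;
    watchdog& operator=(const watchdog&) = delete;

    // Intended to be called from every terminal callback.
    void disarm() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            disarmed_ = true;
        }
        cv_.notify_all();
    }

private:
    void run(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait_for returns the predicate's value: false means the timeout
        // elapsed while the watchdog was still armed.
        const bool disarmed =
            cv_.wait_for(lock, timeout, [this] { return disarmed_; });
        lock.unlock();
        if (!disarmed) on_timeout_();  // in a reproducer: force the stop
    }

    // Declared before worker_ so they are initialized before the thread starts.
    std::mutex mutex_;
    std::condition_variable cv_;
    bool disarmed_ = false;
    std::function<void()> on_timeout_;
    std::thread worker_;
};
```

In a handler, one would construct the watchdog with a 30-second timeout before calling {{run()}} and call {{disarm()}} at the top of each terminal callback, so the timeout path is only taken when nothing terminal was dispatched.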

*Steps to reproduce*
 # Build qpid-proton-cpp on Windows with the SChannel backend (the vcpkg 
default).
 # Use the attached {{docker-compose.yml}} to stand up a RabbitMQ 4.3 instance 
with TLS enabled and default credentials.
 # Build and run the reproducer ({{main.cpp}}).

*Expected behaviour*

{{on_transport_error}} (and/or {{on_transport_close}}) fires with 
{{error_condition.name() == "amqp:unauthorized-access"}} and a description 
along the lines of {{Authentication failed [mech=PLAIN]}}. The application 
can read the condition synchronously inside the callback. The container returns 
from {{run()}} shortly afterwards. This is the behaviour observed under 
Proton-Python on Windows.

 

*Sample output:*
{code:java}
[..] connect() issued: amqps://...:5671
[..] on_transport_error: amqp:unauthorized-access - Authentication failed [mech=PLAIN]
[..] [transport_error] SASL outcome=1 user='valid-user' mech='PLAIN'
[..] on_disconnected

Total elapsed: ~3 seconds.
{code}
 

*Observed behaviour (the bug)*

Against the same broker with the same credentials, on Windows + SChannel:
{code:java}
[..] === Proton C++ SChannel SASL race reproducer ===
[..] connecting: amqps://...:5671
[..] on_container_start
<-- 30 seconds of total silence: no callbacks fire -->
[..] WATCHDOG: 30s elapsed with no terminal callback; forcing container.stop()
[..] on_transport_open
[..] on_transport_close
[..] cond.name='' cond.desc=''
[..] sasl.outcome=1 sasl.user='valid-user' sasl.mech='PLAIN'
[..] === container.run() returned ==={code}
Note specifically:
 - {{on_transport_error}} is never fired.
 - {{on_transport_open}} and {{on_transport_close}} only arrive in the flush 
triggered by the watchdog's {{container::stop()}} call. Without an external 
mechanism forcing the stop, an application sits silently forever.
 - The {{error_condition}} delivered with {{on_transport_close}} is empty. The 
{{amqp:unauthorized-access}} condition the broker sent was not propagated onto 
the transport before the close-handling path tore it down.
 - The SASL accessor ({{transport.sasl()}}) does still report the outcome 
correctly when queried in {{on_transport_close}}. So the SASL state is 
preserved internally; it just isn't surfaced as an {{error_condition}} on the 
transport, and no error event is dispatched.
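
Because the SASL outcome survives even when the transport condition is empty, an application could derive a fallback condition from it once a close event finally arrives. The sketch below is a possible workaround only, not a fix for the missing events. It uses local stand-in types rather than Proton's classes: {{condition}}, {{sasl_outcome}} and {{effective_condition}} are hypothetical names, and the enum values are assumed to mirror Proton's SASL outcome codes (consistent with the {{outcome=1}} seen in the logs above).

```cpp
#include <string>

// Local stand-in for the SASL outcome. The numeric values are an assumption
// meant to mirror Proton's outcome codes (1 = authentication failed).
enum class sasl_outcome { none = -1, ok = 0, auth = 1, sys = 2, perm = 3, temp = 4 };

// Local stand-in for an error condition: a name plus a description.
struct condition {
    std::string name;
    std::string description;
    bool empty() const { return name.empty() && description.empty(); }
};

// Prefer the condition reported on the transport. If it is empty but the
// SASL layer recorded an authentication failure, synthesize the condition
// the broker is known to have sent.
condition effective_condition(const condition& transport_cond, sasl_outcome outcome) {
    if (!transport_cond.empty()) return transport_cond;
    if (outcome == sasl_outcome::auth) {
        return {"amqp:unauthorized-access",
                "authentication failed (derived from SASL outcome)"};
    }
    // Other outcomes (sys, perm, temp) are left to the application here.
    return transport_cond;
}
```

This only helps after {{on_transport_close}} has been dispatched; it does nothing about the 30 seconds of silence, which still needs an external timeout.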

 

Broker-side log (RabbitMQ):
{code:java}
[error] closing AMQP connection (...) (duration: '6s'):
[error] {handshake_error,waiting_sasl_init,
[error] {'v1_0.error',{symbol,<<"amqp:unauthorized-access">>},
[error] {utf8,<<"PLAIN login refused: user 'valid-user' - invalid credentials">>},
[error] undefined}}{code}
So the auth failure is unambiguously reaching the client over the wire. The 
bytes are processed by Proton (the SASL accessor reflects them after the 
eventual flush). They simply do not produce a {{messaging_handler}} event in 
the SChannel-backed configuration.

*Probable mechanism*

AMQP 1.0 lets a server send {{sasl-outcome}} and immediately close the TCP 
connection, because the spec does not require the server to wait for any 
acknowledgement from the client. RabbitMQ does this. The {{sasl-outcome}} frame and the TCP FIN can 
therefore arrive at the client in the same OS-level read. Clients have to 
handle this gracefully: the outcome event must be dispatched even though the 
transport is simultaneously transitioning to closed.

The OpenSSL-backed code path appears to handle this race correctly. The 
SChannel-backed path appears to lose the outcome event during the 
near-simultaneous teardown: the transport's {{error_condition}} is never 
populated with {{amqp:unauthorized-access}}, no {{transport_error}} event 
is queued, and the eventual {{transport_close}} carries an empty condition.
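
The ordering a client needs can be illustrated with a toy model. This is not Proton's code and makes no claim about how either TLS backend is implemented; {{read_result}}, {{dispatch}} and the frame label are hypothetical. It only shows the invariant described above: every frame that arrived in a read is drained before the end-of-stream seen in that same read is acted on.

```cpp
#include <string>
#include <vector>

// Toy model of one socket read: zero or more decoded frames, plus a flag
// saying the peer's FIN was seen in the same read. Hypothetical type.
struct read_result {
    std::vector<std::string> frames;
    bool eof;
};

// Returns the events in the order a handler would see them.
std::vector<std::string> dispatch(const read_result& r) {
    std::vector<std::string> events;
    std::string condition;  // stays empty unless a frame reports a failure

    // 1. Drain every frame that arrived before looking at eof.
    for (const std::string& frame : r.frames) {
        if (frame == "sasl-outcome:auth-failed") {
            condition = "amqp:unauthorized-access";
            events.push_back("transport_error " + condition);
        }
    }
    // 2. Only then close, carrying whatever condition was recorded.
    if (r.eof) events.push_back("transport_close " + condition);
    return events;
}
```

Feeding this model a read that contains both the failed outcome and the FIN yields an error event followed by a close event that carries the same condition, which matches the expected behaviour described earlier.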

  was:
*Steps to reproduce*
 # Build qpid-proton-cpp on Windows with the SChannel backend (the vcpkg 
default).
 # Stand up an AMQP 1.0 broker that rejects bad PLAIN credentials and closes 
the TCP socket immediately after {{sasl-outcome}} (RabbitMQ 4.x is one such 
broker).
 # Add the appropriate credentials to {{main.cpp}} (lines 171-173)
 # Run the reproducer pointed at that broker with credentials known to be 
invalid.



> Windows/SChannel: AMQP 1.0 SASL auth-failure lost in handshake
> --------------------------------------------------------------
>
>                 Key: PROTON-2933
>                 URL: https://issues.apache.org/jira/browse/PROTON-2933
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: cpp-binding
>    Affects Versions: proton-c-0.40.0
>         Environment: OS: Windows 11 (x64)
> Proton: qpid-proton 0.40 from vcpkg
> Compiler: MSVC
> Broker: RabbitMQ 4.3.0 with native AMQP 1.0
>            Reporter: Joshua Seagrave
>            Priority: Major
>         Attachments: CMakeLists.txt, docker-compose.yml, main.cpp, vcpkg.json
>
>


