a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r793127298
##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct
net_driver_s *dev,
FAR void *pvconn, FAR void *pvpriv,
uint16_t flags)
{
- FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+ /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+ *
+ * Do not use pvconn argument to get the TCP connection pointer (the above
+ * commented line) because pvconn is normally NULL for some events like
+ * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+ * obtained from the corresponding TCP socket.
+ */
+
FAR struct socket *psock = (FAR struct socket *)pvpriv;
+ FAR struct tcp_conn_s *conn;
bool rexmit = false;
+ DEBUGASSERT(psock != NULL);
+
+ /* Get the TCP connection pointer reliably from
+ * the corresponding TCP socket.
+ */
+
+ conn = psock->s_conn;
+ DEBUGASSERT(conn != NULL);
+
+ /* The TCP socket is connected and, hence, should be bound to a device.
+ * Make sure that the polling device is the one that we are bound to.
+ */
+
+ DEBUGASSERT(conn->dev != NULL);
Review comment:
> if they are expected to be always true, why don't you just add them?
I think it is reasonable to add this debug assert soon to the code (possibly
more complex condition with a better debug). Let's investigate the discovered
issue first.
> btw, what's the motivation of this PR?
> it seems that it's proven the old way was actually more reliable than the
new way...
The motivation of PR #5252 is that pvconn argument can be sometimes NULL. At
least it is known now for NETDEV_DOWN event. As I understand, we can consider
this behavior was done by design (not a bug). However, inside of the callback
handler it is not good that we can not rely on pvconn argument w/o dependence
on other conditions (it was even not documented). I think it gets complicated
and error-prone during further code changes. Moreover, as it turned out, before
PR #5252 NuttX still crashed in tcp send unbuffered mode in case of NETDEV_DOWN
event (when network interface went down during TCP sending) because
tcp_send_unbuffered.c checked conditions in the following order: (flags &
TCP_ACKDATA) -> else if (flags & TCP_REXMIT) -> else if (flags &
TCP_DISCONN_EVENTS). This order has never changed in tcp_send_unbuffered.c (and
this original order looks the best concerning slightly better performance at
high network traffic). The order was previously changed (to workaround
pvconn=NULL
) only in tcp_send_buffered and tcp_sendfile callback functions.
Thus I had idea that if we can not rely on pvconn argument, then
alternatively the connection pointer can be obtained from the corresponding
socket structure through pvpriv argument of the callback function (the
connection pointer is expected to be not NULL) (PR #5252 implements the idea).
And then I have recovered the original condition check order in tcp_sendfile
(#5262) towards tcp_send_unbuffered vs tcp_sendfile code unification (today
these two are already unified vs each other for the most).
As neither me nor you currently know the root cause of your discovered case,
I think we should investigate the issue and understand the root cause to make
right decision.
Preliminary to me it seems that `DEBUGASSERT(conn->dev != NULL);` has helped
to catch an abnormal behavior. If `DEBUGASSERT(pvconn == conn || pvconn ==
NULL);` also triggers, that will mean there is a serious bug (sockets are bound
to wrong connections at some time, and vice versa).
If we workaround this by masking the problem w/o understanding what is
happening, I suppose it may lead to unpredictable and not desirable
consequences.
BTW based on the commit you were testing
([71d3ff1](https://github.com/apache/incubator-nuttx/commit/71d3ff104553aebf4cb938ee8abc6fd7d34fd239))
have you tried to revert PR #5252 locally? Has the floating issue fully
disappeared?
Also please consider commits
[f73abc](https://github.com/apache/incubator-nuttx/commit/f73abc76d521bec76982ccc05ff01dfb2c113351)
and
[019fc0](https://github.com/apache/incubator-nuttx/commit/019fc0ad78a160aad7bd9182fc315e211e2850a7)
that have influenced sim/netdev seriously (the TCP throughput of sim/netdev
was increased significantly). I suppose it may expose some hidden bugs that
might exist and were silent before.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]