a-lunev commented on a change in pull request #5252:
URL: https://github.com/apache/incubator-nuttx/pull/5252#discussion_r793127298



##########
File path: net/tcp/tcp_send_buffered.c
##########
@@ -355,10 +355,39 @@ static uint16_t psock_send_eventhandler(FAR struct 
net_driver_s *dev,
                                         FAR void *pvconn, FAR void *pvpriv,
                                         uint16_t flags)
 {
-  FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+  /* FAR struct tcp_conn_s *conn = (FAR struct tcp_conn_s *)pvconn;
+   *
+   * Do not use pvconn argument to get the TCP connection pointer (the above
+   * commented line) because pvconn is normally NULL for some events like
+   * NETDEV_DOWN. Instead, the TCP connection pointer can be reliably
+   * obtained from the corresponding TCP socket.
+   */
+
   FAR struct socket *psock = (FAR struct socket *)pvpriv;
+  FAR struct tcp_conn_s *conn;
   bool rexmit = false;
 
+  DEBUGASSERT(psock != NULL);
+
+  /* Get the TCP connection pointer reliably from
+   * the corresponding TCP socket.
+   */
+
+  conn = psock->s_conn;
+  DEBUGASSERT(conn != NULL);
+
+  /* The TCP socket is connected and, hence, should be bound to a device.
+   * Make sure that the polling device is the one that we are bound to.
+   */
+
+  DEBUGASSERT(conn->dev != NULL);

Review comment:
       > if they are expected to be always true, why don't you just add them?
   
   I think it is reasonable to add this debug assert soon to the code (possibly 
more complex condition with a better debug). Let's investigate the discovered 
issue first.
   
   > btw, what's the motivation of this PR?
   > it seems that it's proven the old way was actually more reliable than the 
new way...
   
   The motivation of PR #5252 is that pvconn argument can be sometimes NULL. At 
least it is known now for NETDEV_DOWN event. As I understand, we can consider 
this behavior was done by design (not a bug). However, inside of the callback 
handler it is not good that we can not rely on pvconn argument w/o dependence 
on other conditions (it was even not documented). I think it gets complicated 
and error-prone during further code changes. Moreover, as it turned out, before 
PR #5252 NuttX still crashed in tcp send unbuffered mode in case of NETDEV_DOWN 
event (when network interface went down during TCP sending) because 
tcp_send_unbuffered.c checked conditions in the following order: (flags & 
TCP_ACKDATA) -> else if (flags & TCP_REXMIT) -> else if (flags & 
TCP_DISCONN_EVENTS). This order has never changed in tcp_send_unbuffered.c (and 
this original order looks the best concerning slightly better performance at 
high network traffic). The order was previously changed (to workaround 
pvconn=NULL
 ) only in tcp_send_buffered and tcp_sendfile callback functions.
   Thus I had idea that if we can not rely on pvconn argument, then 
alternatively the connection pointer can be obtained from the corresponding 
socket structure through pvpriv argument of the callback function (PR #5252 
implements the idea). And then I have recovered the original condition check 
order in tcp_sendfile (#5262) towards tcp_send_unbuffered vs tcp_sendfile code 
unification (today these two are already unified vs each other for the most).
   
   As neither me nor you currently do not know the root cause of your 
discovered case, I think we should investigate the issue and understand the 
root cause to make right decision.
   Preliminary to me it seems that `DEBUGASSERT(conn->dev != NULL);` has helped 
to catch an abnormal behavior. If `DEBUGASSERT(pvconn == conn || pvconn == 
NULL);` also triggers, that will mean there is a serious bug (sockets are bound 
to wrong connections at some time, and vice versa).
   If we workaround this by masking the problem w/o understanding what is 
happening, I suppose it may lead to unpredictable and not desirable 
consequences.
   BTW based on the commit you were testing 
([71d3ff1](https://github.com/apache/incubator-nuttx/commit/71d3ff104553aebf4cb938ee8abc6fd7d34fd239))
 have you tried to revert PR #5252 locally? Has the floating issue fully 
disappeared?
   Also please consider commits 
[f73abc](https://github.com/apache/incubator-nuttx/commit/f73abc76d521bec76982ccc05ff01dfb2c113351)
 and 
[019fc0](https://github.com/apache/incubator-nuttx/commit/019fc0ad78a160aad7bd9182fc315e211e2850a7)
 that influenced sim/netdev seriously (the TCP throughput of sim/netdev was 
increased significantly). I suppose it may expose some hidden bugs that might 
exist and were silent before.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to