Hi,
I didn't follow the full discussion, but we had similar problems with
sockets, file descriptors and dup. Just to draw your attention, this was
the PR / patches where the issue was fixed:
https://github.com/apache/nuttx/pull/16499 (originally
https://github.com/apache/nuttx/pull/16361)
- I am mentioning the original PR just because the patches were taken
over and rewritten by Xiaomi, and we have never updated to the upstream
version nor tested it. But perhaps the upstream PR works as well, as I
suppose it was tested by the people who decided to rewrite it.
So please check whether the above fixes are in your branch; otherwise dup
just won't work and you will end up with file descriptors that refer to
random files/devices.
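As a quick sanity check (just a sketch, nothing here is from thttpd; the
function name is made up and it assumes an AF_INET socket), you can verify
that a dup'd descriptor still refers to the same socket by comparing
getsockname() results:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Duplicate sockfd and confirm that the copy still behaves like the
 * same socket: getsockname() on both should report the same local
 * address.  On a branch without the fixes above, the dup'd fd may
 * refer to some unrelated file or device and the call will fail or
 * report something different.
 */

static int check_dup_refers_to_socket(int sockfd)
{
  struct sockaddr_in a;
  struct sockaddr_in b;
  socklen_t alen = sizeof(a);
  socklen_t blen = sizeof(b);
  int dupfd = dup(sockfd);

  if (dupfd < 0)
    {
      return -1;
    }

  if (getsockname(sockfd, (struct sockaddr *)&a, &alen) < 0 ||
      getsockname(dupfd, (struct sockaddr *)&b, &blen) < 0)
    {
      close(dupfd);
      return -1;
    }

  printf("dup %d -> %d, same local address: %s\n", sockfd, dupfd,
         memcmp(&a, &b, sizeof(a)) == 0 ? "yes" : "no");

  close(dupfd);
  return 0;
}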
Br,
Jukka
On 1.7.2025 22.43, Tim Hardisty wrote:
I have wasted WAY too many days trying to understand what's going on
here: you start from the premise that it "must have worked", but I am
really not so sure about that!
I think the idea of the timeout is simply to cover the case where the
CGI "app" goes AWOL and needs to be killed, if Kconfig is configured to
do that.
Please bear in mind that my POSIX/NuttX/RTOS skills are
limited...but...the file descriptors don't seem to behave as I would
expect, based on NuttX and POSIX documentation.
* The file descriptors correctly exist in the cgi() function, but as
  soon as task_create is called the FD becomes invalid in the cgi()
  function that called task_create. NuttX documentation says that a
  newly created task only inherits the first 3 descriptors, but the
  mere act of calling task_create seems to kill the higher FDs in the
  caller as well. Is that correct behaviour? (See the sketch after
  this list.) Note: I was trying so many different things that my
  "scatter gun" approach might have misled me.
* I tried pthread_create (and posix_spawn) instead of task_create()
but the FD still seemed to get trashed. So there may be more going
on here than my skills allow me to investigate.
* I also saw issues where the group ownership of a pthread (as opposed
  to a task) was seemingly wrong, but I put that down to my lack of
  experience.
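For what it's worth, a minimal way to test the first point above (a
sketch only; child_main, the task name and the priority/stack values are
arbitrary, and it assumes task_create() is callable from the app) would
be to probe the descriptor with fcntl(F_GETFD) before and after the call:

#include <stdio.h>
#include <fcntl.h>
#include <sched.h>

/* Trivial child entry point; it exits immediately. */

static int child_main(int argc, char *argv[])
{
  return 0;
}

/* Probe whether "fd" is still valid in the *calling* task before and
 * after task_create().  fcntl(fd, F_GETFD) returns -1 (EBADF) if the
 * descriptor has been closed or no longer refers to an open file.
 */

static void check_fd_across_task_create(int fd)
{
  printf("before task_create: F_GETFD=%d\n", fcntl(fd, F_GETFD));

  task_create("cgi_test", 100, 2048, child_main, NULL);

  printf("after  task_create: F_GETFD=%d\n", fcntl(fd, F_GETFD));
}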
Right now, reverting the PR that added O_CLOEXEC has made it work.
Maybe it hasn't fixed it "properly", but it has at least made it work.
On 01/07/2025 19:50, Bernd Walter wrote:
Out of curiosity I just took a look at the thttpd CGI code and noticed
something I don't understand myself.
There is cgi_child(), which looks like it is meant to be called after a
fork, for the child case.
It prepares the file descriptors and calls exec.
The result of the exec is stored in a variable named child, which is
an odd name.
Normally exec never returns, and if it does, it is an error; the normal
thing would be for the child to kill itself.
The code goes into error handling if the exec returned < 0, which it
always does whenever exec returns at all.
However, the next thing it does is set up a timeout for the child.
It *is* the child and it has already failed the exec, so why the
timeout when we are already in an error state because we returned from
an exec?
However:
The function is called via task_create, so not a fork.
If I got it right, the task has separate file descriptors, as a forked
process would, and exec closing the task's copies of the sockets should
be fine, as long as the use counts on those are properly increased.
But if it is intended to behave like fork/exec, why does stuff happen
after the exec?
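For comparison, the conventional fork()-based child path would look
roughly like the sketch below (illustrative only, not the actual thttpd
code); everything after exec is unreachable unless exec failed, which is
why running a timeout there looks strange:

#include <unistd.h>
#include <stdlib.h>

/* Conventional child-side path of a fork/exec CGI handler: wire the
 * connection socket to stdin/stdout, exec the CGI binary, and exit
 * immediately if exec returns (exec only returns on error).
 */

static void cgi_child_conventional(int conn_fd, const char *path,
                                   char * const argv[],
                                   char * const envp[])
{
  dup2(conn_fd, STDIN_FILENO);
  dup2(conn_fd, STDOUT_FILENO);
  close(conn_fd);

  execve(path, argv, envp);

  /* Only reached if execve() failed.  The child must not fall back
   * into parent-style logic (timeouts, bookkeeping, etc.); it just
   * terminates.
   */

  _exit(EXIT_FAILURE);
}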
On Tue, Jul 01, 2025 at 03:00:23PM -0300, Alan C. Assis wrote:
Hi Tim,
You are right, it doesn't execute, but some subprocess (like a CGI)
could
try to execute.
This comment there sheds some light on it:
"I wouldn't describe O_CLOEXEC as there principally for privilege
escalation / security reasons -- it's also very, very common to have
non-security bugs happen (frequently of the indefinite-blocking variety)
if a FD is left open beyond when it's intended to be closed because a
subprocess still has it."
So, why does removing SOCK_CLOEXEC make HTTP work? If nothing is
exec'd, the socket shouldn't be closed, right?
And why was it working in the past? Which modification broke this?
Maybe understanding that is important for getting the right fix
(removing the flag may just be acting as a band-aid).
Wengzhe, could you please help us to understand this network issue?
BR,
Alan
On Tue, Jul 1, 2025 at 12:28 PM Tim Hardisty<timhardist...@gmail.com>
wrote:
But that's the point - thttpd *does* call exec(), so the open socket
file descriptor gets closed while it is still needed by the exec'd
application.
If there's another way of doing this, I'm listening!
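One possible alternative (just a sketch, not tested on NuttX): keep
SOCK_CLOEXEC on the accept so other descriptors don't leak, but
explicitly clear the close-on-exec flag on the one descriptor the CGI
child actually needs to inherit, before the exec:

#include <fcntl.h>

/* Clear the close-on-exec flag on a single descriptor so it survives
 * the exec, while every other CLOEXEC descriptor is still closed.
 */

static int keep_fd_across_exec(int fd)
{
  int flags = fcntl(fd, F_GETFD);

  if (flags < 0)
    {
      return -1;
    }

  return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}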
On 01/07/2025 16:13, Alan C. Assis wrote:
Hi Tim,
Nice find!
Now we need to understand why this worked in the past but doesn't now.
Also, what are the implications of removing SOCK_CLOEXEC? A few
pointers
here:
https://stackoverflow.com/questions/22304631/what-is-the-purpose-to-set-sock-cloexec-flag-with-accept4-same-as-o-cloexec
BR,
Alan
On Tue, Jul 1, 2025 at 11:27 AM Tim Hardisty<timhardist...@gmail.com>
wrote:
The error was, indeed, the socket being opened with the SOCK_CLOEXEC
flag set.
PR to follow.
On 28/06/2025 16:16, Tim Hardisty wrote:
Actually - it might be a change last year. The socket is now opened
like this and I assume CLOEXEC will mess up the operation of the
executed CGI app (will investigate on Monday; not sure what socket
mode it needs to be):
hc->conn_fd = accept4(listen_fd, (struct sockaddr *)&sa, &sz,
SOCK_CLOEXEC);
On 28/06/2025 13:22, Alan C. Assis wrote:
Hi Tim,
Yes, I think send() is the preferred way to work with sockets because
you have finer control, i.e. you can pass flags in the fourth argument
(MSG_DONTWAIT, etc.).
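Something like this, as a rough sketch (the wrapper name is made up):

#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Non-blocking send: with MSG_DONTWAIT the call returns -1 with errno
 * set to EAGAIN/EWOULDBLOCK instead of blocking when the send buffer
 * is full -- something plain write() cannot express.
 */

static ssize_t send_nowait(int sock_fd, const void *buffer, size_t nbytes)
{
  ssize_t n = send(sock_fd, buffer, nbytes, MSG_DONTWAIT);

  if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
    {
      /* Send buffer full: retry later, e.g. after poll() reports
       * POLLOUT on sock_fd.
       */
    }

  return n;
}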
If you suspect that the bug was caused by some recent modification, try
to find a supported board that was used to test thttpd in the past and
test an old NuttX release with it.
This is the approach I use to double check if something is broken in
the mainline.
BR,
Alan
On Fri, Jun 27, 2025 at 3:39 PM Tim Hardisty
<timhardist...@gmail.com> wrote:
Is it as "simple" as thttpd should do:
nwritten = send(sock_fd, buffer, totalbytesread, 0);
rather than the generic:
nwritten = write(sock_fd, buffer, nbytes);
On 27/06/2025 18:40, Tim Hardisty wrote:
I am trying to get thttpd's CGI handling working and have found that
the dup(2) calls for stdin and stdout return a file descriptor that has
already been allocated to the NET socket (via thttpd, I think).
That isn't right, is it?
I am not sure if it's a side effect of something that thttpd does (that
might have been OK in the past but is not right now), a NuttX bug, or a
missing Kconfig setting that relates to this.
The result is that the eventual copying of buffered HTML that should be
written via the socket FD gets rejected because the FD doesn't have
write access (and is now the wrong FD anyway!).
Perhaps there's been a change in the way NuttX deals with all of
this
that didn't get sorted in thttpd?
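For reference, the POSIX behaviour I'd expect (a minimal sketch, nothing
thttpd-specific; the function name is made up) is that dup() returns the
lowest-numbered descriptor that is not currently open, so it should
never hand back the number of a socket that is still open:

#include <assert.h>
#include <unistd.h>

/* Expected POSIX semantics: dup() returns the lowest-numbered unused
 * descriptor, so as long as sock_fd is still open, neither copy below
 * can collide with it.
 */

static void dup_should_not_collide(int sock_fd)
{
  int in_copy  = dup(STDIN_FILENO);
  int out_copy = dup(STDOUT_FILENO);

  assert(in_copy >= 0 && out_copy >= 0);
  assert(in_copy != sock_fd && out_copy != sock_fd);

  close(in_copy);
  close(out_copy);
}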