Re: dup2() uses FD already allocated to NET skt

Alin Jerpelea Tue, 08 Jul 2025 21:24:24 -0700

On Wed, Jul 2, 2025 at 7:19 AM Jukka Laitinen <jukka.laiti...@iki.fi> wrote:


> Hi,
>
> I didn't follow the full discussion, but we had similar problems with
> sockets, file descriptors and dup. Just to draw your attention, this was
> the PR / patches where the issue was fixed:
>
> https://github.com/apache/nuttx/pull/16499 (originally
> https://github.com/apache/nuttx/pull/16361)
>
> - I am mentioning the original PR just because the patches were taken
> over and rewritten by Xiaomi, and we have neither updated to the
> upstream version nor tested it. But perhaps the upstream PR works too, as
> I suppose it was tested by the people who decided to rewrite it.
>
> So please check if the above fixes are in your branch, otherwise dup
> just won't work and you will end up using file descriptors that point to
> random files/devices.
>
> Br,
>
> Jukka
>
>
> On 1.7.2025 22.43, Tim Hardisty wrote:
> > I have wasted WAY too many days trying to understand what's going on
> > here: you start on the premise that it "must have worked" but I am
> > really not so sure with this!
> >
> > I think the idea of the timeout is simply to cover the case that the
> > CGI "app" goes AWOL and needs to be killed, if so configured in
> > Kconfig to do that.
> >
> > Please bear in mind that my POSIX/NuttX/RTOS skills are
> > limited...but...the file descriptors don't seem to behave as I would
> > expect, based on NuttX and POSIX documentation.
> >
> >  * The file descriptors correctly exist in the cgi() function, but as
> >    soon as task_create() is called the FD becomes invalid in the
> >    cgi() function that called task_create(). NuttX says that a newly
> >    created task only inherits the first 3 descriptors, but it seems
> >    that the very act of calling task_create() kills the higher
> >    FDs. Is that correct behaviour? Note: I was trying so many
> >    different things, my "scatter gun" approach might have misled me.
> >  * I tried pthread_create (and posix_spawn) instead of task_create()
> >    but the FD still seemed to get trashed. So there may be more going
> >    on here than my skills allow me to investigate.
> >  * I also saw issues that the group ownership of a pthread instead of
> >    task was seemingly wrong. But I deferred to my "lack of experience"
> >    as the reason.
> >
> > Right now, reverting the PR that added the O_CLOEXEC has made it work.
> > Maybe not fixed it "properly", but at least made it work.
> >
> > On 01/07/2025 19:50, Bernd Walter wrote:
> >> Out of curiosity I just took a look into the thttpd CGI code and noticed
> >> something I don't understand myself.
> >>
> >> There is the cgi_child(), which looks like it is to be called after a
> >> fork for the child case.
> >> It prepares the filedescriptors and calls exec.
> >> The result of the exec is stored in a variable named child, which is
> >> an odd name.
> >> Normally exec never returns and if it does it is an error.
> >> The normal thing would be for the child to kill itself.
> >> It goes into error handling in case the exec returned < 0, which it
> >> always does when exec returns.
> >> However the next thing it does is setting up a timeout for the child.
> >> It is the child and it already failed the exec, why the timeout when
> >> we are already in an error state because we returned from an exec.
> >>
> >> However:
> >> The function is called with task_create, so not a fork.
> >> If I got it right, the task has separate file descriptors, as a forked
> >> process would have, and exec closing the task's copies of the sockets
> >> should be fine, as long as the use counts on those are properly
> >> increased.
> >> But if it is intended to behave like fork/exec, why does stuff happen
> >> after the exec?
> >>
> >>
> >> On Tue, Jul 01, 2025 at 03:00:23PM -0300, Alan C. Assis wrote:
> >>> Hi Tim,
> >>>
> >>> You are right, it doesn't execute, but some subprocess (like a CGI)
> >>> could try to execute.
> >>>
> >>> This comment there sheds some light on it:
> >>>
> >>> "I wouldn't describe O_CLOEXEC as there principally for privilege
> >>> escalation / security reasons -- it's also very, very common to have
> >>> non-security bugs happen (frequently of the indefinite-blocking
> >>> variety) if a FD is left open beyond when it's intended to be closed
> >>> because a subprocess still has it."
> >>>
> >>> So, why does removing SOCK_CLOEXEC make http work? If nothing is
> >>> exec'd, the socket shouldn't be closed, right?
> >>>
> >>> And why was it working in the past? Which modification broke this?
> >>> Understanding that may be important for getting the right fix (maybe
> >>> removing it is acting as a band-aid).
> >>>
> >>> Wengzhe, could you please help us to understand this network issue?
> >>>
> >>> BR,
> >>>
> >>> Alan
> >>>
> >>> On Tue, Jul 1, 2025 at 12:28 PM Tim Hardisty <timhardist...@gmail.com>
> >>> wrote:
> >>>
> >>>> But that's the point - thttpd *does* call exec() so the open socket
> >>>> file descriptor gets closed when it is still needed by the exec'd
> >>>> application.
> >>>>
> >>>> If there's another way of doing this I'm listening!
> >>>>
> >>>> On 01/07/2025 16:13, Alan C. Assis wrote:
> >>>>> Hi Tim,
> >>>>>
> >>>>> Nice finding!
> >>>>>
> >>>>> Now we need to understand why this worked in the past and now it
> >>>>> doesn't.
> >>>>>
> >>>>> Also, what are the implications of removing SOCK_CLOEXEC? A few
> >>>>> pointers here:
> >>>>>
> >>>>> https://stackoverflow.com/questions/22304631/what-is-the-purpose-to-set-sock-cloexec-flag-with-accept4-same-as-o-cloexec
> >>>>>
> >>>>> BR,
> >>>>>
> >>>>> Alan
> >>>>>
> >>>>> On Tue, Jul 1, 2025 at 11:27 AM Tim Hardisty <timhardist...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> The error was, indeed, the socket being opened with the SOCK_CLOEXEC
> >>>>>> flag set.
> >>>>>>
> >>>>>> PR to follow.
> >>>>>>
> >>>>>> On 28/06/2025 16:16, Tim Hardisty wrote:
> >>>>>>> Actually - it might be a change last year. The socket is now opened
> >>>>>>> like this and I assume CLOEXEC will mess up the operation of the
> >>>>>>> executed CGI app (will investigate on Monday; not sure what socket
> >>>>>>> mode it needs to be):
> >>>>>>>
> >>>>>>> hc->conn_fd = accept4(listen_fd, (struct sockaddr *)&sa, &sz,
> >>>>>>> SOCK_CLOEXEC);
> >>>>>>>
> >>>>>>> On 28/06/2025 13:22, Alan C. Assis wrote:
> >>>>>>>> Hi Tim,
> >>>>>>>>
> >>>>>>>> Yes, I think send() is the preferred form to work with sockets
> >>>>>>>> because you can have fine control, e.g. passing flags as the
> >>>>>>>> fourth argument (MSG_DONTWAIT, etc).
> >>>>>>>>
> >>>>>>>> If you suspect that the bug was caused by some recent
> >>>>>>>> modification, try to find a supported board that was used to
> >>>>>>>> test thttpd in the past and test an old NuttX release with it.
> >>>>>>>> This is the approach I use to double check if something is
> >>>>>>>> broken in the mainline.
> >>>>>>>>
> >>>>>>>> BR,
> >>>>>>>>
> >>>>>>>> Alan
> >>>>>>>>
> >>>>>>>> On Fri, Jun 27, 2025 at 3:39 PM Tim Hardisty
> >>>>>>>> <timhardist...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Is it as "simple" as thttpd should do:
> >>>>>>>>>
> >>>>>>>>> nwritten = send(sock_fd, buffer, totalbytesread, 0);
> >>>>>>>>>
> >>>>>>>>> rather than the generic:
> >>>>>>>>>
> >>>>>>>>> nwritten = write(sock_fd, buffer, nbytes);
> >>>>>>>>>
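On the send() versus write() question quoted above: on a connected
socket, POSIX-style send() with a flags argument of 0 transfers data
the same way write() does, so switching calls alone would not be
expected to change whether the FD is valid. A minimal sketch, assuming
a blocking stream socket, of a loop that sends a whole buffer; the name
send_all() is hypothetical and is not a thttpd function.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send exactly len bytes on a connected socket,
 * retrying after short writes and EINTR.
 * Returns len on success, -1 on error (errno is set by send()).
 */

ssize_t send_all(int sock_fd, const void *buf, size_t len)
{
  const char *p = buf;
  size_t left = len;

  while (left > 0)
    {
      ssize_t n = send(sock_fd, p, left, 0);

      if (n < 0)
        {
          if (errno == EINTR)
            {
              continue;  /* Interrupted before any data moved: retry */
            }

          return -1;
        }

      p += n;
      left -= (size_t)n;
    }

  return (ssize_t)len;
}
```

If send() fails here with EBADF or a similar error, the descriptor
itself is the problem, which matches the symptom reported at the start
of this thread.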
> >>>>>>>>> On 27/06/2025 18:40, Tim Hardisty wrote:
> >>>>>>>>>> Trying to get thttpd's CGI handling working and have found
> >>>>>>>>>> that the dup(2) calls of stdin and stdout return a file
> >>>>>>>>>> descriptor that's already been allocated to the NET socket
> >>>>>>>>>> (via thttpd I think).
> >>>>>>>>>>
> >>>>>>>>>> That isn't right is it?
> >>>>>>>>>>
> >>>>>>>>>> I am not sure if it's a side effect of something that thttpd
> >>>>>>>>>> does (that might have been OK in the past but is now not
> >>>>>>>>>> right) or a NuttX bug, or a missing Kconfig setting that
> >>>>>>>>>> relates to this.
> >>>>>>>>>>
> >>>>>>>>>> The result is that the ultimate copying of buffered html that
> >>>>>>>>>> should be written via the socket FD gets rejected as the FD
> >>>>>>>>>> doesn't have WR access (and is now the wrong FD anyway!).
> >>>>>>>>>>
> >>>>>>>>>> Perhaps there's been a change in the way NuttX deals with all
> >>>>>>>>>> of this that didn't get sorted in thttpd?
> >>>>>>>>>>
>
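For reference on the original symptom: under POSIX, dup() returns the
lowest-numbered descriptor that is free in the calling task, so if a
socket's descriptor has already been closed (or is not valid in that
task), dup() can legitimately hand back the same number. A minimal
sketch of that behaviour on a POSIX-like host; whether this is what
happens inside NuttX in the thttpd case is an assumption, and the
function name dup_reuses_closed_fd() is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: open two descriptors, close the first, then dup()
 * the second. Returns 1 if dup() returned the number that was just
 * closed, 0 if it returned some other number, -1 on error.
 */

int dup_reuses_closed_fd(void)
{
  int a = open("/dev/null", O_RDONLY);
  int b = open("/dev/null", O_RDONLY);
  int d;
  int reused;

  if (a < 0 || b < 0)
    {
      if (a >= 0)
        {
          close(a);
        }

      if (b >= 0)
        {
          close(b);
        }

      return -1;
    }

  close(a);        /* 'a' is now the lowest free descriptor number */
  d = dup(b);      /* dup() must pick the lowest free number */

  if (d < 0)
    {
      close(b);
      return -1;
    }

  reused = (d == a) ? 1 : 0;

  close(d);
  close(b);
  return reused;
}
```

Seeing dup() return a number that the server still believes belongs to
its socket would therefore suggest the socket was closed, or never
present, in the task doing the dup(), which is consistent with the
close-on-exec explanation discussed in this thread.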
