Hello Joe,

Try to narrow the problem down.

Call your SFTP service with curl or some other HTTP(S) client to see whether you get the same timeout. If you do, then the server side is closing the connection. If not, then you have to investigate the LWP::UserAgent part.
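
As a rough illustration of that check, here is a minimal Perl sketch that uses HTTP::Tiny (a core module) instead of LWP::UserAgent and measures how long the request takes. The URL given on the command line, the helper names who_gave_up and probe, and the use of a plain GET are assumptions for illustration only; the real call in this thread sends a data load.

```perl
use strict;
use warnings;
use HTTP::Tiny;                  # core module since Perl 5.14
use Time::HiRes qw(time);

# Guess which side ended the request, from the status and the elapsed time.
# 599 is HTTP::Tiny's pseudo-status for errors raised inside the client
# itself (connection failure, its own timeout, and so on).
sub who_gave_up {
    my ($status, $elapsed, $client_timeout) = @_;
    return 'nobody' if $status >= 200 && $status < 400;
    return 'client' if $status == 599 && $elapsed >= $client_timeout;
    return 'server or something in between';
}

# Send one GET request and report the status and the elapsed time.
sub probe {
    my ($url, $client_timeout) = @_;
    my $http    = HTTP::Tiny->new(timeout => $client_timeout);
    my $start   = time;
    my $res     = $http->get($url);
    my $elapsed = time - $start;
    printf "%s %s after %.1f s: %s gave up\n",
        $res->{status}, $res->{reason}, $elapsed,
        who_gave_up($res->{status}, $elapsed, $client_timeout);
    return $res;
}

# Usage (placeholder URL): perl probe.pl http://dispatch.example/upload 600
probe(@ARGV) if @ARGV == 2;
```

If the failure shows up well before the client timeout, for example after about 300 seconds with a 600 second timeout, the client is probably not the side that gave up.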

Another hint in combination with SSL: https://stackoverflow.com/questions/9400068/make-timeout-work-for-lwpuseragent-https

Best regards
Andreas


On 13.05.2025 at 17:22, Joseph He wrote:
Andreas, thank you.

On the client side, I set the LWP::UserAgent request timeout to 600, and I can verify that it indeed works in my QA and DEV environments. If I change it to 120, it times out at 120. On my production server, however, the client side receives a timeout from the server after 5 minutes, so I still think the server Timeout plays a role here. I just don't know what config I can change to test it out.
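
One way to check where the 5-minute failure comes from is to look at the response object itself. The sketch below relies on the header "Client-Warning: Internal response", which LWP::UserAgent adds to responses it builds on its own; the helper names and the placeholder URL are assumptions for illustration, not code from the application.

```perl
use strict;
use warnings;

# True when LWP::UserAgent built the response itself instead of receiving
# it from the network. LWP marks such responses with the header
# "Client-Warning: Internal response".
sub is_internal_response {
    my ($res) = @_;
    my $warning = $res->header('Client-Warning') // '';
    return $warning eq 'Internal response' ? 1 : 0;
}

# Describe a response; the status line (for example "500 read timeout")
# gives the concrete reason when LWP generated the response.
sub describe_failure {
    my ($res) = @_;
    return 'success' if $res->is_success;
    return is_internal_response($res)
        ? 'generated inside LWP: ' . $res->status_line
        : 'sent by the server, a proxy or the load balancer: ' . $res->status_line;
}

# Usage (needs LWP::UserAgent installed; the URL is a placeholder):
#   perl check.pl http://dispatch.example/upload
if (@ARGV) {
    require LWP::UserAgent;
    my $ua  = LWP::UserAgent->new(timeout => 600);
    my $res = $ua->get($ARGV[0]);
    print describe_failure($res), "\n";
}
```

An internal response does not always mean a client timeout: LWP also builds one when the connection is closed before any data arrives, so the status line text still matters.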

Joe

On Tue, May 13, 2025 at 10:07 AM Andreas Mock <andreas.m...@web.de> wrote:

    Hi Joe,

    When you send a request via LWP::UserAgent to the server which
    does the long-lasting SFTP calls, then I'm pretty sure that you
    get a timeout in the LWP::UserAgent code.

    I'm pretty sure the client (LWP::UserAgent) is not waiting long
    enough for the answer: https://metacpan.org/pod/LWP::UserAgent#timeout

    Once you have a long timeout here, you have to be sure that the very
    first client which sent the very first request also waits long
    enough to let the application server make several tries,
    therefore n * timeout.
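
    The n * timeout rule can be sketched as a small calculation. The 4
    tries and the 30 second sleep are the figures given in this thread;
    the 120 seconds per SFTP attempt and the 10 second margin are
    assumed values for illustration only.

    ```perl
    use strict;
    use warnings;

    # Worst-case duration of one server-side job: every attempt runs into
    # its own timeout, with a fixed sleep between attempts.
    sub worst_case_seconds {
        my ($tries, $per_try_timeout, $sleep) = @_;
        return $tries * $per_try_timeout + ($tries - 1) * $sleep;
    }

    # True if the caller waits at least as long as the job can take,
    # plus a safety margin.
    sub client_waits_long_enough {
        my ($client_timeout, $worst_case, $margin) = @_;
        return $client_timeout >= $worst_case + $margin ? 1 : 0;
    }

    # 4 tries and 30 s sleep are from the thread; 120 s per try is assumed.
    my $worst = worst_case_seconds(4, 120, 30);
    printf "worst case: %d s\n", $worst;
    for my $limit (600, 300) {
        printf "%d s limit long enough? %s\n", $limit,
            client_waits_long_enough($limit, $worst, 10) ? 'yes' : 'no';
    }
    ```

    Every timeout between the first caller and the SFTP server has to
    pass this check, not only the one set in LWP::UserAgent.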

    Best wishes
    Andreas


    On 13.05.2025 at 16:46, Joseph He wrote:
    Many thanks to you all.

    I am still trying to figure out the issue. Let me re-explain the
    problem I experienced with some details.

    The environment is Ubuntu 22.04, Apache2, ModPerl.
    I send an HTTP::Request with LWP::UserAgent; the server receives
    the request and starts to process it.
    But it takes much longer due to a stalled SFTP call to the remote
    server, so the Apache server times out and sends back a failure;
    meanwhile, *the server is actually still trying to process this
    request*.
    On the calling side, after receiving the failure status, it
    initiates another HTTP request and the load balancer redirects
    this call to another server for processing.
    As a result, the same HTTP request is processed twice.

    On my production server the timeout happens at the 300-second mark.
    On my QA and Dev server, the timeout happens at 600 seconds. I
    have not changed anything on my production server yet.
    But on my QA and DEV servers, I have tried to change Timeout in
    apache2.conf, have tried to add Timeout to the virtualhost
    config, also have tried to add SetPerlEnv MOD_PERL_TIMEOUT to the
    virtualhost config, none of them change the timeout behavior of
    my QA and DEV servers.

    So what exactly controls the Timeout? I am totally lost.

    Cheers,
    Joe


    On Wed, Apr 23, 2025 at 5:17 PM Mithun Bhattacharya
    <mit...@gmail.com> wrote:

        Okay, agreed, that is a valid timeout. Basically it says
        that a client has established a TCP/IP connection but has not
        yet sent its request, whether a GET, PUT or POST.

        On Wed, Apr 23, 2025, 3:38 PM Joseph He
        <joseph.he.2...@gmail.com> wrote:

            On Apache2 doc, I found this. How does this timeout work?
            It looks like it can only wait for 300 seconds before
            failing a request.

            https://httpd.apache.org/docs/2.0/mod/core.html#timeout
            Description: Amount of time the server will wait for
                         certain events before failing a request
            Syntax:      TimeOut seconds
            Default:     TimeOut 300
            Context:     server config, virtual host
            Status:      Core
            Module:      core

            The |TimeOut| directive currently defines the amount of
            time Apache will wait for three things:

             1. The total amount of time it takes to receive a GET
                request.
             2. The amount of time between receipt of TCP packets on
                a POST or PUT request.
             3. The amount of time between ACKs on transmissions of
                TCP packets in responses.

            We plan on making these separately configurable at some
            point down the road. The timer used to default to 1200
            before 1.2, but has been lowered to 300 which is still
            far more than necessary in most situations. It is not set
            any lower by default because there may still be odd
            places in the code where the timer is not reset when a
            packet is sent.


            On Wed, Apr 23, 2025 at 3:07 PM Mithun Bhattacharya
            <mit...@gmail.com> wrote:

                You configure the timeout on the client side. Apache is
                on the server side. The server doesn't have a concept of
                time; it could take days to run and not care.

                The mod_perl code is where you send the HTTP
                return status to make sure the client doesn't time out
                waiting for the server to respond.


                On Wed, Apr 23, 2025, 2:19 PM Joseph He
                <joseph.he.2...@gmail.com> wrote:

                    Thanks, all.
                    Is that Apache timeout controlled by its
                    configuration "Timeout"?
                    I don't think it has anything to do with modPerl.
                    Am I missing something?
                    Thanks.

                    On Wed, Apr 23, 2025 at 1:41 PM Mithun
                    Bhattacharya <mit...@gmail.com> wrote:

                        Timeout happens because of how we handle the
                        request. Timeout is basically no response
                        came back. Why that happens is because we
                        think we want to have a correct response.
                        Unfortunately for long running requests the
                        correct response shouldn't be via http
                        response code or we face situations like
                        this. Instead reply with a 200 OK immediately
                        and then provide correct status in the
                        message body. Once a response code/header has
                        been sent timeout won't trigger and you could
                        potentially hold the connection for hours
                        without a problem.

                        On Wed, Apr 23, 2025, 9:32 AM Andreas Mock
                        <andreas.m...@web.de> wrote:

                            Hi Joseph,

                            Your description is very vague, so I can
                            only answer based on some assumptions:

                            It sounds like a timeout is fired somewhere.

                            Best advice in these situations: Log as
                            many steps as you can. Keep your
                            eyes open on TCP/IP and higher level
                            timeouts.

                            Declare only ONE instance responsible for
                            a retry: either the app server
                            calling the dispatcher with several tries
                            or the dispatcher retrying by
                            itself. Not both.

                            Best regards
                            Andreas


                            On 23.04.2025 at 16:21, Joseph He wrote:
                            > All, good day.
                            >
                            > Here is the issue I have.
                            > My entire application is running on
                            ModPerl/Apache environment.
                            > I send an HTTP::Request with a data load
                            from my App server to a dispatch
                            > server through LWP::UserAgent, and I set the
                            timeout to 600 seconds.
                            >
                            > The dispatch server is supposed to
                            manipulate the data and send the
                            > data to an external SFTP server.
                            Because the SFTP can fail, it will
                            > keep trying up to 4 times with 30
                            seconds sleep in case that SFTP
                            > connection fails.
                            >
                            > Recently, I found that I uploaded the
                            file twice sometimes. I figured
                            > out the root cause is that my Dispatch
                            server returns 'failure' at 6
                            > minutes while it keeps trying to do the
                            SFTP. The App server
                            > received HTTP::Response with error
                            status so it issued another call to
                            > send data. It turns out I uploaded the
                            identical file twice.
                            >
                            > Anybody has this sort of experience?
                            Why does the dispatch server
                            > return 'error' while it still processes
                            the data?
                            >
                            > Thanks a lot,
                            > Joseph
                            >
