[ 
https://issues.apache.org/jira/browse/KNOX-2095?focusedWorklogId=343897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-343897
 ]

ASF GitHub Bot logged work on KNOX-2095:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Nov/19 00:29
            Start Date: 15/Nov/19 00:29
    Worklog Time Spent: 10m 
      Work Description: shanyu commented on pull request #177: [WIP] KNOX-2095 
- Adding in DefaultDispatch code and tests to handle 504 errors
URL: https://github.com/apache/knox/pull/177#discussion_r346570126
 
 

 ##########
 File path: 
gateway-spi/src/main/java/org/apache/knox/gateway/dispatch/DefaultDispatch.java
 ##########
 @@ -138,6 +144,10 @@ protected HttpResponse executeOutboundRequest( 
HttpUriRequest outboundRequest )
         }
       }
       auditor.audit( Action.DISPATCH, outboundRequest.getURI().toString(), 
ResourceType.URI, ActionOutcome.SUCCESS, RES.responseStatus( statusCode ) );
+    } catch( SocketTimeoutException e ) {
+        //Set a 504 instead of throwing an IO exception in the event of a 
socket timeout,
+        //as an IO exception forces a 500 to be displayed.
 
 Review comment:
   @risdenk , for active/passive, it doesn't make sense to failover from active 
to passive in the case of socket timeout. For active/active, it may have value 
to failover. However, I have 2 arguments against this:
   1) Usually the client side that talks to knox has its own timeout (think 
about a Jupyter notebook), so letting service failover in the case of socket 
timeout multiple times would eventually cause client side timeout.
   2) Most likely the timeout is because the request itself took too much time 
exceeding the socket timeout config (considering Livy session creation), so 
failover to another server would also cause a socket timeout as well.
   
   And for active-active type of service, if really needed we can always retry 
at the client side. 
   
   I think letting client side know that the request caused timeout is very 
important, instead of Letting them face a generic 500 error without knowing 
what to do next.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 343897)
    Remaining Estimate: 165h 40m  (was: 165h 50m)
            Time Spent: 2h 20m  (was: 2h 10m)

> Many errors (E.G. 504s) being masked as 500 errors
> --------------------------------------------------
>
>                 Key: KNOX-2095
>                 URL: https://issues.apache.org/jira/browse/KNOX-2095
>             Project: Apache Knox
>          Issue Type: Improvement
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: James Chen
>            Assignee: James Chen
>            Priority: Minor
>              Labels: easyfix
>             Fix For: 1.4.0
>
>         Attachments: KNOX-2095.patch, jamchen504patch.patch
>
>   Original Estimate: 168h
>          Time Spent: 2h 20m
>  Remaining Estimate: 165h 40m
>
> When errors occur while accessing the Knox gateway, errors are forcibly 
> overridden and represented as 500 errors, rather than whatever errors they 
> should be.
> For example, when the timeout value under gateway.httpclient.socketTimeout is 
> set to a very low timeout value (E.G. 1 ms) under gateway-site.xml, a socket 
> timeout exception is produced by the getHttpClient().execute( 
> outboundRequest) call. However, this is caught by the surrounding try-catch 
> block and thrown again as an IOException. This results in a generic 500 
> error, rather than a 504 error one would normally expect from this sort of 
> interaction.
>  
> For these sorts of scenarios, I believe it would be prudent to create a dummy 
> HttpResponse using a HttpResponseFactory object for the inboundResponse with 
> the corresponding error code (E.G. HttpStatus.SC_GATEWAY_TIMEOUT in the event 
> of a SocketTimeoutException) and return that instead to trigger the 
> appropriate 504 error. I suspect there are other sorts of potential error 
> code triggers that get this same IOException treatment that would be better 
> off receiving their own error codes.
>  
> Judging from the source code, this issue likely affects version 1.3.0, though 
> this has not been tested.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to