capistrant edited a comment on pull request #10464: URL: https://github.com/apache/druid/pull/10464#issuecomment-721306987
> Hi @a2l007 and @capistrant, thanks for the nice PR and review! We usually treat silence as acceptance, so thanks for merging the PR too. However, for this particular PR, I think it needs to get "Design Review" which needs 3 +1s from committers (including the author himself) because it changes some user-facing behavior (https://github.com/apache/druid/blob/master/dev/committer-instructions.md). We have this policy because it's hard to undo once the changed user-facing behavior appears in some release. Apologies for the pre-mature merge. I will keep this in mind in future to make sure I loop in more committers during review process for something client facing like this. > When we do design review, we usually have one committer reviewing the whole PR line by line, so that others can focus on only the design part. From this aspect, I quickly reviewed only the design part. I have 3 comments so far. > > * The purpose of the PR sounds good to me because the timeout error reporting was inconsistent before. However this change is incompatible which means it can break user applications working based on the previous behavior. Someone can argue that no one could do it because the previous behavior was inconsistent. IMO, this change looks worth since the previous timeout error handling was easy to break, but should be called out in the release notes. Good point. How does something like this work for release notes? Does the patch author write up release notes for their change that the release manager will use later on? > * Looking at the code, [the status code is defined as 504](https://github.com/apache/druid/pull/10464/files#diff-4bc40182ca7938163c8f2b71355cb5849920526319bc56f83dd55e542f3a4323R42) which means gateway timeout. I'm wondering 408 request timeout is more appropriate in this case. Could you add more to your thoughts that 408 may be a better error code? I thought that 504 sounded ideal since the Broker API is kind of a proxy to the druid query engine. The engine did not produce a response in time for the Broker to return to client so it responds with 504. I also worry about the implications of using a 408 code in relation to the potential for middleware auto-retrying the request when the end client may not want that (something like a browser doing auto retries N times before actually surfacing that 408 to client if it continues). And I've always had the perception of 5XX being a server error and 4XX being more of a client error, like the client made a mistake. But in this case the server isn't coming up with a response in time. > * All query errors should be documented. Check out https://druid.apache.org/docs/latest/querying/querying.html#query-errors. shoot I had a feeling there was a doc like this out there, @a2l007 do you want to start a follow on PR for this? I suppose it may be a part of a larger follow on with whatever else we uncover in discussions with Jihoon. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
