[grpc-io] Re: More granularity for errors generated by gRPC itself

Arpit Baldeva Fri, 04 May 2018 14:24:36 -0700

There are many scenarios where an application would want to see custom error 
codes, even if just for logging/better visibility. The pattern I use:


1. Stick a google::rpc::Status in every response message.
2. Create a message with an embedded enum next to the service definition in 
the proto file. Each enum value is a specific error code that your service 
returns.
3. If you have an application error, use the above enum to populate 
google::rpc::Status with the integer code and a readable message string. You 
can even attach a custom message as a google.protobuf.Any in the Status 
details field for more verbose logging.
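
A minimal plain-Java sketch of steps 1-3, written without a protobuf dependency so it runs on its own. `FooErrorType`, `AppStatus` and `GetUserResponse` are hypothetical stand-ins for the generated enum, google.rpc.Status and a response message. The enum numbers and the detail string are made up for illustration. The point it shows is that the gRPC call itself completes normally while the application error travels inside the response.

```java
import java.util.List;

// Hypothetical mirror of the per-service enum declared next to the service in
// the .proto file (e.g. message FooServiceError { enum Type { ... } }).
// The numbers are illustrative.
enum FooErrorType {
    ERR_OK(0),
    ERR_USER_NOT_FOUND(1),
    ERR_QUOTA_EXCEEDED(2);

    private final int number;

    FooErrorType(int number) {
        this.number = number;
    }

    int getNumber() {
        return number;
    }
}

// Stand-in for google.rpc.Status: integer code, readable message, and details
// (a list of google.protobuf.Any in the real message; plain strings here).
final class AppStatus {
    final int code;
    final String message;
    final List<String> details;

    AppStatus(FooErrorType type, String message, List<String> details) {
        this.code = type.getNumber();
        this.message = message;
        this.details = details;
    }
}

// Hypothetical response message carrying an AppStatus next to its payload.
final class GetUserResponse {
    final AppStatus status;
    final String userName; // empty unless status.code == ERR_OK

    GetUserResponse(AppStatus status, String userName) {
        this.status = status;
        this.userName = userName;
    }
}

final class FooServiceHandler {
    // The gRPC call itself still finishes with status OK; the application
    // error is reported through the status embedded in the response.
    static GetUserResponse getUser(String id) {
        if (!"42".equals(id)) {
            AppStatus notFound = new AppStatus(FooErrorType.ERR_USER_NOT_FOUND,
                    "no user with id " + id, List.of("lookup_table=users"));
            return new GetUserResponse(notFound, "");
        }
        return new GetUserResponse(new AppStatus(FooErrorType.ERR_OK, "", List.of()), "alice");
    }
}
```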

On the client side, you can then compare the integer code against the enum 
to check the error type, like so:

    assertEquals("error code check", FooServiceError.Type.ERR_OK_VALUE,
        response.getStatus().getCode());

The protobuf compiler also generates helpers to make sure that an int-to-enum 
conversion is valid (the *_IsValid() functions in C++; in Java, forNumber() 
returns null for unknown values), so you'll need to use those.
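
A sketch of the client-side check. It assumes a hypothetical `FooErrorType` enum that imitates what protoc generates for Java: `getNumber()` plus a static `forNumber(int)` that returns null for numbers the client does not know, for example codes added by a newer server. The hand-written `forNumber` below only mimics the generated one, and `StatusDecoder` is a hypothetical helper.

```java
import java.util.Optional;

// Hypothetical mirror of a protoc-generated Java enum.
enum FooErrorType {
    ERR_OK(0),
    ERR_USER_NOT_FOUND(1),
    ERR_QUOTA_EXCEEDED(2);

    private final int number;

    FooErrorType(int number) {
        this.number = number;
    }

    int getNumber() {
        return number;
    }

    // Like the generated forNumber(): null for unknown values, never throws.
    static FooErrorType forNumber(int value) {
        for (FooErrorType t : values()) {
            if (t.number == value) {
                return t;
            }
        }
        return null;
    }
}

final class StatusDecoder {
    // Converts the raw integer from the response's status into the enum,
    // returning Optional.empty() instead of failing on unknown codes.
    static Optional<FooErrorType> decode(int rawCode) {
        return Optional.ofNullable(FooErrorType.forNumber(rawCode));
    }

    // Log-friendly label that still works for codes this client doesn't know.
    static String label(int rawCode) {
        return decode(rawCode).map(Enum::name).orElse("UNKNOWN_ERROR(" + rawCode + ")");
    }
}
```

Returning an Optional rather than throwing keeps unknown codes loggable, which serves the logging/visibility goal mentioned at the top.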

A little more work than ideal, but it gets the job done. 

On Wednesday, April 18, 2018 at 1:46:12 PM UTC-7, Carl Mastrangelo wrote:
>
> Responses inline
>
> On Thursday, April 12, 2018 at 2:52:10 PM UTC-7, Ruslan Nigmatullin wrote:
>>
>> Hi,
>>
>> Is there a chance to add details to errors generated by the gRPC layer itself 
>> to distinguish different scenarios instead of forcing gRPC users to analyze 
>> error message client-side? Parsing error messages is error-prone as they 
>> are not standardized across different languages and can change over time 
>> without any warning (and strictly speaking are not a part of api).
>>
>> A few examples we'd like to differentiate:
>> 1.1. ResourceExhausted returned by a Python server when it exceeds its 
>> concurrency limit - it's safe to retry the request on a different server 
>> (the server is known not to have started processing the request), and this 
>> is quite common for Python setups
>>
>
> Retriable RPCs 
> <https://github.com/grpc/proposal/blob/master/A6-client-retries.md>, which 
> are in the process of being implemented, will allow you to specify whether an 
> error code can be retried.  The call will be retried on the next available 
> transport.   If you are using the round robin load balancer this will pick 
> the next server.   Python is doing something nonstandard here, as other 
> languages don't have a global maximum on the number of RPCs.  
>  
>
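
To make the retry-policy idea concrete, here is a plain-Java model of the retryPolicy fields described in gRFC A6 (maxAttempts, initialBackoff, maxBackoff, backoffMultiplier, retryableStatusCodes). It is an illustrative sketch, not gRPC's implementation. In a real client the policy is supplied through the service config. `Code` is a hypothetical subset of the gRPC status codes. Because the policy keys only on the status code, it cannot tell apart the ResourceExhausted cases discussed in this thread.

```java
import java.util.Random;
import java.util.Set;

// Hypothetical subset of gRPC status codes, named as in the status code doc.
enum Code { OK, CANCELLED, UNKNOWN, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, UNAVAILABLE, INTERNAL }

final class RetryPolicy {
    final int maxAttempts;            // includes the original attempt
    final double initialBackoffSeconds;
    final double maxBackoffSeconds;
    final double backoffMultiplier;
    final Set<Code> retryableStatusCodes;

    RetryPolicy(int maxAttempts, double initialBackoffSeconds, double maxBackoffSeconds,
                double backoffMultiplier, Set<Code> retryableStatusCodes) {
        this.maxAttempts = maxAttempts;
        this.initialBackoffSeconds = initialBackoffSeconds;
        this.maxBackoffSeconds = maxBackoffSeconds;
        this.backoffMultiplier = backoffMultiplier;
        this.retryableStatusCodes = retryableStatusCodes;
    }

    // attemptsSoFar counts the original call as attempt 1.
    boolean shouldRetry(Code code, int attemptsSoFar) {
        return retryableStatusCodes.contains(code) && attemptsSoFar < maxAttempts;
    }

    // Upper bound of the delay before retry number n (n >= 1):
    // min(initialBackoff * backoffMultiplier^(n-1), maxBackoff).
    double backoffCeilingSeconds(int n) {
        return Math.min(initialBackoffSeconds * Math.pow(backoffMultiplier, n - 1),
                maxBackoffSeconds);
    }

    // The actual delay is drawn uniformly at random below that ceiling.
    double nextDelaySeconds(int n, Random rng) {
        return rng.nextDouble() * backoffCeilingSeconds(n);
    }
}
```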
>> 1.2. ResourceExhausted returned by any server when metadata/message is too 
>> big - it's useless to retry, as the message size doesn't change
>>
>
> These are not generally retriable anyway.  What would you do in response 
> to getting this code?
>  
>
>> 1.3. ResourceExhausted returned by any server when the response is too 
>> big - it's dangerous to retry a non-idempotent request, as the server has 
>> already processed it once
>>
>
> Again, what would you do about this?  The Status codes used by gRPC are 
> intended to be handled *automatically*.  I don't think there is any 
> automatic action you can take here, so why distinguish?
>
>  
>
>> 2.1. Unavailable returned by application logic to indicate some 
>> dependency being down - it may or may not be safe to retry, depending on the 
>> specific scenario
>>
>
> This is something that applications are expected to indicate.  In gRPC, 
> you can add additional headers to RPCs.  One header you could add is an 
> app-specific sub error code, such as dependency_is_down.   You can include 
> additional detail about which dependency and things like that.  gRPC gives 
> you the tools to do this since it varies per instance.
>  
>
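
A sketch of the sub-error-code idea, with a plain Map standing in for gRPC metadata. It uses trailing metadata because trailers are sent together with the final status. In grpc-java the real equivalents would be `Metadata`, `Metadata.Key.of(..., Metadata.ASCII_STRING_MARSHALLER)` and `Status.asRuntimeException(trailers)`. The key names `app-error-code` and `app-failed-dependency`, and the `SubErrorTrailers` helper, are hypothetical. For simplicity the status code is passed around as its name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Real metadata keys are lowercase and must not use the reserved "grpc-" prefix.
final class SubErrorTrailers {
    static final String SUB_CODE_KEY = "app-error-code";          // hypothetical
    static final String DEPENDENCY_KEY = "app-failed-dependency"; // hypothetical
    static final String DEPENDENCY_DOWN = "dependency_is_down";

    // Server side: trailers to send alongside Status UNAVAILABLE.
    static Map<String, String> dependencyDown(String dependency) {
        Map<String, String> trailers = new HashMap<>();
        trailers.put(SUB_CODE_KEY, DEPENDENCY_DOWN);
        trailers.put(DEPENDENCY_KEY, dependency);
        return trailers;
    }

    // Client side: read the sub code, if the server sent one.
    static Optional<String> subCode(Map<String, String> trailers) {
        return Optional.ofNullable(trailers.get(SUB_CODE_KEY));
    }

    // Only the application knows what this sub code means for retries/metrics.
    static boolean isDependencyOutage(String grpcCode, Map<String, String> trailers) {
        return "UNAVAILABLE".equals(grpcCode)
                && subCode(trailers).map(DEPENDENCY_DOWN::equals).orElse(false);
    }
}
```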
>> 2.2. Unavailable returned by the client to indicate that all connections are 
>> down - it's safe to retry in the hope that a new connection gets established
>>
>
> This is not safe to retry.  The client would just spin trying to send an 
> RPC and failing repeatedly.  There is a special option in gRPC called "wait 
> for ready" which allows you to say an RPC should wait until there is an 
> available transport.   The opposite means fail immediately if no transport 
> could be used.  (To be clear, this fails when all transports are failing, 
> not when there are no transports.  This allows gRPC to establish a new 
> connection for the very first RPC without failing.)  
>  
>
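
A simplified model of the wait-for-ready behavior described above, keyed on the channel connectivity states (IDLE, CONNECTING, READY, TRANSIENT_FAILURE, SHUTDOWN). It is an illustration, not gRPC source code. In grpc-java the option itself is set with `stub.withWaitForReady()`. `Dispatch` and `WaitForReadyModel` are hypothetical names.

```java
// Channel connectivity states as named in gRPC's connectivity semantics.
enum ConnectivityState { IDLE, CONNECTING, READY, TRANSIENT_FAILURE, SHUTDOWN }

enum Dispatch { SEND_NOW, QUEUE_UNTIL_READY, FAIL_UNAVAILABLE }

final class WaitForReadyModel {
    static Dispatch dispatch(ConnectivityState state, boolean waitForReady) {
        switch (state) {
            case READY:
                return Dispatch.SEND_NOW;
            case IDLE:
            case CONNECTING:
                // No transport yet: the RPC waits for the connection attempt
                // instead of failing, in both modes.
                return Dispatch.QUEUE_UNTIL_READY;
            case TRANSIENT_FAILURE:
                // All transports failing: fail fast by default, or keep waiting
                // (still bounded by the deadline) when wait-for-ready is set.
                return waitForReady ? Dispatch.QUEUE_UNTIL_READY : Dispatch.FAIL_UNAVAILABLE;
            case SHUTDOWN:
            default:
                return Dispatch.FAIL_UNAVAILABLE;
        }
    }
}
```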
>> 2.3. Unavailable returned by the client to indicate that the current active 
>> stream was terminated
>>
>
> If this isn't an error condition, why not just use Status OK?  I assume 
> you mean graceful termination.
>
>  
>
>>
>> We're interested in having grpc-transport-specific error codes for 
>> individual cases, for better attribution of failure scenarios (in 
>> metrics/tracing) to improve the system's visibility, and in some cases for 
>> reacting to them differently.
>> While some of these cases can be mitigated by ensuring that we always 
>> properly attribute our own application-specific errors, gRPC's own errors 
>> are still indistinguishable in some scenarios [1].
>>
>> 1. https://github.com/grpc/grpc/blob/master/doc/statuscodes.md
>>
>> Thanks
>>
>
