A structure occasionally had an uninitialized boolean value that was set 
directly into the reply message.  UndefinedBehaviorSanitizer (libubsan) 
found it for us.
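
For reference, the bug was roughly the pattern below (the struct, field, and 
setter names are made up for illustration; only the shape matches our code):

    struct LinkStats {
      uint32_t rx_packets = 0;
      bool link_up;                      // not initialized on every code path
    };

    LinkStats stats = CollectStats();    // may leave link_up uninitialized
    reply->set_link_up(stats.link_up);   // garbage bool copied into the reply

When the uninitialized byte happens to hold something other than 0 or 1, a 
build with -fsanitize=undefined reports a "load of value ..., which is not a 
valid value for type 'bool'" at the point where the field is read.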

On Wednesday, March 24, 2021 at 2:23:04 PM UTC-4 [email protected] wrote:

> The deserialization happens at the surface layer instead of the transport 
> layer, unless we suspect that HTTP/2 frames themselves were malformed. If 
> we suspect the serialization/deserialization code, we can check if simply 
> serializing the proto to bytes and back is causing issues. Protobuf has 
> utility functions to do this. Alternatively, gRPC has utility functions 
> here 
> https://github.com/grpc/grpc/blob/master/include/grpcpp/impl/codegen/proto_utils.h
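>
> For example, a quick round-trip check could look roughly like this (sketch 
> only; MyReply stands in for whatever message type you are sending):
>
>     #include <string>
>     #include "myreply.pb.h"   // placeholder for your generated proto header
>
>     bool RoundTripsCleanly(const MyReply& reply) {
>       std::string bytes;
>       if (!reply.SerializeToString(&bytes)) return false;  // serialization itself failed
>       MyReply parsed;
>       return parsed.ParseFromString(bytes);  // false => bytes are not a valid wire encoding
>     }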
>
> I am worried about memory corruption though, so that is certainly something 
> to check.
>
>
> On Wednesday, March 24, 2021 at 11:02:30 AM UTC-7 Bryan Schwerer wrote:
>
>> Thanks for replying.
>>
>> I was able to get a tcpdump capture and run it through the Wireshark 
>> dissector.  It indicated that there were malformed protobuf fields in the 
>> message.  I'm guessing the client threw the messages away.  I didn't see a 
>> trace message indicating that.  Is there some sort of stat I can check?  
>> Would it be possible that older versions didn't discard malformed messages?  
>> I haven't loaded up an old version of our code, but I suspect the bug has 
>> always been there.  The end of the message has counters and such, so if 
>> they were a bit off, no one would notice.
>>
>> I think we are corrupting the messages on the server side; I turned on 
>> -fstack-protector-all and the problem went away.  If there's a way to check 
>> the message before sending it to the Writer, that may give us more 
>> information.  We don't use arenas.  The message itself is uint32s, bools, 
>> and one string.  I assume protobuf makes a copy of the string rather than 
>> storing a pointer to the buffer.
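>>
>> (Roughly what I mean by checking before the write; a sketch with 
>> placeholder names, where `reply` is the response message and `writer` is 
>> the ServerWriter:)
>>
>>     // needs <iostream>; dump the fully-populated message just before the
>>     // write, to see whether it is already corrupted at the application layer
>>     std::cerr << reply.ShortDebugString() << std::endl;
>>     writer->Write(reply);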
>>
>> On Wednesday, March 24, 2021 at 1:35:29 PM UTC-4 [email protected] wrote:
>>
>>> This is pretty strange. It is possible that we are being blocked on flow 
>>> control, so I would check that the application layer is actually reading. 
>>> If I am not mistaken, `perform_stream_op[s=0x7f0e16937290]:  RECV_MESSAGE` 
>>> is a log that is seen at the start of an operation, meaning that the 
>>> HTTP/2 layer hasn't yet been instructed to read a message (or there is a 
>>> previous read on the stream that hasn't finished yet). Given that you are 
>>> just updating the gRPC version from 1.20 to 1.36.1, I do not have an 
>>> answer as to why you would see this without any application changes.
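>>>
>>> (By "the application layer is reading" I mean that the client keeps 
>>> calling Read() on the stream; each Read() should be what drives a 
>>> RECV_MESSAGE op down to the transport. A minimal sketch of the synchronous 
>>> client loop, with placeholder service/message names:)
>>>
>>>     grpc::ClientContext ctx;
>>>     std::unique_ptr<grpc::ClientReader<MyReply>> reader =
>>>         stub->StreamStats(&ctx, request);   // placeholder RPC name
>>>     MyReply reply;
>>>     while (reader->Read(&reply)) {  // must keep running for flow control to open up
>>>       // ... handle reply ...
>>>     }
>>>     grpc::Status status = reader->Finish();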
>>>
>>> A few questions - 
>>> Do the two streams use the same underlying channel/transport?
>>> Are the clients and the server in the same process?
>>> Is there anything special about the environment this is being run in?
>>>
>>> (One way to make sure that the read op is being propagated to the 
>>> transport layer is to check the logs with the "channel" tracer.)
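>>>
>>> (For example, something like the following, with your binary name in place 
>>> of the placeholder:)
>>>
>>>     GRPC_TRACE=channel GRPC_VERBOSITY=DEBUG ./your_client_binary
>>>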
>>> On Friday, March 19, 2021 at 12:59:30 PM UTC-7 Bryan Schwerer wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm in the long overdue process of updating gRPC from 1.20 to 1.36.1.  I 
>>>> am running into an issue where the streaming replies from the server are 
>>>> not reaching the client in about 50% of the instances.  This is binary: 
>>>> either the streaming call works perfectly or it doesn't work at all.  
>>>> After debugging a bit, I turned on the http tracing, and from what I can 
>>>> tell the HTTP messages are received in the client thread.  In the working 
>>>> case, perform_stream_op[s=0x7f0e16937290]:  RECV_MESSAGE is logged, but 
>>>> in the broken case it isn't.  No error messages occur.
>>>>
>>>> I've tried various tracers, but haven't hit anything.  The code is 
>>>> pretty much the same pattern as the example, and there's no indication 
>>>> that any disconnect has occurred that would cause the call to terminate.  
>>>> Using gdb to look at the thread, it is still in epoll_wait.
>>>>
>>>> The process in which this runs makes 2 different synchronous 
>>>> server-streaming calls to the same server in separate threads.  It is 
>>>> also a gRPC server itself.  Everything runs over the internal 'lo' 
>>>> interface.  Any ideas on where to look to debug this?
>>>>
>>>> Thanks,
>>>>
>>>> Bryan
>>>>
>>>
