Thanks for replying.

I was able to get a tcpdump capture and run it through the Wireshark 
dissector.  It indicated that there were malformed protobuf fields in the 
message.  I'm guessing the client threw the messages away, but I didn't see 
a trace message indicating that.  Is there some sort of stat I can check?  
Would it be possible that older versions didn't discard malformed messages?  
I haven't loaded up an old version of our code, but I suspect the problem 
has always been there.  The end of the message has counters and such, so if 
they were a bit off, no one would notice.

I think we are corrupting the messages on the server side; turning on 
-fstack-protector-all made the problem go away.  If there's a way to check 
the message before sending it to the Writer, that may give us more 
information.  We don't use arenas.  The message itself is uint32s, bools, 
and one string.  I assume protobuf makes a copy of the string rather than 
keeping a pointer to the buffer.
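
In case it helps, here is roughly what I'm planning to use to sanity-check a 
reply right before handing it to the Writer.  This is just a minimal sketch 
(the message type is a template parameter, since our generated reply type 
isn't relevant here):

    // Sketch: round-trip the message through serialization before Write().
    // If the in-memory message is already corrupted, SerializeToString(),
    // ParseFromString(), or the field-by-field compare should catch it.
    #include <string>
    #include <google/protobuf/util/message_differencer.h>

    template <typename Msg>
    bool LooksSaneBeforeWrite(const Msg& msg) {
      std::string wire;
      if (!msg.SerializeToString(&wire)) {
        return false;  // serialization itself failed
      }
      Msg reparsed;
      if (!reparsed.ParseFromString(wire)) {
        return false;  // the bytes we are about to send don't parse back
      }
      // Catch silent truncation / garbage values by comparing field by field.
      return google::protobuf::util::MessageDifferencer::Equals(msg, reparsed);
    }

    // In the streaming handler ("reply" / "writer" stand in for whatever the
    // real service code uses):
    //   if (!LooksSaneBeforeWrite(reply)) {
    //     gpr_log(GPR_ERROR, "reply failed pre-send check:\n%s",
    //             reply.DebugString().c_str());
    //   }
    //   writer->Write(reply);

If that check passes and the wire capture still shows garbage, that would at 
least point at something after the message is handed off rather than the 
message itself.  (And as far as I can tell, the generated setters that take a 
std::string copy the data into the message, so the original buffer shouldn't 
need to outlive the call.)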

On Wednesday, March 24, 2021 at 1:35:29 PM UTC-4 [email protected] wrote:

> This is pretty strange. It is possible that we are being blocked on flow 
> control. I would check that the application layer is actually reading. If 
> I am not mistaken, `perform_stream_op[s=0x7f0e16937290]: RECV_MESSAGE` is 
> a log seen at the start of an operation, meaning that the HTTP/2 layer 
> hasn't yet been instructed to read a message (or a previous read on the 
> stream hasn't finished yet). Given that you are just updating the gRPC 
> version from 1.20 to 1.36.1, I do not have an answer as to why you would 
> see this without any application changes.
>
> A few questions - 
> Do the two streams use the same underlying channel/transport?
> Are the clients and the server in the same process?
> Is there anything special about the environment this is being run in?
>
> (One way to make sure that the read op is being propagated to the 
> transport layer, is to check the logs with the "channel" tracer.)
> On Friday, March 19, 2021 at 12:59:30 PM UTC-7 Bryan Schwerer wrote:
>
>> Hello,
>>
>> I'm in the long overdue process of updating gRPC from 1.20 to 1.36.1.  I 
>> am running into an issue where the streaming replies from the server are 
>> not reaching the client in about 50% of the instances.  This is binary: 
>> either the streaming call works perfectly or it doesn't work at all.  
>> After debugging a bit, I turned on the http tracing and, from what I can 
>> tell, the http messages are received in the client thread; in the working 
>> case, perform_stream_op[s=0x7f0e16937290]: RECV_MESSAGE is logged, but in 
>> the broken case it isn't.  No error messages occur.
>>
>> I've tried various tracers, but haven't hit anything.  The code is pretty 
>> much the same pattern as the example and there's no indication any 
>> disconnect has occurred which would cause the call to terminate.  Using gdb 
>> to look at the thread, it is still in epoll_wait.
>>
>> The process in which this runs calls 2 different synchronous server 
>> streaming calls to the same server in separate threads.  It also is a gRPC 
>> server.  Everything is run over the internal 'lo' interface.  Any ideas on 
>> where to look to debug this?
>>
>> Thanks,
>>
>> Bryan
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/de6dbb25-4c70-43b1-8dbc-6dd4d0c2bfb2n%40googlegroups.com.

Reply via email to