[ 
https://issues.apache.org/jira/browse/THRIFT-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868225#comment-16868225
 ] 

aqingsir commented on THRIFT-4887:
----------------------------------

Totally agree with your comment #1: Never add/remove a field especially one 
that is in the middle of a struct. And if there has to be such a case, make the 
field to be the last one and marked as optional. This is what we advocate 
within our workmates.

However there still exists such a risk, especially in large companies, and this 
is why I open this  issue.

For you comment of "the code should end up in {{Server.Execute()}} where any 
exception is caught and the whole protocol/transport stack is reinitialized 
from scratch, including closing sockets, reinstantiating a fresh framed 
transport or whatever else there might be", my opinion is:

1. It's fine to reinitialize protocol/transport stack from scratch, but might 
not close sockets as subsequent requests(maybe another thrift method that works 
fine) could re-use them. Opening a lot sockets consume a lot resources and 
could easily lead to Socket Exceptions.

2. If this is an issue, it exist in both client side and server side, my fix is 
in client side. I need to dive more into source code to see how to fix it in 
server side, probably Server.Execute() as you suggest.

 

> Thrift will OOM at a low concurrency if fields added and old client requests 
> new server
> ---------------------------------------------------------------------------------------
>
>                 Key: THRIFT-4887
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4887
>             Project: Thrift
>          Issue Type: Bug
>         Environment: Almost all versions from 0.8.0 to the newest 
> 0.13.0-snapshot, and verify on 0.9.3/0.11.0.
> Almost all languages, and verified on Go/Java
>            Reporter: aqingsir
>            Priority: Major
>         Attachments: readI32.jpg
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> (Could see  more on [https://github.com/aqingsao/thrift-oom])
> //background
> A serious issue occured in our prod env and finally it came out to be the 
> changement of some fields in an IDL file, old client still requested new 
> server and crashed due to OOM.
> IDL changement could be stated as: Return value of the interface is a list, 
> element of which is a struct object and has 5 fields. A new field is added to 
> the middle of the struct.
> // to reproduce
> In this case a low concurency of 10 will reproduce this issue, you could find 
> a demo project on: [https://github.com/aqingsao/thrift-oom]
> // reason
> Thrift tries to consume all data in inputstream by skipping fields that are 
> redundant or have a type mismatch. But it won't consume subsequent fields if 
> there's an exception.
> In such a case Thrift does nothing the underlying inputstream, so trouble 
> comes to the next request who reuses this connection, as the cursor still 
> points to some middle position of the inputstream.
> As Thrift always starts with a readI32() method for any response, which means 
> the length of the method's name. Unbelievable the length could be as large as 
> 184549632, which is about 176M. This explains why OOM occurs even at a 
> concurrency of 10
> // how to fix
> Always clear inputstream in TSocket if there are any redundant data at the 
> end of a method call.
> I'll submit a PR soon for Java version.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to