[
https://issues.apache.org/jira/browse/THRIFT-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868225#comment-16868225
]
aqingsir commented on THRIFT-4887:
----------------------------------
Totally agree with your comment #1: Never add/remove a field especially one
that is in the middle of a struct. And if there has to be such a case, make the
field to be the last one and marked as optional. This is what we advocate
within our workmates.
However there still exists such a risk, especially in large companies, and this
is why I open this issue.
For you comment of "the code should end up in {{Server.Execute()}} where any
exception is caught and the whole protocol/transport stack is reinitialized
from scratch, including closing sockets, reinstantiating a fresh framed
transport or whatever else there might be", my opinion is:
1. It's fine to reinitialize protocol/transport stack from scratch, but might
not close sockets as subsequent requests(maybe another thrift method that works
fine) could re-use them. Opening a lot sockets consume a lot resources and
could easily lead to Socket Exceptions.
2. If this is an issue, it exist in both client side and server side, my fix is
in client side. I need to dive more into source code to see how to fix it in
server side, probably Server.Execute() as you suggest.
> Thrift will OOM at a low concurrency if fields added and old client requests
> new server
> ---------------------------------------------------------------------------------------
>
> Key: THRIFT-4887
> URL: https://issues.apache.org/jira/browse/THRIFT-4887
> Project: Thrift
> Issue Type: Bug
> Environment: Almost all versions from 0.8.0 to the newest
> 0.13.0-snapshot, and verify on 0.9.3/0.11.0.
> Almost all languages, and verified on Go/Java
> Reporter: aqingsir
> Priority: Major
> Attachments: readI32.jpg
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> (Could see more on [https://github.com/aqingsao/thrift-oom])
> //background
> A serious issue occured in our prod env and finally it came out to be the
> changement of some fields in an IDL file, old client still requested new
> server and crashed due to OOM.
> IDL changement could be stated as: Return value of the interface is a list,
> element of which is a struct object and has 5 fields. A new field is added to
> the middle of the struct.
> // to reproduce
> In this case a low concurency of 10 will reproduce this issue, you could find
> a demo project on: [https://github.com/aqingsao/thrift-oom]
> // reason
> Thrift tries to consume all data in inputstream by skipping fields that are
> redundant or have a type mismatch. But it won't consume subsequent fields if
> there's an exception.
> In such a case Thrift does nothing the underlying inputstream, so trouble
> comes to the next request who reuses this connection, as the cursor still
> points to some middle position of the inputstream.
> As Thrift always starts with a readI32() method for any response, which means
> the length of the method's name. Unbelievable the length could be as large as
> 184549632, which is about 176M. This explains why OOM occurs even at a
> concurrency of 10
> // how to fix
> Always clear inputstream in TSocket if there are any redundant data at the
> end of a method call.
> I'll submit a PR soon for Java version.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)