cshannon commented on PR #3134:
URL: https://github.com/apache/accumulo/pull/3134#issuecomment-1373735941

   @ctubbsii and @dlmarion - Ok so I finally figured out what happened. 
Everything was working with the maxFrameSize setting being configured on the 
transport factory until version 0.17.0. The test I just added passes fine 
without modification to TServerUtils with version 0.16.0. The reason it broke 
in 0.17.0 was due to the change that is part of 
https://github.com/apache/thrift/pull/2533 . Specifically this commit 
introduced the maxFrameSize option as part of the constructor 
https://github.com/apache/thrift/commit/66ac7b46fab85f175aec601cb48ea05408a1c186
 and is necessary to configure. 
   
   My fix works but this behavior change is obviously a breaking change and I'm 
not sure whether or not it's something that was intended or should maybe be 
addressed in Thrift itself at some point.
   
   ### TLDR version:
   
   My commit in this PR fixes the issue and it only applies to version 2.1.0 
since no other version uses Thrift 0.17.0. So this is a new regression and does 
not apply to previous versions, like 1.10. Also, when we configure the settings 
for our custom non blocking server we do still need to keep our existing 
maxFrameSize settings on both the transport factory configuration and the 
`maxReadBufferBytes` property on the options  as shown here otherwise it breaks 
further downstream, even if we set the constructor args. 
   
   
https://github.com/apache/accumulo/blob/706612f859d6e68891d487d624eda9ecf3fea7f9/server/base/src/main/java/org/apache/accumulo/server/rpc/TServerUtils.java#L274-L276
   
   ### Detailed Version:
   
   #### Why it breaks
   
   The root cause is the changes inside of `AbstractNonblockingServer`. This 
commit changed how max frame size is checked during reads, here is version 
[0.16.0](https://github.com/apache/thrift/blob/2a93df80f27739ccabb5b885cb12a8dc7595ecdf/lib/java/src/org/apache/thrift/server/AbstractNonblockingServer.java#L355).
 Previously, before 0.17.0, frame size was compared against 
`MAX_READ_BUFFER_BYTES` which comes from `maxReadBufferBytes` and is set by us 
inside 
[TServerUtils](https://github.com/apache/accumulo/blob/706612f859d6e68891d487d624eda9ecf3fea7f9/server/base/src/main/java/org/apache/accumulo/server/rpc/TServerUtils.java#L276)
   
   But in version 0.17.0, this has been refactored now and instead of reading 
the value from `MAX_READ_BUFFER_BYTES`, this is read from 
`trans_.getMaxFrameSize()` as shown here in version 
[0.17.0](https://github.com/apache/thrift/blob/4d493e867b349f3475203ef9848353b315203c51/lib/java/src/main/java/org/apache/thrift/server/AbstractNonblockingServer.java#L335).
 This value is 
[configured](https://github.com/apache/thrift/blob/4d493e867b349f3475203ef9848353b315203c51/lib/java/src/main/java/org/apache/thrift/transport/TNonblockingServerSocket.java#L137)
 by the new constructor argument which we were not setting so it was always the 
default value and caused the failure. 
   
   #### Why our existing config doesn't work
   
   Now, what confused me initially is that we also set the value on the 
configuration of the factory so I was unsure why there were two ways to 
configure the setting and why ours didn't work. It turns out the reason is they 
are different configuration objects used for different things. 
   
   The `maxFrameSize` that is now set by the constructor argument is applied to 
the `trans_` transport which is passed to `FrameBuffer` as a [constructor arg 
](https://github.com/apache/thrift/blob/4d493e867b349f3475203ef9848353b315203c51/lib/java/src/main/java/org/apache/thrift/server/AbstractNonblockingServer.java#L288)and
 configured back in `TNonblockingServerSocket` on line 137. 
   
   However, the `maxFrameSize` set on the configuration for the factory we 
create is applied to different configurations and transports as the factory 
that we configure doesn't get used to create `trans_` transport but is used to 
create the `inTrans_` and `outTrans_` transport objects in FrameBuffer as shown 
[here](https://github.com/apache/thrift/blob/4d493e867b349f3475203ef9848353b315203c51/lib/java/src/main/java/org/apache/thrift/server/AbstractNonblockingServer.java#L295-L296).
   
   So there are multiple transports configured in `FrameBuffer` and the 
constructor arg configuration applies to one of them and the factory 
configuration applies to the others.
   
   #### Why we need to keep existing config as well
    
   As I said above, we still need to keep the existing configuration on the 
transport factory. The new constructor setting will allow the read to get 
further but if we don't set the factory config then we get errors downstream 
[here](https://github.com/apache/thrift/blob/4d493e867b349f3475203ef9848353b315203c51/lib/java/src/main/java/org/apache/thrift/transport/layered/TFramedTransport.java#L141)
 when reading a frame.
   
   Furthermore, I think we need to keep setting the `maxFrameSize` 
configuration on `options.maxReadBufferBytes` because if the frame size is 
larger than that value then thrift just hangs and spins forever with no error 
(which is even worse). This happened when I was testing not setting it. This is 
because it's 
[waiting](https://github.com/apache/thrift/blob/2a93df80f27739ccabb5b885cb12a8dc7595ecdf/lib/java/src/org/apache/thrift/server/AbstractNonblockingServer.java#L363)
 for more memory to free up but it won't ever be free since the frame to be 
read is less than the configured max frame size so it tries to read it but it 
is larger than the max read bytes so it never finishes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to