[
https://issues.apache.org/jira/browse/THRIFT-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James E. King III updated THRIFT-4591:
--------------------------------------
Fix Version/s: (was: 0.12.0)
> Incompatibility using non-blocking server and frame transport on C++ side?
> --------------------------------------------------------------------------
>
> Key: THRIFT-4591
> URL: https://issues.apache.org/jira/browse/THRIFT-4591
> Project: Thrift
> Issue Type: Bug
> Components: C++ - Library
> Affects Versions: 0.11.0
> Reporter: allen_lee
> Assignee: James E. King III
> Priority: Blocker
> Attachments: 9090.pcap, 9090_1.pcap
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> (jking): C++ TFramedTransport reads the frame size then attempts to read the
> message. If it only gets part of the message it returns the partial read,
> and the upper layer will not be able to decode the message, further read may
> be called again, when it will go and try to read a frame size again, but it
> could be in the middle of message payload the underlying transport hadn't yet
> received. It's amazing to see this in code that's been around so long!
> Original Bug report:
> 1) realize thrift server with TNonblockingServer via c++;
> 2) realize thrift client via lua lib and choose frame transport.
> 3) call remote interface failed with "TTransportException:0: Default
> (unknown)" print, and the server show "TConnection::workSocket():
> THRIFT_EAGAIN (unavailable resources)" error.
> 4)investigate this fault with tcpdump tool, attachment 9090.pcap show the
> frame msg doesnot contains frame size field, the rifht situation of
> attachment 9090_1.pcap show the frame msg contains 4 bytes (00 00 00 25)
> before protocol id field.
> 5) dig into the fault and tried to find root cause, then i found there is an
> fault in TFramedTransport:flush function in TFramedTransport.lua file. the
> original realization is:
> -----
> function TFramedTransport:flush()
> if self.doWrite == false then
> return self.trans:flush()
> end
> -- If the write fails we still want wBuf to be clear
> local tmp = self.wBuf
> self.wBuf = ''
> local frame_len_buf = libluabpack.bpack("i", string.len(tmp))
> self.trans:write(frame_len_buf)
> self.trans:write(tmp)
> self.trans:flush()
> end
> -----
> which send frame size file and reset msg content independently.
> ----------------------
> (jking) Analysis of original report: it fixes the sender to send once, but it
> shouldn't matter if the size is sent separately from the payload. It's the
> receiver where the root cause is, in this case the C++ library. This issue
> may not be limited to the C++ implementation, but we need a test to insert a
> pause between sending a frame size and sending the payload and see what
> happens on all the implementations.
> We're not going to merge the lua client fix as it doubles the memory
> requirements to send, despite reducing the write() count from 2 to 1.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)