[ https://issues.apache.org/jira/browse/THRIFT-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James E. King III updated THRIFT-4591:
--------------------------------------
Description:
1) Implement a Thrift server in C++ using TNonblockingServer.
2) Implement a Thrift client with the Lua library, using the framed transport.
3) Calling a remote method fails: the client prints "TTransportException:0: Default (unknown)" and the server logs a "TConnection::workSocket(): THRIFT_EAGAIN (unavailable resources)" error.
4) Investigating the fault with tcpdump: attachment 9090.pcap shows the framed message without the frame size field, while the correct capture, attachment 9090_1.pcap, shows the message with 4 bytes (00 00 00 25, i.e. a big-endian frame size of 37) before the protocol id field.
5) Digging into the fault to find the root cause, I found a fault in the TFramedTransport:flush function in TFramedTransport.lua. The original implementation is:
-----
function TFramedTransport:flush()
  if self.doWrite == false then
    return self.trans:flush()
  end
  -- If the write fails we still want wBuf to be clear
  local tmp = self.wBuf
  self.wBuf = ''
  local frame_len_buf = libluabpack.bpack("i", string.len(tmp))
  self.trans:write(frame_len_buf)
  self.trans:write(tmp)
  self.trans:flush()
end
-----
This sends the frame size field and the rest of the message content as separate writes.
was:
(jking): The C++ TFramedTransport reads the frame size, then attempts to read the message. If it gets only part of the message, it returns the partial read and the upper layer cannot decode the message. A further read may then be issued, at which point the transport tries to read a frame size again, even though the stream could be positioned in the middle of a message payload that the underlying transport had not yet received. It's amazing to see this in code that's been around so long!
----------------------
(jking) Analysis of the original report: the reporter's fix changes the sender to send the frame in a single write, but it shouldn't matter whether the size is sent separately from the payload. The root cause is in the receiver, in this case the C++ library. The issue may not be limited to the C++ implementation; we need a test that inserts a pause between sending a frame size and sending the payload, and checks how every implementation behaves.
We're not going to merge the Lua client fix, because it doubles the memory required to send a message, despite reducing the write() count from 2 to 1.
> Incompatibility using non-blocking server and frame transport on C++ side?
> --------------------------------------------------------------------------
>
> Key: THRIFT-4591
> URL: https://issues.apache.org/jira/browse/THRIFT-4591
> Project: Thrift
> Issue Type: Bug
> Components: C++ - Library
> Affects Versions: 0.11.0
> Reporter: allen_lee
> Assignee: James E. King III
> Priority: Blocker
> Attachments: 9090.pcap, 9090_1.pcap
>
> Original Estimate: 4h
> Remaining Estimate: 4h
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)