Quanlong Huang created THRIFT-5303:
--------------------------------------

             Summary: Unicode decode errors in _fast_decode
                 Key: THRIFT-5303
                 URL: https://issues.apache.org/jira/browse/THRIFT-5303
             Project: Thrift
          Issue Type: Bug
          Components: Python - Library
    Affects Versions: 0.11.0
         Environment: Ubuntu 16.04.6 LTS
            Reporter: Quanlong Huang


Impala currently uses thrift-0.11.0 on the client side and thrift-0.9.3 on the
server side (the server-side upgrade is blocked by some issues). We hit a
problem decoding UTF-8 bytes on the client side: the result contains a partial
UTF-8 code point, and Thrift does not handle the error gracefully. The stack trace:
{code:java}
Traceback (most recent call last):
  File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1210, in _do_beeswax_rpc
    ret = rpc()
  File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1113, in <lambda>
    self.fetch_size))
  File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 254, in fetch
    return self.recv_fetch()
  File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 275, in recv_fetch
    result.read(iprot)
  File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 1410, in read
    iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe6 in position 3: unexpected end of data
{code}
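For reference, the same class of error can be reproduced in plain Python without Thrift. This is only a minimal sketch, assuming Python 3; the payload bytes and the helper name {{try_strict_decode}} are made up for illustration and are not the actual data Impala returned. They are chosen so that byte 0xe6 at position 3 starts a multi-byte character that is cut off.

```python
# Hypothetical payload: "abc" followed by the three-byte character U+65E5,
# truncated after the first byte of that character.
full = "abc\u65e5".encode("utf-8")   # b'abc\xe6\x97\xa5'
truncated = full[:4]                 # b'abc\xe6'


def try_strict_decode(data):
    """Decode with the default strict handler; return the error if one occurs."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc


err = try_strict_decode(truncated)
# Report the exception type, the offset of the offending byte, and the reason.
print(type(err).__name__, err.start, err.reason)
```

The complete byte string decodes fine; only the truncated one raises, which matches the "partial code point" description above.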
This is similar to THRIFT-2087, but the error occurs at the boundary between
the Python and C++ code. As in THRIFT-2087, we need to provide a way to control
the error handling behavior when decoding UTF-8 bytes in
{{TBinaryProtocolAccelerated._fast_decode}}. The relevant code is at
[https://github.com/apache/thrift/blob/0.11.0/lib/py/src/ext/protocol.tcc#L708]:
{code:c++}
  case T_STRING: {
    char* buf = NULL;
    int len = impl()->readString(&buf);
    if (len < 0) {
      return NULL;
    }
    if (isUtf8(typeargs)) {
      // Needs to provide an error handling method here.
      return PyUnicode_DecodeUTF8(buf, len, 0);
    } else {
      return PyBytes_FromStringAndSize(buf, len);
    }
  }
{code}
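To illustrate what choosing an error handler would change, here is a Python-level sketch, assuming Python 3. The third argument of {{PyUnicode_DecodeUTF8}} is the {{errors}} string, and passing 0 (NULL) means strict handling; {{bytes.decode("utf-8", errors)}} takes the same handler names. The helper {{decode_thrift_string}} is hypothetical and is not part of Thrift; it only stands in for the {{T_STRING}} branch above.

```python
def decode_thrift_string(buf, errors="strict"):
    """Hypothetical stand-in for the T_STRING branch: decode buf as UTF-8.

    `errors` plays the role of the third argument of PyUnicode_DecodeUTF8.
    """
    return buf.decode("utf-8", errors)


# Made-up payload ending in the first byte of a three-byte sequence.
partial = b"abc\xe6"

# "replace" substitutes U+FFFD for the undecodable bytes.
replaced = decode_thrift_string(partial, errors="replace")
# "ignore" silently drops the undecodable bytes.
ignored = decode_thrift_string(partial, errors="ignore")

print(repr(replaced), repr(ignored))
```

Which handler is appropriate (or whether it should be configurable by the caller) is a design question for the fix; the sketch only shows that a non-strict handler returns a string instead of raising.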


