Quanlong Huang created IMPALA-13994:
---------------------------------------
Summary: Thrift client hive_client shouldn't be used in multiple
threads
Key: IMPALA-13994
URL: https://issues.apache.org/jira/browse/IMPALA-13994
Project: IMPALA
Issue Type: Bug
Components: Test
Reporter: Quanlong Huang
In ImpalaTestSuite, we create a ThriftHiveMetastore.Client as hive_client:
[https://github.com/apache/impala/blob/648209b17258cf610f4e73a3ed63de665216074f/tests/common/impala_test_suite.py#L255]
Unlike the other clients we create for Impala, this Thrift client is not
thread-safe and shouldn't be used concurrently from multiple threads. See
THRIFT-2283 and this email thread:
[https://lists.apache.org/thread/4rsjdtlpv8zrgknpf43vo5rg9q83b6wp]
{quote}The Thrift transport layer is not thread-safe. It is essentially a
wrapper on a socket. You can't interleave writing things to a single socket
from multiple threads without locking. You also don't know what order the
responses will come back in.
{quote}
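One possible way to avoid sharing a single client, shown only as a minimal sketch and not as the fix chosen for this issue, is to give each thread its own client through thread-local storage. The names PerThreadClient, FakeClient and run_demo are hypothetical. FakeClient stands in for a real ThriftHiveMetastore.Client so the example runs without an HMS; in the test suite the factory would instead have to open a separate transport to HMS for each client.

```python
import threading


class PerThreadClient(object):
    """Lazily creates one client per thread via a caller-supplied factory.

    Hypothetical helper for illustration; not part of ImpalaTestSuite.
    """

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        # Each thread sees its own 'client' attribute on the thread-local object.
        client = getattr(self._local, "client", None)
        if client is None:
            client = self._factory()
            self._local.client = client
        return client


class FakeClient(object):
    """Stand-in for a real Thrift client, used only for this demonstration."""

    def drop_table(self, db, tbl_name, deleteData=True):
        return "dropped %s.%s" % (db, tbl_name)


def run_demo(num_threads=2):
    """Runs drop_table in several threads and returns the clients they used."""
    clients = PerThreadClient(FakeClient)
    used = []
    used_lock = threading.Lock()

    def worker(i):
        client = clients.get()
        client.drop_table("db", "tbl_%d" % i)
        with used_lock:
            # Keep a reference so the caller can compare object identities.
            used.append(client)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return used
```

With this arrangement no two threads write to the same socket, so requests and responses cannot interleave. The trade-off is one extra connection per thread.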
Here are some exceptions I hit when using it in two threads in
https://gerrit.cloudera.org/c/22816/3:
{noformat}
Exception in thread Thread-4:
Traceback (most recent call last):
File
"/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/threading.py",
line 801, in __bootstrap_inner
self.run()
File
"/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/threading.py",
line 754, in run
self.__target(*self.__args, **self.__kwargs)
File
"/home/quanlong/workspace/Impala/tests/metadata/test_event_processing.py", line
636, in drop_table_in_hive
self.hive_client.drop_table(db, tbl_name, deleteData=True)
File
"/home/quanlong/workspace/Impala/shell/gen-py/impala_thrift_gen/hive_metastore/ThriftHiveMetastore.py",
line 3913, in drop_table
self.recv_drop_table()
File
"/home/quanlong/workspace/Impala/shell/gen-py/impala_thrift_gen/hive_metastore/ThriftHiveMetastore.py",
line 3937, in recv_drop_table
raise result.o1
NoSuchObjectException: NoSuchObjectException(message='null: null')
Exception in thread Thread-3:
Traceback (most recent call last):
File
"/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/threading.py",
line 801, in __bootstrap_inner
self.run()
File
"/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/threading.py",
line 754, in run
self.__target(*self.__args, **self.__kwargs)
File
"/home/quanlong/workspace/Impala/tests/metadata/test_event_processing.py", line
636, in drop_table_in_hive
self.hive_client.drop_table(db, tbl_name, deleteData=True)
File
"/home/quanlong/workspace/Impala/shell/gen-py/impala_thrift_gen/hive_metastore/ThriftHiveMetastore.py",
line 3913, in drop_table
self.recv_drop_table()
File
"/home/quanlong/workspace/Impala/shell/gen-py/impala_thrift_gen/hive_metastore/ThriftHiveMetastore.py",
line 3927, in recv_drop_table
(fname, mtype, rseqid) = iprot.readMessageBegin()
File
"/home/quanlong/workspace/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py",
line 134, in readMessageBegin
sz = self.readI32()
File
"/home/quanlong/workspace/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py",
line 217, in readI32
buff = self.trans.readAll(4)
File
"/home/quanlong/workspace/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TTransport.py",
line 62, in readAll
chunk = self.read(sz - have)
File
"/home/quanlong/workspace/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TTransport.py",
line 164, in read
self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
File
"/home/quanlong/workspace/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TSocket.py",
line 164, in read
raise TTransportException(message="unexpected exception", inner=e)
TTransportException: unexpected exception
{noformat}
ERRORs on the HMS side indicate that the request data is abnormal:
{noformat}
2025-04-25T13:49:50,021 INFO [TThreadPoolServer WorkerProcess-188]
metastore.HiveMetaStore: 203: source:127.0.0.1 drop_table : tbl=null.null.null
2025-04-25T13:49:50,021 INFO [TThreadPoolServer WorkerProcess-188]
HiveMetaStore.audit: ugi=quanlong ip=127.0.0.1 cmd=source:127.0.0.1
drop_table : tbl=null.null.null
2025-04-25T13:49:50,022 WARN [TThreadPoolServer WorkerProcess-188]
metastore.ObjectStore: Falling back to ORM path due to direct SQL failure (this
is not an error): null at
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:393)
at
org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:896)
2025-04-25T13:49:50,022 ERROR [TThreadPoolServer WorkerProcess-188]
metastore.ObjectStore:
java.lang.NullPointerException: null
at
org.apache.hadoop.hive.metastore.utils.StringUtils.normalizeIdentifier(StringUtils.java:94)
~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
at
org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:853)
~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
at
org.apache.hadoop.hive.metastore.ObjectStore.getJDODatabase(ObjectStore.java:911)
~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
at
org.apache.hadoop.hive.metastore.ObjectStore$1.getJdoResult(ObjectStore.java:901)
~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
at
org.apache.hadoop.hive.metastore.ObjectStore$1.getJdoResult(ObjectStore.java:893)
~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
at
org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:4302)
~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
at
org.apache.hadoop.hive.metastore.ObjectStore.getDatabaseInternal(ObjectStore.java:903)
~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
at
org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:875)
~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) ~[?:?]
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[?:1.8.0_432]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_432]
at
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
at com.sun.proxy.$Proxy33.getDatabase(Unknown Source) ~[?:?]
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:3253)
~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
{noformat}
Another kind of ERROR log:
{noformat}
2025-04-25T13:49:50,054 ERROR [TThreadPoolServer WorkerProcess-188]
server.TThreadPoolServer: Thrift Error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in
readMessageBegin, old client?
at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:254)
~[libthrift-0.16.0.jar:0.16.0]
at
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:76)
~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:250)
~[libthrift-0.16.0.jar:0.16.0]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
~[?:1.8.0_432]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
~[?:1.8.0_432]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_432]
{noformat}