Kengo Seki created AIRFLOW-2514:
-----------------------------------
Summary: HiveServer2Hook doesn't work on Python2 due to thrift
version conflict
Key: AIRFLOW-2514
URL: https://issues.apache.org/jira/browse/AIRFLOW-2514
Project: Apache Airflow
Issue Type: Bug
Components: hive_hooks, hooks
Reporter: Kengo Seki
impyla, on which HiveServer2Hook depends, doesn't work with Thrift 0.10.0+ on
Python 2. Example:
{code}
$ pip show thrift
Name: thrift
Version: 0.11.0
(snip)
$ ipython
(snip)
In [1]: from airflow.hooks.hive_hooks import HiveServer2Hook
In [2]: HiveServer2Hook().get_conn().cursor()
[2018-05-23 10:21:02,117] {base_hook.py:83} INFO - Using connection to: localhost
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-f76a25f124cf> in <module>()
----> 1 HiveServer2Hook().get_conn().cursor()
(snip)
/home/sekikn/.virtualenvs/a/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.pyc in write(self, oprot)
   1067     def write(self, oprot):
   1068         if oprot.__class__ == TBinaryProtocol.TBinaryProtocolAccelerated and self.thrift_spec is not None and fastbinary is not None:
-> 1069             oprot.trans.write(fastbinary.encode_binary(self, (self.__class__, self.thrift_spec)))
   1070             return
   1071         oprot.writeStructBegin('OpenSession_args')
TypeError: expecting list of size 2 for struct args
{code}
[This problem has already been reported|https://github.com/cloudera/impyla/issues/286], and
[impyla therefore pins its Thrift version to 0.9.3|https://github.com/cloudera/impyla/commit/94a8eff9cda0cdb16b180c7079961449c8385997].
On the other hand, hmsclient (introduced by AIRFLOW-2336) needs Thrift 0.11.0+.
With a lower version, importing hmsclient fails as follows:
{code}
$ pip show thrift
Name: thrift
Version: 0.10.0
(snip)
$ python -m airflow.hooks.hive_hooks
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.py", line 33, in <module>
    import hmsclient
  File "/home/sekikn/.virtualenvs/a/local/lib/python2.7/site-packages/hmsclient/__init__.py", line 2, in <module>
    from .hmsclient import HMSClient
  File "/home/sekikn/.virtualenvs/a/local/lib/python2.7/site-packages/hmsclient/hmsclient.py", line 23, in <module>
    from .genthrift.hive_metastore import ThriftHiveMetastore
  File "/home/sekikn/.virtualenvs/a/local/lib/python2.7/site-packages/hmsclient/genthrift/hive_metastore/ThriftHiveMetastore.py", line 11, in <module>
    from thrift.TRecursive import fix_spec
ImportError: No module named TRecursive
{code}
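The missing module in the traceback above can be probed directly before importing hive_hooks. The following is only a minimal sketch using the standard library; the helper name module_available is hypothetical and is not part of Airflow, thrift, or hmsclient:

```python
import importlib


def module_available(name):
    """Return True if the module `name` can be imported, False on ImportError."""
    try:
        importlib.import_module(name)
    except ImportError:
        # Also covers the case where the parent package is not installed at all.
        return False
    return True


if __name__ == "__main__":
    # hmsclient's generated code imports fix_spec from this module.
    print("thrift.TRecursive importable:", module_available("thrift.TRecursive"))
```

If this reports False, the installed Thrift lacks the module that the hmsclient import above requires.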
As a result, no Thrift version satisfies both impyla and hmsclient on Python 2,
so HiveServer2Hook is currently unusable there.
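The conflict can be illustrated with a small sketch. The two version bounds are taken from this report only (impyla on Python 2 breaks with Thrift 0.10.0+, hmsclient needs Thrift 0.11.0+); the function and constant names are hypothetical, and the sketch does not inspect the installed packages:

```python
import re

# Bounds as stated in this report, for Python 2 only (assumptions, not
# verified against other releases of impyla or hmsclient).
IMPYLA_PY2_BROKEN_FROM = (0, 10, 0)  # impyla fails with this version or later
HMSCLIENT_MIN = (0, 11, 0)           # hmsclient needs this version or later


def parse_version(text):
    """Turn a version string such as '0.11.0' into the tuple (0, 11, 0)."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def python2_status(thrift_version):
    """Report which dependency works with the given Thrift version on Python 2."""
    version = parse_version(thrift_version)
    return {
        "impyla": version < IMPYLA_PY2_BROKEN_FROM,
        "hmsclient": version >= HMSCLIENT_MIN,
    }


if __name__ == "__main__":
    for candidate in ("0.9.3", "0.10.0", "0.11.0"):
        print(candidate, python2_status(candidate))
```

Under these assumptions, none of the three versions mentioned in this report makes both entries True.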