[
https://issues.apache.org/jira/browse/IMPALA-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Knupp updated IMPALA-9489:
--------------------------------
Description:
[Note: this JIRA was filed in relation to the ongoing effort to make the
impala-shell compatible with python 3]
The impala python development environment is a fairly convoluted affair -- some
packages are installed in infra/python/env, some come from the toolchain, and
some are generated and live in the shell directory. Generally speaking, if you
launch impala-python and import a module, it's not necessarily easy to predict
where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl
<module 'sasl' from '/home/systest/Impala/shell/ext-py/sasl-0.1.1/dist/sasl-0.1.1-py2.7-linux-x86_64.egg/sasl/__init__.pyc'>
>>> import requests
>>> requests
<module 'requests' from '/home/systest/Impala/infra/python/env/local/lib/python2.7/site-packages/requests/__init__.pyc'>
>>> import Logging
>>> Logging
<module 'Logging' from '/home/systest/Impala/shell/gen-py/Logging/__init__.pyc'>
>>> import thrift
>>> thrift
<module 'thrift' from '/home/systest/Impala/toolchain/thrift-0.9.3-p7/python/lib/python2.7/site-packages/thrift/__init__.pyc'>
{noformat}
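The same lookup can be done without importing each module by hand. The following is only a minimal sketch, not something that exists in the Impala tree: the helper name {{module_origin}} is made up for illustration, and it assumes a Python 3 interpreter (for {{importlib.util.find_spec}}). Under Python 2.7, inspecting the imported module as in the session above is the equivalent.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None.

    Hypothetical helper for illustration only; not part of the Impala tree.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # find_spec can raise for broken parents or modules with no spec.
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Module names taken from the session above; any that are not
    # installed in the current environment are reported as "not found".
    for name in ("sasl", "requests", "Logging", "thrift"):
        print("{0}: {1}".format(name, module_origin(name) or "not found"))
```

Running this under each of the different entry points would show whether they resolve the same module name to different locations.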
Really, there is no one coherent environment -- there's just whatever
collection of modules happens to be available at a given time for a given type
of invocation, all of which is accomplished behind the scenes by calling
scripts like {{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that
are responsible for cobbling together a PYTHONPATH based on known locations and
current env variables.
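To make the idea of "cobbling together a PYTHONPATH" concrete, here is a rough sketch of that kind of assembly. It is not a transcription of {{bin/set-pythonpath.sh}}: the function name {{build_pythonpath}}, its parameters, and the ordering of the entries are assumptions, and the directory list simply mirrors the locations visible in the interpreter session above.

```python
import os


def build_pythonpath(impala_home, thrift_dir, existing=""):
    """Join known module locations into a PYTHONPATH-style string.

    Illustrative only: the directories mirror the locations seen in the
    interpreter session, not the actual contents of set-pythonpath.sh.
    """
    entries = [
        os.path.join(impala_home, "shell", "gen-py"),
        os.path.join(impala_home, "toolchain", thrift_dir,
                     "python", "lib", "python2.7", "site-packages"),
        os.path.join(impala_home, "infra", "python", "env", "local",
                     "lib", "python2.7", "site-packages"),
    ]
    if existing:
        # Append whatever the caller already had, after the known locations.
        entries.extend(existing.split(os.pathsep))
    # Drop empty and duplicate entries, keeping first-seen order, because
    # earlier entries take precedence during import.
    seen = set()
    ordered = []
    for entry in entries:
        if entry and entry not in seen:
            seen.add(entry)
            ordered.append(entry)
    return os.pathsep.join(ordered)


if __name__ == "__main__":
    print(build_pythonpath("/home/systest/Impala", "thrift-0.9.3-p7",
                           os.environ.get("PYTHONPATH", "")))
```

Because the thrift directory is just one more entry in the list, whichever caller builds the path decides which thrift version every consumer sees.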
As far as I can tell, there are three important contexts where python comes
into play...
* during the build process (used during data load, e.g.,
testdata/bin/load_nested.py)
* when running the py.test-based e2e tests
* whenever the impala-shell is invoked
As noted by IMPALA-7825 (and also in a conversation I had with
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 during the build
process. This seems to happen during test data load (specifically, when calling
testdata/bin/load_nested.py) mainly because there was some well-intentioned but
probably misjudged attempt at code reuse from the test framework. The test code
that gets re-used involves impyla and/or thrift-sasl, which currently still
rely on thrift 0.9.3. So our test framework, and by extension the build, both
inherit the same limitation.
The impala-shell, on the other hand, luckily doesn't directly reuse any of the
same modules, and there's no real need to keep it pinned to 0.9.3. However,
since calling impala-shell.sh winds up invoking {{set-pythonpath.sh}}, the
same script that sets up the environment during building or testing, the
shell winds up defaulting to thrift 0.9.3 as well.
thrift 0.9.3 is one of the many limitations restricting the impala-shell to
python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0 is available -- we
just have to use it. The way to accomplish this is by decoupling
impala-shell.sh from both {{set-pythonpath.sh}} and
{{impala-python-common.sh}}.
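A quick way to confirm which thrift a given entry point actually picks up is to read the version out of the module's path. This is a sketch under one assumption: that the toolchain directory embeds the version as {{thrift-<version>}}, as it does in the session output above. A thrift installed some other way (e.g. via pip) would not match, and the helper name {{thrift_version_from_path}} is hypothetical.

```python
import re


def thrift_version_from_path(module_path):
    """Extract a version like '0.9.3' from a 'thrift-<version>' path component.

    Returns None when the path has no such component.
    """
    match = re.search(r"thrift-(\d+(?:\.\d+)+)", module_path)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        import thrift
    except ImportError:
        print("thrift is not importable in this environment")
    else:
        # __file__ may be missing for unusual package layouts; fall back to "".
        path = getattr(thrift, "__file__", None) or ""
        print(thrift_version_from_path(path))
```

Run from within the shell's environment, this would be expected to report 0.9.3 today and 0.11.0 once the shell no longer inherits the shared PYTHONPATH.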
> Setup impala-shell.sh env separately, and use thrift-0.11.0 by default
> ----------------------------------------------------------------------
>
> Key: IMPALA-9489
> URL: https://issues.apache.org/jira/browse/IMPALA-9489
> Project: IMPALA
> Issue Type: Improvement
> Components: Infrastructure
> Affects Versions: Impala 3.4.0
> Reporter: David Knupp
> Assignee: David Knupp
> Priority: Major