[
https://issues.apache.org/jira/browse/IMPALA-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Knupp updated IMPALA-9489:
--------------------------------
Description:
[Note: this JIRA was filed in relation to the ongoing effort to make the
impala-shell compatible with python 3]
The impala python development environment is a fairly convoluted affair -- a
number of packages are installed in the infra/python/env, some of it comes from
the toolchain, some of it is generated and lives in the shell directory.
Generally speaking, if you launch impala-python and import a module, it's not
necessarily easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl
<module 'sasl' from
'/home/systest/Impala/shell/ext-py/sasl-0.1.1/dist/sasl-0.1.1-py2.7-linux-x86_64.egg/sasl/__init__.pyc'>
>>> import requests
>>> requests
<module 'requests' from
'/home/systest/Impala/infra/python/env/local/lib/python2.7/site-packages/requests/__init__.pyc'>
>>> import Logging
>>> Logging
<module 'Logging' from '/home/systest/Impala/shell/gen-py/Logging/__init__.pyc'>
>>> import thrift
>>> thrift
<module 'thrift' from
'/home/systest/Impala/toolchain/thrift-0.9.3-p7/python/lib/python2.7/site-packages/thrift/__init__.pyc'>
{noformat}
Really, there is no one coherent environment -- there's just whatever
collection of modules happens to be available at a given time for a given type
of invocation, all of which is accomplished behind the scenes by calling
scripts like {{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that
are responsible for cobbling together a PYTHONPATH based on known locations and
current env variables.
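One way to make this less of a guessing game is to ask the import system where it would load each module from, without actually importing anything. The following is only a minimal sketch: the helper name {{module_origin}} is made up for illustration, and it assumes Python 3.4+ for {{importlib.util.find_spec}} (on the python 2.7 interpreter shown above, {{imp.find_module}} would be the rough equivalent).

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None.

    Hypothetical helper for illustration; it is not part of the Impala scripts.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # find_spec can raise for broken parents or modules without a spec.
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # The same four modules inspected in the interpreter session above.
    for name in ("sasl", "requests", "Logging", "thrift"):
        print("%-10s %s" % (name, module_origin(name) or "<not importable>"))
```

Running this under each of the invocation types (build, py.test, impala-shell) would show which location wins in each case.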
As far as I can tell, there are three important contexts where python comes
into play...
* during the build process (used during data load, e.g.,
testdata/bin/load_nested.py)
* when running the py.test-based e2e tests
* whenever the impala-shell is invoked
As noted by IMPALA-7825 (and also in a conversation I had with
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 during the build
process. This seems to come into play during the loading of test data
(specifically, when calling testdata/bin/load_nested.py) mainly because at one
point there was some well-intentioned but probably misguided attempt at code
reuse from the test framework. The test code that gets re-used involves impyla
and/or thrift-sasl, which currently still relies on thrift 0.9.3. So our test
framework, and by extension the build, both inherit the same limitation.
The impala-shell, on the other hand, luckily doesn't directly reuse any of the
same test modules, and there really is no need to keep it pinned to 0.9.3.
However, since calling {{impala-shell.sh}} winds up invoking
{{set-pythonpath.sh}}, the same script that sets up the environment
during building or testing, thrift 0.9.3 just kind of leaks over by default.
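The leak is just ordinary import resolution: the first entry on the search path that provides a module wins, so whichever thrift directory a script puts first is the one everybody gets. A small self-contained sketch of that behavior follows. The package name {{fakethrift}}, the directory names, and both helper functions are invented for the demonstration and have nothing to do with the real thrift package layout.

```python
import importlib
import os
import sys
import tempfile


def write_fake_package(root, dirname, version):
    """Create <root>/<dirname>/fakethrift/__init__.py exposing VERSION."""
    pkg_dir = os.path.join(root, dirname, "fakethrift")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("VERSION = %r\n" % version)
    return os.path.join(root, dirname)


def resolved_version(search_path):
    """Import fakethrift with search_path prepended to sys.path; return VERSION."""
    saved_path = list(sys.path)
    sys.modules.pop("fakethrift", None)
    sys.path[:0] = search_path
    importlib.invalidate_caches()
    try:
        return importlib.import_module("fakethrift").VERSION
    finally:
        # Restore the interpreter state so repeated calls are independent.
        sys.path[:] = saved_path
        sys.modules.pop("fakethrift", None)


def main():
    root = tempfile.mkdtemp()
    old = write_fake_package(root, "thrift-0.9.3", "0.9.3")
    new = write_fake_package(root, "thrift-0.11.0", "0.11.0")
    print(resolved_version([old, new]))  # older copy listed first
    print(resolved_version([new, old]))  # newer copy listed first


if __name__ == "__main__":
    main()
```

Both copies are importable in both calls; only the ordering of the path entries decides which one is picked up.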
As it turns out, thrift 0.9.3 is also one of the many limitations restricting
the impala-shell to python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0
is available -- we just have to use it. And the way to accomplish that is by
decoupling the impala-shell from relying on either {{set-pythonpath.sh}} or
{{impala-python-common.sh}}.
As a first pass, we can address the dev environment by just having
{{impala-shell.sh}} itself do whatever is required to find python dependencies,
and we can specify thrift-0.11.0 there. Also, thrift 0.11.0 should be used by
both of the scripts used to create the tarballs that package the impala-shell
for customer environments. Neither of these should adversely affect building Impala or
running the py.test test framework.
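To make the first-pass idea concrete, here is a rough sketch of composing a shell-only search path that never mentions thrift 0.9.3. It is written in Python purely for illustration (the real change would live in {{impala-shell.sh}}), and several things are assumptions: the function name {{shell_pythonpath}}, its parameters, and especially the {{thrift-0.11.0}} toolchain directory name, which simply mirrors the layout of the 0.9.3 path shown earlier and may carry a patch suffix in practice.

```python
import os


def shell_pythonpath(impala_home, thrift_version="0.11.0",
                     python_lib="python2.7", extra=()):
    """Build a PYTHONPATH string for the impala-shell only.

    Hypothetical sketch: the toolchain directory name is assumed to follow
    the same pattern as the thrift 0.9.3 path seen in the dev environment.
    """
    entries = [
        # Generated thrift bindings that live in the shell directory.
        os.path.join(impala_home, "shell", "gen-py"),
        # The thrift runtime the shell should use (assumed layout).
        os.path.join(impala_home, "toolchain", "thrift-" + thrift_version,
                     "python", "lib", python_lib, "site-packages"),
    ]
    # Callers can append further locations, e.g. eggs under shell/ext-py.
    entries.extend(extra)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    print(shell_pythonpath("/home/systest/Impala"))
```

Because the path is built from scratch rather than inherited from {{set-pythonpath.sh}}, the build and the py.test framework can stay on 0.9.3 untouched.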
was:
[Note: this JIRA was filed in relation to the ongoing effort to make the
impala-shell compatible with python 3]
The impala python development environment is a fairly convoluted affair -- a
number of packages are installed in the infra/python/env, some of it comes from
the toolchain, some of it is generated and lives in the shell directory.
Generally speaking, if you launch impala-python and import a module, it's not
necessarily easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl
<module 'sasl' from
'/home/systest/Impala/shell/ext-py/sasl-0.1.1/dist/sasl-0.1.1-py2.7-linux-x86_64.egg/sasl/__init__.pyc'>
>>> import requests
>>> requests
<module 'requests' from
'/home/systest/Impala/infra/python/env/local/lib/python2.7/site-packages/requests/__init__.pyc'>
>>> import Logging
>>> Logging
<module 'Logging' from '/home/systest/Impala/shell/gen-py/Logging/__init__.pyc'>
>>> import thrift
>>> thrift
<module 'thrift' from
'/home/systest/Impala/toolchain/thrift-0.9.3-p7/python/lib/python2.7/site-packages/thrift/__init__.pyc'>
{noformat}
Really, there is no one coherent environment -- there's just whatever
collection of modules happens to be available at a given time for a given type
of invocation, all of which is accomplished behind the scenes by calling
scripts like {{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that
are responsible for cobbling together a PYTHONPATH based on known locations and
current env variables.
As far as I can tell, there are three important contexts where python comes
into play...
* during the build process (used during data load, e.g.,
testdata/bin/load_nested.py)
* when running the py.test-based e2e tests
* whenever the impala-shell is invoked
As noted by IMPALA-7825 (and also in a conversation I had with
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 during the build
process. This seems to come into play during the loading of test data
(specifically, when calling testdata/bin/load_nested.py) mainly because at one
point there was some well-intentioned but probably misguided attempt at code
reuse from the test framework. The test code that gets re-used involves impyla
and/or thrift-sasl, which currently still relies on thrift 0.9.3. So our test
framework, and by extension the build, both inherit the same limitation.
The impala-shell, on the other hand, luckily doesn't directly reuse any of the
same test modules, and there really is no need to keep it pinned to 0.9.3.
However, since calling {{impala-shell.sh}} winds up invoking
{{set-pythonpath.sh}}, the same script that sets up the environment
during building or testing, thrift 0.9.3 just kind of leaks over by default.
As it turns out, thrift 0.9.3 is also one of the many limitations restricting
the impala-shell to python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0
is available -- we just have to use it. And the way to accomplish that is by
decoupling the impala-shell from relying on either {{set-pythonpath.sh}} or
{{impala-python-common.sh}}.
> Setup impala-shell.sh env separately, and use thrift-0.11.0 by default
> ----------------------------------------------------------------------
>
> Key: IMPALA-9489
> URL: https://issues.apache.org/jira/browse/IMPALA-9489
> Project: IMPALA
> Issue Type: Improvement
> Components: Infrastructure
> Affects Versions: Impala 3.4.0
> Reporter: David Knupp
> Assignee: David Knupp
> Priority: Major