[ 
https://issues.apache.org/jira/browse/IMPALA-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp updated IMPALA-9489:
--------------------------------
    Description: 
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- a 
number of packages are installed in infra/python/env, some come from the 
toolchain, and some are generated and live in the shell directory. Generally 
speaking, if you launch impala-python and import a module, it's not necessarily 
easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl
<module 'sasl' from 
'/home/systest/Impala/shell/ext-py/sasl-0.1.1/dist/sasl-0.1.1-py2.7-linux-x86_64.egg/sasl/__init__.pyc'>
>>> import requests
>>> requests
<module 'requests' from 
'/home/systest/Impala/infra/python/env/local/lib/python2.7/site-packages/requests/__init__.pyc'>
>>> import Logging
>>> Logging
<module 'Logging' from '/home/systest/Impala/shell/gen-py/Logging/__init__.pyc'>
>>> import thrift
>>> thrift
<module 'thrift' from 
'/home/systest/Impala/toolchain/thrift-0.9.3-p7/python/lib/python2.7/site-packages/thrift/__init__.pyc'>
{noformat}
Really, there is no one coherent environment -- there's just whatever 
collection of modules happens to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by calling 
scripts like {{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that 
are responsible for cobbling together a PYTHONPATH based on known locations and 
current env variables.
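
The manual lookup in the session above can be scripted. The following is only a minimal Python 3 sketch and is not part of the Impala tree: {{module_origin}} and {{report}} are hypothetical helper names, and the module list is simply the four names from the session. It asks the import system where each module would be loaded from, without importing it.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def report(names):
    """Map each module name to its origin (None if it cannot be found)."""
    return {name: module_origin(name) for name in names}


if __name__ == "__main__":
    # Module names taken from the session above; any that are not
    # available in the current environment are reported as "not found".
    for name, origin in report(["sasl", "requests", "Logging", "thrift"]).items():
        print("%-10s %s" % (name, origin or "not found"))
```

Running something like this under each invocation type (build, tests, shell) would show which locations the current PYTHONPATH actually resolves to.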

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test-based e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a conversation I had with 
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 during the build 
process. This seems to come into play during the loading of test data 
(specifically, when calling testdata/bin/load_nested.py) mainly because at one 
point there was some well-intentioned but probably misguided attempt at code 
reuse from the test framework. The test code that gets re-used involves impyla 
and/or thrift-sasl, which currently still relies on thrift 0.9.3. So our test 
framework, and by extension the build, both inherit the same limitation.

The impala-shell, on the other hand, luckily doesn't directly reuse any of the 
same test modules, and there really is no need to keep it pinned to 0.9.3. 
However, since calling impala-shell.sh winds up invoking 
{{set-pythonpath.sh}}, the same script that sets up the environment during 
building or testing, thrift 0.9.3 just kind of leaks over by default.
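
The "leak" is ordinary search-path behaviour: when two directories on the path both provide a package with the same name, the first entry wins. A self-contained Python 3 sketch of that effect follows. It uses throwaway temp directories and a placeholder package called {{demo_thrift}} (a made-up name, so it cannot collide with a real thrift install); the directory names only mimic the two versions discussed here and nothing in it touches the real toolchain.

```python
import os
import tempfile
from importlib.machinery import PathFinder


def make_fake_package(root, dirname, package):
    """Create <root>/<dirname>/<package>/__init__.py; return <root>/<dirname>."""
    entry = os.path.join(root, dirname)
    os.makedirs(os.path.join(entry, package))
    with open(os.path.join(entry, package, "__init__.py"), "w") as f:
        f.write("# placeholder package\n")
    return entry


def resolve(package, path_entries):
    """Return the file that importing `package` would pick for this search path."""
    spec = PathFinder.find_spec(package, path_entries)
    return spec.origin if spec else None


def demo():
    """Build two competing path entries and resolve the package in both orders."""
    root = tempfile.mkdtemp()
    old = make_fake_package(root, "thrift-0.9.3", "demo_thrift")
    new = make_fake_package(root, "thrift-0.11.0", "demo_thrift")
    return old, new, resolve("demo_thrift", [old, new]), resolve("demo_thrift", [new, old])


if __name__ == "__main__":
    old, new, old_first, new_first = demo()
    print("old entry first ->", old_first)
    print("new entry first ->", new_first)
```

Whichever entry comes first decides the version, which is why a shell that inherits the build/test path also inherits its thrift.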

As it turns out, thrift 0.9.3 is also one of the many limitations restricting 
the impala-shell to python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0 
is available -- we just have to use it. And the way to accomplish that is by 
decoupling the impala-shell from relying on either {{set-pythonpath.sh}} or 
{{impala-python-common.sh}}.
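
To make the idea concrete, here is a rough sketch of what a shell-only search path could look like, assuming the shell needs its generated modules (the {{shell/gen-py}} directory seen in the session above) plus a thrift directory chosen explicitly by the caller. The function name {{shell_pythonpath}} and its parameters are hypothetical, the location of the thrift-0.11.0 python libs is deliberately left as an argument rather than guessed, and this is not what the actual patch does.

```python
import os


def shell_pythonpath(impala_home, thrift_site_packages, extra_entries=()):
    """Build a PYTHONPATH string for the shell only.

    Assumptions (not taken from the real scripts): the shell needs
    <impala_home>/shell/gen-py and one explicitly chosen thrift directory;
    anything else must be passed in through extra_entries.
    """
    entries = [
        os.path.join(impala_home, "shell", "gen-py"),
        thrift_site_packages,
    ]
    entries.extend(extra_entries)

    # Drop duplicates while keeping order, since the first match wins.
    seen = set()
    ordered = []
    for entry in entries:
        if entry not in seen:
            seen.add(entry)
            ordered.append(entry)
    return os.pathsep.join(ordered)
```

The point of the design is that nothing from the build/test environment is added implicitly, so the thrift version the shell sees is whatever the caller passes in.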

  was:
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- a 
number of packages are installed in the infra/python/env, some of it comes from 
the toolchain, some of it is generated and lives in the shell directory. 
Generally speaking, if you launch impala-python and import a module, it's not 
necessarily easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl
<module 'sasl' from 
'/home/systest/Impala/shell/ext-py/sasl-0.1.1/dist/sasl-0.1.1-py2.7-linux-x86_64.egg/sasl/__init__.pyc'>
>>> import requests
>>> requests
<module 'requests' from 
'/home/systest/Impala/infra/python/env/local/lib/python2.7/site-packages/requests/__init__.pyc'>
>>> import Logging
>>> Logging
<module 'Logging' from '/home/systest/Impala/shell/gen-py/Logging/__init__.pyc'>
>>> import thrift
>>> thrift
<module 'thrift' from 
'/home/systest/Impala/toolchain/thrift-0.9.3-p7/python/lib/python2.7/site-packages/thrift/__init__.pyc'>
{noformat}
Really, there is no one coherent environment -- there's just whatever 
collection of modules happens to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by calling 
scripts like {{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that 
are responsible for cobbling together a PYTHONPATH based on known locations and 
current env variables.

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test-based e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a conversation I had with 
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 during the build process. It 
seems to happen during test data load (specifically, when calling 
testdata/bin/load_nested.py) mainly because there was some well-intentioned but 
probably misjudged attempt at code reuse from the test framework. The test code 
that gets re-used involves impyla and/or thrift-sasl, which currently still 
relies on thrift 0.9.3. So our test framework, and by extension the build, both 
inherit the same limitation.

The impala-shell, on the other hand, luckily doesn't directly reuse any of the 
same modules, and there's no real need to keep it pinned to 0.9.3. However, 
since calling the impala-shell.sh winds up invoking {{set-pythonpath.sh}}, the 
same script that sets up the environment during building or testing, the 
shell winds up defaulting to thrift 0.9.3 as well.

thrift 0.9.3 is one of the many limitations restricting the impala-shell to 
python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0 is available -- we 
just have to use it. The way to accomplish this is by decoupling the 
impala-shell from calling either {{set-pythonpath.sh}} or 
{{impala-python-common.sh}}.


> Setup impala-shell.sh env separately, and use thrift-0.11.0 by default
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-9489
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9489
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Infrastructure
>    Affects Versions: Impala 3.4.0
>            Reporter: David Knupp
>            Assignee: David Knupp
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

