[ 
https://issues.apache.org/jira/browse/FLINK-14306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944976#comment-16944976
 ] 

Hequn Cheng commented on FLINK-14306:
-------------------------------------

Hi [~pnowojski] [~chesnay] [~trohrmann], sorry for the trouble this has caused 
you, and many thanks for the advice.

The problem is caused by the plugin in the pom under flink-python. The plugin 
calls gen_protos.py to generate the Python files that go into pyflink.zip. 
Running this script introduces Python dependencies, and the build fails when 
they are missing. We consider the dependencies necessary because we don't want 
to provide a semi-finished package (i.e., a package without the generated 
Python files).
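
As a stopgap, the issue description suggests a pre-build check that reports 
missing dependencies with a clear hint. A minimal sketch of such a check 
follows. The function names are hypothetical and this is not part of the 
current build. It only handles top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def check_build_deps(names=("pkg_resources",)):
    """Return a readable error message, or None if every module is present."""
    missing = missing_modules(names)
    if not missing:
        return None
    # pkg_resources is shipped with setuptools, hence the hint below.
    return (
        "Missing Python modules: {}. "
        "Try installing setuptools before building flink-python."
    ).format(", ".join(missing))


if __name__ == "__main__":
    message = check_build_deps()
    if message:
        raise SystemExit(message)
```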

[~dianfu] and I discussed this offline, and we think it's better to use a 
local virtualenv to solve the problem, so that the dependencies are resolved 
automatically and we don't need to document them either. [~pnowojski] also 
raised this option above.
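
A rough sketch of the local virtual environment idea, assuming Python 3 and 
its standard venv module rather than the virtualenv package. The helper names 
and the package list are hypothetical, and the real change may look quite 
different.

```python
import os
import subprocess
import venv


def venv_python(env_dir):
    """Return the interpreter path inside a virtual environment."""
    if os.name == "nt":
        # Windows places executables under Scripts instead of bin.
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def ensure_venv(env_dir, packages=("setuptools",), install=True):
    """Create env_dir if it is missing and optionally install packages."""
    python = venv_python(env_dir)
    if not os.path.exists(python):
        # pip is only bootstrapped when packages will be installed.
        venv.EnvBuilder(with_pip=install).create(env_dir)
    if install and packages:
        subprocess.check_call([python, "-m", "pip", "install", *packages])
    return python
```

The build could then run gen_protos.py with the returned interpreter instead 
of the system one, so nothing has to be installed globally.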

The virtualenv solution may take a couple of days (we should also take builds 
on Windows into consideration). In the meantime, we can create a hotfix that 
removes the plugin which calls gen_protos.py, to unblock the failing builds 
asap.

What do you guys think? [~pnowojski] [~chesnay] [~trohrmann]

> flink-python build fails with No module named pkg_resources
> -----------------------------------------------------------
>
>                 Key: FLINK-14306
>                 URL: https://issues.apache.org/jira/browse/FLINK-14306
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Python, Build System
>    Affects Versions: 1.10.0
>            Reporter: Piotr Nowojski
>            Priority: Critical
>             Fix For: 1.10.0
>
>
> [Benchmark 
> builds|http://codespeed.dak8s.net:8080/job/flink-master-benchmarks/4576/console]
>  started to fail with
> {noformat}
> [INFO] Adding generated sources (java): 
> /home/jenkins/workspace/flink-master-benchmarks/flink/flink-python/target/generated-sources
> [INFO] 
> [INFO] --- exec-maven-plugin:1.5.0:exec (Protos Generation) @ 
> flink-python_2.11 ---
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/flink-master-benchmarks/flink/flink-python/pyflink/gen_protos.py",
>  line 33, in <module>
>     import pkg_resources
> ImportError: No module named pkg_resources
> [ERROR] Command execution failed.
> (...)
> [INFO] flink-state-processor-api .......................... SUCCESS [  0.299 
> s]
> [INFO] flink-python ....................................... FAILURE [  0.434 
> s]
> [INFO] flink-scala-shell .................................. SKIPPED
> {noformat}
> because of this ticket: https://issues.apache.org/jira/browse/FLINK-14018
> I think I can fix the failing benchmark builds quite easily by installing 
> the {{setuptools}} Python package, so this ticket is not about that, but 
> about deciding how we should treat this kind of external dependency. I don't 
> see this dependency mentioned anywhere in the documentation ([for example 
> here|https://ci.apache.org/projects/flink/flink-docs-stable/flinkDev/building.html]).
> At the very least, such external dependencies should probably be documented, 
> but I also fear that manual steps required before building Flink can become 
> a problem if they grow out of control. Some questions:
> # Do we really need this dependency?
> # Could this dependency be resolved automatically, for example by installing 
> it into a local Python virtual environment?
> # Should we document those dependencies somewhere?
> # Maybe we should not build flink-python by default?
> # Maybe we should add a pre-build script for flink-python to verify the 
> dependencies and throw an easy-to-understand error with a hint on how to fix 
> it?
> CC [~hequn] [~dian.fu] [~trohrmann] [~jincheng]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
