dianfu commented on a change in pull request #10017: [FLINK-14019][python] add
support for managing environment and dependencies of Python UDF in Flink Python
API
URL: https://github.com/apache/flink/pull/10017#discussion_r348491580
##########
File path: flink-python/pyflink/table/table_environment.py
##########
@@ -715,6 +718,111 @@ def create_temporary_view(self, view_path, table):
"""
self._j_tenv.createTemporaryView(view_path, table._j_table)
+ def add_python_file(self, file_path):
+ """
+ Upload single python file, python package or local directory to
cluster.
+ These files and directory will append to the PYTHONPATH variable of
python UDF worker.
+ Please make sure they can be imported directly.
+
+ If python UDFs used in job depend on additional files, this method
should be called to
+ upload them to cluster.
+
+ :param file_path: The path of python file, python package or local
directory.
+ :type file_path: str
+ """
+ self._dependency_manager.add_python_file(file_path)
+
+ def set_python_requirements(self, requirements_file_path,
requirements_cache_dir=None):
+ """
+ Upload a "requirements.txt" file to specify the third-party libraries
used in python UDF.
+ These libraries will be installed to a temporary directory and be
imported to the python
+ UDF worker.
+
+ If python UDFs used in job depend on third-party libraries, this
method should be called
+ to transmit them to cluster.
+
+ If the cluster is running on a private network that cannot access the
PyPI central
+ repository, a local directory contains all packages listed in the
"requirements.txt" file
+ should be specified using the "requirements_cached_dir" parameter. It
will be uploaded to
+ cluster to support offline installation.
+
+ Example:
+ ::
+
+ # commands executed in shell
+ $ echo numpy==1.16.5 > requirements.txt
+ $ pip download -d cached_dir -r requirements.txt --no-binary :all:
+
+ # python code
+ >>> table_env.set_python_requirements("requirements.txt",
"cached_dir")
+
+ .. note::
+
+ Please make sure the package file matches the platform and python
version of cluster.
+ These packages will be installed using pip.
+
+ :param requirements_file_path: The path of "requirements.txt" file.
+ :type requirements_file_path: str
+ :param requirements_cache_dir: The path of the local directory
contains all packages
+ listed in the "requirements.txt" file.
+ :type requirements_cache_dir: str
+ """
+
self._dependency_manager.set_python_requirements(requirements_file_path,
+
requirements_cache_dir)
+
+ def add_python_archive(self, archive_path, target_dir):
+ """
+ Upload a packaged file to cluster. The file will be extracted to the
working directory of
+ python UDF worker.
+
+ If the "target_dir" is specified. The file will be extracted to the
directory with the
+ same name of the value of "target_dir".
+ The extracted path would be:
${python_udf_working_directory}/${target_dir}/
+
+ If the "target_dir" is not specified. The file will be extracted to
the directory with the
+ same name of this archive file.
+ The extracted path would be:
${python_udf_working_directory}/{archive_file_name}/
+
+ If python UDFs need to run in a specific python environment and the
environment is not
+ exist on cluster, this method can be used to upload the packaged
environment.
+ Note that the python interpreter path of the uploaded environment
should be specified
+ via the method :func:`pyflink.table.TableConfig.set_python_executable`.
+
+ The files uploaded via this method is also accessible in UDFs. Using
relative path to access
+ them.
+
+ Example:
+ ::
+
+ # command executed in shell
+ # assert the relative path of python interpreter is
py_env/bin/python
+ $ zip -r py_env.zip py_env
+
+ # python code
+ >>> table_env.add_python_archive("py_env.zip", "myenv")
+ >>>
table_env.get_config().set_python_executable("myenv/py_env/bin/python")
+
+ # the files in archive is also accessible in UDF
+ >>> def my_udf():
+ ... with open("myenv/py_env/data/data.txt") as f:
+ ... ...
+
+ .. note::
+
+ Please make sure the uploaded python environment matches the
platform that the cluster
+ running on and that the version is 3.5 or higher.
+
+ .. note::
+
+ Currently only .zip file format is supported.
+
+ :param archive_path: The archive file path.
+ :type archive_path: str
+ :param target_dir: The target dir name that the archive file extracted
to.
+ :type target_dir: str
+ """
+ self._dependency_manager.add_python_archive(archive_path, target_dir)
Review comment:
As we only support zip file for now, what about adding some checks and give
some meaningful error messages will be better.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services