dianfu commented on a change in pull request #10597: 
[FLINK-15270][python][docs] Add documentation about how to specify third-party 
dependencies via API for Python UDFs
URL: https://github.com/apache/flink/pull/10597#discussion_r358572060
 
 

 ##########
 File path: docs/dev/table/functions/udfs.md
 ##########
 @@ -211,6 +211,76 @@ table_env.register_function("add", add)
 # use the function in Python Table API
 my_table.select("add(a, b)")
 {% endhighlight %}
+
+If the Python scalar function depends on third-party dependencies, you can specify them with the following Table APIs or through the <a href="{{ site.baseurl }}/ops/cli.html#usage">command line</a> directly when submitting the job.
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+      <th class="text-left" style="width: 20%">Dependencies</th>
+      <th class="text-left">Description</th>
+    </tr>
+  </thead>
+
+  <tbody>
+    <tr>
+      <td>files</td>
+      <td>
+        <p>Adds Python file dependencies, which can be Python files, Python packages or local directories. They will be added to the PYTHONPATH of the Python UDF worker.</p>
+{% highlight python %}
+table_env.add_python_file(file_path)
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td>requirements</td>
+      <td>
+        <p>Specifies a requirements.txt file which defines the third-party dependencies. These dependencies will be installed to a temporary directory and added to the PYTHONPATH of the Python UDF worker. For dependencies that cannot be accessed in the cluster, a directory containing the installation packages of these dependencies can be specified using the parameter "requirements_cached_dir". It will be uploaded to the cluster to support offline installation.</p>
+{% highlight python %}
+# commands executed in shell
+echo numpy==1.16.5 > requirements.txt
+pip download -d cached_dir -r requirements.txt --no-binary :all:
+
+# python code
+table_env.set_python_requirements("requirements.txt", "cached_dir")
+{% endhighlight %}
+        <p>Please make sure the installation packages match the platform of the cluster and the Python version used. These packages will be installed using pip.</p>
+      </td>
+    </tr>
+    <tr>
+      <td>archive</td>
+      <td>
+        <p>Adds a Python archive file dependency. The file will be extracted to the working directory of the Python UDF worker. If the parameter "target_dir" is specified, the archive file will be extracted to a directory named "target_dir". Otherwise, it will be extracted to a directory with the same name as the archive file.</p>
+{% highlight python %}
+# command executed in shell
+# assume the relative path of the python interpreter is py_env/bin/python
+zip -r py_env.zip py_env
+
+# python code
+table_env.add_python_archive("py_env.zip")
+table_env.get_config().set_python_executable("py_env.zip/py_env/bin/python")
+
+# or
+table_env.add_python_archive("py_env.zip", "myenv")
 
 Review comment:
   What about adding an example about how to use the data files of the archive 
in Python UDF?
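
  A hedged sketch of what such an example might look like: assuming the archive was registered with `table_env.add_python_archive("data.zip", "data")` and contains a file `data/data.txt` (both names are illustrative assumptions, not from the patch), the UDF can read the file through a relative path, since the archive is extracted under the UDF worker's working directory:

```python
def read_data_file(path="data/data.txt"):
    # Archives are extracted under the UDF worker's working directory,
    # so bundled data files are reachable via relative paths.
    with open(path) as f:
        return f.read()

# In a real job this helper would be called from inside a Python UDF,
# along the lines of (API names as of this PR's Flink version):
#
#   from pyflink.table import DataTypes
#   from pyflink.table.udf import udf
#
#   table_env.add_python_archive("data.zip", "data")
#
#   @udf(input_types=[DataTypes.STRING()], result_type=DataTypes.STRING())
#   def add_suffix(s):
#       return s + read_data_file()
```

  The key point for the docs would be that no absolute path or distributed-cache API is needed; plain `open()` with a relative path works inside the worker.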

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
