[
https://issues.apache.org/jira/browse/FLINK-39194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Biao Geng updated FLINK-39194:
------------------------------
Issue Type: Improvement (was: Bug)
> Improve PyFlink's support on specifying local resource files
> ------------------------------------------------------------
>
> Key: FLINK-39194
> URL: https://issues.apache.org/jira/browse/FLINK-39194
> Project: Flink
> Issue Type: Improvement
> Reporter: Biao Geng
> Priority: Major
> Attachments: image-2026-03-03-11-08-20-309.png
>
>
> When running a PyFlink job, we may need to ship resource files to the
> workers (e.g. the weights of a PyTorch model) for the Python program to use.
>
> Currently, we have some relevant options:
> # {*}--pyFiles{*}: this option adds files such as .py/.egg/.zip/.whl, or a
> directory, to PYTHONPATH. It looks like a good fit. However, in our current
> [implementation|https://github.com/apache/flink/blob/4f85d3074eccfe628e2926269ec7e943c61d2a9c/flink-python/src/main/java/org/apache/flink/python/env/AbstractPythonEnvironmentManager.java#L328],
> for an ordinary resource (e.g. resnet18-f37072fd.pth), we build a path
> like
> `/mnt/disk1/yarn/nm-local-dir/usercache/root/appcache/application_xxx_0007/python-dist-xxx-xx-xx-xx-xx/python-files/resnet18-f37072fd.pth/resnet18-f37072fd.pth`.
> The filename is repeated, and because the file is added to PYTHONPATH rather
> than to the working directory, we need extra code in the Python script to
> build the real path before using it. This also implies that we may need to
> improve the documentation here:
> !image-2026-03-03-11-08-20-309.png!
> # {*}--pyArchives{*}: this option only accepts archive files, and it works
> when we build a dedicated tar (e.g. --pyArchives
> hdfs:///envs/gpu_test_env.tar.gz#gpu_env,hdfs:///models/resnet18-f37072fd.pth,hdfs:///models/imagenet_classes.txt).
> The only drawback is that users have to build the tar or zip themselves.
>
> We may fix the usage of --pyFiles in the docs and avoid repeating the file
> name for ordinary files passed via --pyFiles.
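
To illustrate the "extra code" the description refers to, here is a minimal sketch of a lookup helper for the Python script. It assumes that a file shipped via --pyFiles is reachable through a PYTHONPATH entry, either as the entry itself or as a file inside the entry directory (the doubled-filename layout quoted above). The helper name `find_shipped_file` and its `search_path` parameter are hypothetical and are not part of PyFlink.

```python
import os
import sys
from typing import Iterable, Optional


def find_shipped_file(name: str,
                      search_path: Optional[Iterable[str]] = None) -> Optional[str]:
    """Return the absolute path of `name` reachable from the search path.

    Hypothetical helper, not a PyFlink API. It assumes the shipped file
    shows up on PYTHONPATH (sys.path) in one of two ways:
      1. the entry is the file itself, or
      2. the entry is a directory that contains the file.
    Returns None when no entry matches.
    """
    entries = sys.path if search_path is None else search_path
    for entry in entries:
        if not entry:
            continue
        # Case 1: the path entry points directly at the file.
        if os.path.basename(entry) == name and os.path.isfile(entry):
            return os.path.abspath(entry)
        # Case 2: the path entry is a directory holding the file,
        # e.g. .../python-files/<name>/ containing <name>.
        candidate = os.path.join(entry, name)
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    return None
```

In a job script, `find_shipped_file("resnet18-f37072fd.pth")` would then return the real location of the weights, or `None` if the file was not shipped, without hard-coding the doubled directory name.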