[ 
https://issues.apache.org/jira/browse/FLINK-39194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-39194:
------------------------------
    Issue Type: Improvement  (was: Bug)

> Improve PyFlink's support on specifying local resource files
> ------------------------------------------------------------
>
>                 Key: FLINK-39194
>                 URL: https://issues.apache.org/jira/browse/FLINK-39194
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Biao Geng
>            Priority: Major
>         Attachments: image-2026-03-03-11-08-20-309.png
>
>
> When running a PyFlink job, we may need to ship resource files (e.g. the 
> weights of a PyTorch model) to the workers so that the Python program can use them.
>  
> Currently, we have some relevant options:
>  # {*}--pyFiles{*}: this option adds files such as .py/.egg/.zip/.whl, or a 
> directory, to PYTHONPATH. It looks like a good fit; however, in our current 
> [implementation|https://github.com/apache/flink/blob/4f85d3074eccfe628e2926269ec7e943c61d2a9c/flink-python/src/main/java/org/apache/flink/python/env/AbstractPythonEnvironmentManager.java#L328],
>  for a regular resource file (e.g. resnet18-f37072fd.pth), we build a path 
> like 
> `/mnt/disk1/yarn/nm-local-dir/usercache/root/appcache/application_xxx_0007/python-dist-xxx-xx-xx-xx-xx/python-files/resnet18-f37072fd.pth/resnet18-f37072fd.pth`.
>  The file name is repeated, and because the file is added to PYTHONPATH rather 
> than the working directory, the Python script needs extra code to build the 
> real path before it can use the file. This also implies that we may need to 
> improve the documentation here:
> !image-2026-03-03-11-08-20-309.png!
> 2. {*}--pyArchives{*}: this option is only for archive files, and it works 
> when we build a dedicated tar (e.g. --pyArchives 
> hdfs:///envs/gpu_test_env.tar.gz#gpu_env,hdfs:///models/resnet18-f37072fd.pth,hdfs:///models/imagenet_classes.txt).
>  The only drawback is that users have to build the tar or zip archive themselves.
>  
> We could fix the description of --pyFiles in the documentation and avoid 
> repeating the file name in the path built for regular files passed via --pyFiles.
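
The extra path-building code mentioned for --pyFiles could look like the minimal sketch below. It assumes the shipped file is reachable through an entry of sys.path, either because the entry is the file itself or because the entry is a directory that contains it. The issue does not state which of the two applies, so the sketch checks both. The helper name find_shipped_resource and its search_paths parameter are hypothetical and are not part of PyFlink.

```python
import os
import sys
from typing import Iterable, Optional


def find_shipped_resource(
    filename: str, search_paths: Optional[Iterable[str]] = None
) -> Optional[str]:
    """Return the absolute path of a shipped resource file, or None.

    Hypothetical helper: it scans the given paths (sys.path by default),
    on the assumption that files passed via --pyFiles end up there.
    """
    paths = sys.path if search_paths is None else search_paths
    for entry in paths:
        if not entry:
            # Skip empty entries (an empty string means "current directory").
            continue
        # Case 1 (assumed): the entry is the resource file itself, e.g.
        # .../python-files/<name>/<name>.
        if os.path.basename(os.path.normpath(entry)) == filename and os.path.isfile(entry):
            return os.path.abspath(entry)
        # Case 2 (assumed): the entry is a directory that contains the file.
        candidate = os.path.join(entry, filename)
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    return None
```

Inside the Python program, a call such as find_shipped_resource("resnet18-f37072fd.pth") would then give the path to load, or None if no entry matches.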



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
