Radeity opened a new issue, #12849: URL: https://github.com/apache/dolphinscheduler/issues/12849
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.

### Description

DS supports file transfer between tasks, added in this PR:

- https://github.com/apache/dolphinscheduler/pull/12552

In the current approach, an intermediate file is uploaded from `srcPath` in the upstream task to `sourcePath` in the resource center. The downstream task then loads the file from `sourcePath` in the resource center.

#### Current logic

<img width="1495" alt="image" src="https://user-images.githubusercontent.com/45198818/200986807-ac152753-658f-4925-99f5-8c3c5fd15d57.png">

However, uploading files to the resource center is unnecessary. Instead, we can transfer files end-to-end from the upstream worker to the downstream worker with `scp`. In detail, the upstream worker uploads an `scp` command template to the resource center instead of the raw file, saving it under `resourcePath`. The downstream worker then reads the command template from the resource center and completes it with `targetPath`. Finally, the downstream worker fetches the file from the upstream worker by executing the `scp` command.

#### SCP logic

<img width="1559" alt="image" src="https://user-images.githubusercontent.com/45198818/200986035-3208b7d6-99b7-4dc1-bcd9-73427ea99dc2.png">

BTW, I won't replace the current approach. Instead, I'll add two user options:

- `transfer.file.dir`: the tmp directory where intermediate files are saved on the worker node.
- `transfer.file.size`: the maximum storage limit of the tmp directory. If keeping an intermediate file locally would exceed this limit, the file is uploaded to the resource center as before. Otherwise, only the command template is uploaded.

We can add a flag so that the downstream worker knows whether to read the whole file from the resource center or to execute the `scp` command.

In addition, intermediate files will be cleaned up via RPC after the process finishes running. I'll add a `ProcessCleanProcessor` to handle this.

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!
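A minimal Java sketch of the scp hand-off described under "SCP logic", assuming passwordless SSH between workers and file paths without whitespace. `ScpTemplate`, `buildTemplate`, `complete`, and the `${targetPath}` placeholder are hypothetical names chosen for illustration, not DolphinScheduler APIs. `-o BatchMode=yes` is a standard OpenSSH option that makes `scp` fail instead of prompting for a password, which suits an unattended worker.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper illustrating the proposed upstream/downstream scp hand-off.
 * Names and the placeholder token are assumptions, not DolphinScheduler APIs.
 */
final class ScpTemplate {

    // Assumed placeholder that the downstream worker replaces with its local targetPath.
    static final String TARGET_PLACEHOLDER = "${targetPath}";

    /** Upstream side: record where the file lives instead of uploading the file itself. */
    static String buildTemplate(String user, String host, String srcPath) {
        // BatchMode=yes: fail fast rather than block on a password prompt.
        return "scp -o BatchMode=yes " + user + "@" + host + ":" + srcPath + " " + TARGET_PLACEHOLDER;
    }

    /** Downstream side: fill in the local destination to obtain a complete command. */
    static String complete(String template, String targetPath) {
        if (!template.contains(TARGET_PLACEHOLDER)) {
            throw new IllegalArgumentException("template has no " + TARGET_PLACEHOLDER + " placeholder");
        }
        return template.replace(TARGET_PLACEHOLDER, targetPath);
    }

    /** Splits the command into argv for ProcessBuilder; assumes no whitespace inside paths. */
    static List<String> toArgv(String command) {
        return new ArrayList<>(Arrays.asList(command.trim().split("\\s+")));
    }

    /** Runs scp without a shell and returns its exit code (0 on success). */
    static int run(String command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(toArgv(command)).inheritIO().start();
        return process.waitFor();
    }
}
```

For example, `complete(buildTemplate("dolphin", "worker-1", "/tmp/ds/out.csv"), "/data/in.csv")` yields `scp -o BatchMode=yes dolphin@worker-1:/tmp/ds/out.csv /data/in.csv`. Passing an argv list to `ProcessBuilder` instead of a shell string avoids shell interpretation of the stored template.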
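The `transfer.file.size` rule and the downstream flag could be sketched as follows. This is a hedged illustration, not the proposed implementation: `TransferMode`, `TransferDecision`, and the newline-separated descriptor format are hypothetical, and the sketch assumes the worker tracks how many bytes `transfer.file.dir` already holds. The descriptor is a small metadata record passed downstream, not the intermediate file itself.

```java
/** Hypothetical flag telling the downstream worker how to obtain the intermediate file. */
enum TransferMode {
    UPLOAD_FILE,   // file was pushed to the resource center, as in the current logic
    SCP_TEMPLATE   // file stays in transfer.file.dir; only an scp command template was uploaded
}

/** Hypothetical sketch of the size-limit decision and a flag-carrying descriptor. */
final class TransferDecision {

    /**
     * @param fileBytes  size of the intermediate file produced by the upstream task
     * @param usedBytes  bytes already stored in transfer.file.dir on this worker
     * @param limitBytes configured transfer.file.size
     */
    static TransferMode choose(long fileBytes, long usedBytes, long limitBytes) {
        if (fileBytes < 0 || usedBytes < 0 || limitBytes < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        // Keep the file locally only if it still fits under the tmp-directory limit.
        // Subtracting instead of adding avoids long overflow for non-negative inputs.
        return fileBytes <= limitBytes - usedBytes ? TransferMode.SCP_TEMPLATE : TransferMode.UPLOAD_FILE;
    }

    // Assumed descriptor format: first line is the flag, the rest is the payload
    // (a resource-center path for UPLOAD_FILE, an scp command template for SCP_TEMPLATE).
    static String encode(TransferMode mode, String payload) {
        return mode.name() + "\n" + payload;
    }

    /** Downstream side: read the flag to choose between downloading and running scp. */
    static TransferMode decodeMode(String descriptor) {
        int newline = descriptor.indexOf('\n');
        return TransferMode.valueOf(newline < 0 ? descriptor : descriptor.substring(0, newline));
    }

    static String decodePayload(String descriptor) {
        int newline = descriptor.indexOf('\n');
        return newline < 0 ? "" : descriptor.substring(newline + 1);
    }
}
```

With a 1000-byte limit and 900 bytes already stored, a 100-byte file still fits, so only the template is uploaded, while a 101-byte file falls back to uploading the whole file.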
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
