Radeity opened a new issue, #12849:
URL: https://github.com/apache/dolphinscheduler/issues/12849

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.
   
   
   ### Description
   
   DolphinScheduler (DS) supports file transfer between tasks, introduced in this PR:
   - https://github.com/apache/dolphinscheduler/pull/12552
   
   <br>
   
   Currently, intermediate files are uploaded from `srcPath` in the upstream task to `sourcePath` in the resource center, and the downstream task then loads the file from `sourcePath` in the resource center.
   #### Current logic
   <img width="1495" alt="image" src="https://user-images.githubusercontent.com/45198818/200986807-ac152753-658f-4925-99f5-8c3c5fd15d57.png">
   
   
   <br>
   
   However, uploading files to the resource center is unnecessary: we can instead transfer files end-to-end from the upstream worker to the downstream worker with `scp`.
   In detail, the upstream worker sends an scp command template to the resource center instead of the raw file, and saves it under `resourcePath` as before. The downstream worker then reads the command template from the resource center and completes it with `targetPath`. Finally, the downstream worker fetches the file from the upstream worker by executing the `scp` command.
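A minimal sketch of the template round trip described above, assuming key-based (passwordless) SSH between workers and paths without whitespace. `ScpCommandTemplate`, its methods, and the `{targetPath}` placeholder are hypothetical names for illustration, not existing DolphinScheduler APIs.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not a DolphinScheduler API): the upstream worker builds
 * an scp command template, the downstream worker completes and runs it.
 * Assumes key-based SSH between workers and no whitespace in hosts or paths,
 * because the template is stored as one space-separated string.
 */
final class ScpCommandTemplate {
    // Hypothetical placeholder that the downstream worker replaces with its local targetPath.
    static final String TARGET_PLACEHOLDER = "{targetPath}";

    private ScpCommandTemplate() {}

    /** Upstream side: record where the file lives; the destination is left open. */
    static String build(String user, String host, int port, String srcPath) {
        return String.join(" ",
                "scp", "-P", Integer.toString(port),
                user + "@" + host + ":" + srcPath,
                TARGET_PLACEHOLDER);
    }

    /** Downstream side: fill in the local destination and split into argv for ProcessBuilder. */
    static List<String> complete(String template, String targetPath) {
        if (!template.contains(TARGET_PLACEHOLDER)) {
            throw new IllegalArgumentException("not an scp template: " + template);
        }
        String command = template.replace(TARGET_PLACEHOLDER, targetPath);
        return new ArrayList<>(Arrays.asList(command.split(" ")));
    }

    /** Downstream side: run the completed command and return scp's exit code. */
    static int execute(List<String> argv) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(argv).inheritIO().start();
        return process.waitFor();
    }
}
```

Passing an argv list to `ProcessBuilder` instead of a single string run through a shell avoids one layer of quoting; a real implementation would still need to escape or reject unusual paths.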
   
   #### SCP logic
   <img width="1559" alt="image" src="https://user-images.githubusercontent.com/45198818/200986035-3208b7d6-99b7-4dc1-bcd9-73427ea99dc2.png">
   <br>
   
   BTW, I won't replace the current approach; instead, I'll add two user options:
   - `transfer.file.dir`: the tmp directory in which intermediate files are saved on the worker node.
   - `transfer.file.size`: the maximum storage limit of the tmp directory. If the limit would be exceeded, the intermediate file itself is uploaded; otherwise, only the command template is uploaded. A flag can be added so that the downstream worker knows whether to read the whole file or execute the `scp` command. In addition, intermediate files will be cleaned up via RPC after the process finishes running; I'll add a `ProcessCleanProcessor` to handle it.
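The size check, the flag, and the tmp-directory cleanup could look roughly like this. `TransferMode`, `TransferModeSelector`, and reading `transfer.file.size` as a byte count are assumptions for illustration; only the option names and `ProcessCleanProcessor` come from this proposal, and `cleanUp` is merely what such a processor might call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical flag stored with each intermediate file in the resource center. */
enum TransferMode {
    RAW_FILE,      // the whole file was uploaded; downstream downloads it as today
    SCP_TEMPLATE   // only an scp command template was uploaded; downstream pulls from upstream
}

/** Hypothetical helper, not a DolphinScheduler API. */
final class TransferModeSelector {
    private TransferModeSelector() {}

    /**
     * Picks the mode for a new intermediate file.
     * @param usedBytes  bytes already kept in transfer.file.dir on this worker
     * @param fileBytes  size of the new intermediate file
     * @param limitBytes value of transfer.file.size (assumed to be in bytes)
     */
    static TransferMode select(long usedBytes, long fileBytes, long limitBytes) {
        // Keeping the file locally would exceed the tmp-dir budget: fall back to uploading it.
        return usedBytes + fileBytes > limitBytes ? TransferMode.RAW_FILE : TransferMode.SCP_TEMPLATE;
    }

    /** Sums the sizes of regular files currently under the tmp directory. */
    static long usedBytes(Path transferDir) throws IOException {
        if (!Files.isDirectory(transferDir)) {
            return 0L;
        }
        try (Stream<Path> files = Files.walk(transferDir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    /** Deletes a process's tmp directory and everything under it, children before parents. */
    static void cleanUp(Path processDir) throws IOException {
        if (!Files.exists(processDir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(processDir)) {
            // Reverse order puts every child path before its parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Storing the flag next to the resource lets the downstream worker branch once: download the file for `RAW_FILE`, or complete and run the template for `SCP_TEMPLATE`.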
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

