Radeity opened a new issue, #12849:
URL: https://github.com/apache/dolphinscheduler/issues/12849

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar feature requirement.
   
   
   ### Description
   
   File transfer between tasks was added to DS in this PR:
   - https://github.com/apache/dolphinscheduler/pull/12552
   
   <br>
   
   Currently, intermediate files are uploaded from `srcPath` in the upstream 
task to `sourcePath` in the resource center, and the downstream task then loads 
the file from `sourcePath` in the resource center. 
   #### Current logic
   <img width="1495" alt="image" 
src="https://user-images.githubusercontent.com/45198818/200986807-ac152753-658f-4925-99f5-8c3c5fd15d57.png">
   
   
   <br>
   
   However, uploading files to the resource center is unnecessary; instead, we 
can transfer files directly from the upstream worker to the downstream worker 
with `scp`. 
   In detail, the upstream worker sends an `scp` command template to the 
resource center instead of the raw file, saving it at `resourcePath`. The 
downstream worker then reads the command template from the resource center and 
completes it with `targetPath`. Finally, the downstream worker fetches the file 
from the upstream worker by executing the `scp` command.
   
   <img width="1559" alt="image" 
src="https://user-images.githubusercontent.com/45198818/200986035-3208b7d6-99b7-4dc1-bcd9-73427ea99dc2.png">
   <br>
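
   The template flow described above could look roughly like the sketch below. 
It is only an illustration, not the proposed implementation: the function 
names, the `{targetPath}` placeholder syntax, and the example user, host, and 
paths are all assumptions made for this example.

```python
import shlex

# Hypothetical placeholder that the downstream worker fills in later.
TARGET_PLACEHOLDER = "{targetPath}"


def build_scp_template(user, host, src_path):
    """Upstream side: build the command template to store in the resource center."""
    remote = f"{user}@{host}:{src_path}"
    # Quote the remote spec so unusual characters survive the local shell.
    return f"scp {shlex.quote(remote)} {TARGET_PLACEHOLDER}"


def complete_scp_command(template, target_path):
    """Downstream side: complete the stored template with the local target path."""
    return template.replace(TARGET_PLACEHOLDER, shlex.quote(target_path))


# Example values are made up for illustration.
template = build_scp_template("worker", "10.0.0.1", "/tmp/ds/out.csv")
command = complete_scp_command(template, "/data/in.csv")
```

   The downstream worker could then run the completed command, for example with 
`subprocess.run(shlex.split(command), check=True)`, assuming the workers can 
already reach each other over SSH without a password prompt.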
   
   BTW, I won't replace the current approach, but will add two user options:
   - `transfer.file.dir`: tmp directory for storing intermediate files on the 
worker node.
   - `transfer.file.size`: maximum size of the tmp directory. If the limit is 
exceeded, the intermediate file is uploaded; otherwise, only the command 
template is uploaded. We can add a flag so the downstream worker knows whether 
to read the whole file or to execute the `scp` command.
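
   A minimal sketch of how the two modes might be chosen is shown below. It 
assumes that `transfer.file.size` caps the total size of `transfer.file.dir` in 
bytes and that the flag is a simple mode string; the `TransferPlan` type and 
the function names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass

UPLOAD = "upload"  # raw file goes to the resource center (current behaviour)
SCP = "scp"        # only a command template goes to the resource center


@dataclass
class TransferPlan:
    mode: str     # the flag the downstream worker inspects
    payload: str  # file path for UPLOAD, command template for SCP


def plan_transfer(file_path, file_size, dir_usage, max_dir_size, scp_template):
    """Pick a transfer mode for one intermediate file (all sizes in bytes)."""
    # Assumption: the file is uploaded when keeping it would push the tmp
    # directory past the configured limit.
    if dir_usage + file_size > max_dir_size:
        return TransferPlan(UPLOAD, file_path)
    return TransferPlan(SCP, scp_template)


def downstream_needs_scp(plan):
    """Downstream side: decide whether to run scp or read the uploaded file."""
    return plan.mode == SCP
```

   With a 100-byte limit and 95 bytes already used, a 10-byte file would be 
uploaded as today; with 50 bytes used, only the command template would be 
stored.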
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.