Radeity opened a new issue, #12821:
URL: https://github.com/apache/dolphinscheduler/issues/12821

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar feature requirement.
   
   
   ### Description
   
   Currently, DS supports `HDFS`, `OSS`, and `S3` as the storage layer of the 
resource center. Most storage capacity today is provided by cloud storage 
services, so OSS and S3 are widely used. However, these services charge 
according to the size of the stored objects. I've noticed that our 
implementation simply imports the vendor's package and uses its client SDK to 
upload and download objects, for example:
   ```java
   import com.amazonaws.services.s3.*;
   
   public boolean mkdir(String tenantCode, ...) {
       ...
       s3Client.putObject(putObjectRequest);
       ...
   }
   
   public void vimFile(String tenantCode,...) {
       ...
       S3Object o = s3Client.getObject(BUCKET_NAME, srcFilePath);
       ...
   }
   ```
   I've explored the source code and found that the client SDK ultimately puts 
the raw object into the request body and uploads it over HTTP. HTTP has its own 
standard mechanism for compressing transmitted data with `gzip`, which can 
reduce network I/O. However, that data is decompressed again after transfer, so 
the size of the stored object remains unchanged.
   
   Therefore, we can use a better compression algorithm such as `Zstandard` to 
compress a file or directory before `putObject`, and decompress the object 
after `getObject`. With this client-side compression step, DS can reduce the 
size of the stored objects, and with it the storage charge, to the extent that 
the data is compressible.
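
A minimal sketch of the proposed flow, under some loud assumptions: the class 
`CompressedStorageSketch`, its `upload`/`download` methods, and the in-memory 
`bucket` map are all hypothetical stand-ins and are not part of the DS code 
base. To keep the example runnable with the JDK alone, it uses 
`java.util.zip` GZIP instead of `Zstandard`; a real implementation would swap 
in a Zstandard codec and replace the map with the vendor SDK's `putObject` and 
`getObject` calls.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class CompressedStorageSketch {

    // Hypothetical stand-in for the remote bucket (assumption, not DS code).
    private final Map<String, byte[]> bucket = new HashMap<>();

    // Compress the raw bytes on the client. GZIP is a stand-in for Zstandard.
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        }
        // Closing the GZIP stream above writes the trailer, so the buffer is complete.
        return out.toByteArray();
    }

    // Reverse of compress(): restore the original bytes after download.
    static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return out.toByteArray();
    }

    // Where the real code would call putObject: store the compressed form only.
    void upload(String key, byte[] raw) throws IOException {
        bucket.put(key, compress(raw));
    }

    // Where the real code would call getObject: decompress before returning.
    byte[] download(String key) throws IOException {
        byte[] packed = bucket.get(key);
        if (packed == null) {
            throw new IOException("no such object: " + key);
        }
        return decompress(packed);
    }

    // Size of the object as stored, which is the size the vendor would bill for.
    int storedSize(String key) {
        return bucket.get(key).length;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = new byte[64 * 1024];
        Arrays.fill(raw, (byte) 'a'); // highly compressible sample data

        CompressedStorageSketch storage = new CompressedStorageSketch();
        storage.upload("demo/resource.txt", raw);

        System.out.println("raw size:    " + raw.length);
        System.out.println("stored size: " + storage.storedSize("demo/resource.txt"));
        System.out.println("round trip ok: "
                + Arrays.equals(raw, storage.download("demo/resource.txt")));
    }
}
```

The savings depend entirely on the data: already-compressed resources such as 
JAR or ZIP files will shrink very little, so a real implementation might make 
the step configurable.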
   
   
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

