elek opened a new pull request #2170: URL: https://github.com/apache/ozone/pull/2170
## What changes were proposed in this pull request?

While measuring closed container replication, I found that the biggest bottleneck is the read side. A 5 GB container is replicated in ~3 minutes, but ~2:30 of that is the download.

Closed containers are replicated via gRPC. The source side creates an `OutputStream` on the fly (`OnDemandContainerReplicationSource.java`) and streams the entire container content to the client as a "tar.gz" archive.

It turned out that the compression (the .gz part) is quite expensive. I created a CLI tool to export containers to tar files (same logic as the replication, but saving to a file instead of streaming via gRPC). Creating the archive took about 2:30:

```
2021-01-13 05:51:25,302 [main] INFO debug.ExportContainer: Preparation is done
2021-01-13 05:53:53,472 [main] INFO debug.ExportContainer: Container is exported to /tmp/container-3.tar.gz
```

But when I removed the compression in `TarContainerPacker.java`, the export was significantly faster (25 sec instead of 150 sec):

```
2021-01-13 06:11:46,254 [main] INFO debug.ExportContainer: Preparation is done
2021-01-13 06:12:11,512 [main] INFO debug.ExportContainer: Container is exported to /tmp/container-3.tar
```

As a result, I suggest turning off compression for closed container replication.

More details: https://github.com/elek/ozone-notes/tree/master/20210113-closed-container-replication

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4687

## How was this patch tested?

Tested in a real Kubernetes cluster:

* data is generated with the freon data generator
* containers were replicated with the freon container replicator (time is checked from the log)
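The tar versus tar.gz comparison described in the pull request can be tried in miniature with a small, self-contained sketch. It is not the Ozone code path: it uses Python's standard `tarfile` module as a stand-in for `TarContainerPacker.java`, the member name `chunk.bin` is hypothetical, and the random payload is an assumption standing in for container data that compresses poorly. Absolute timings will differ from the numbers in the logs and depend on the data and the compression level.

```python
import io
import os
import tarfile
import time


def pack(payload, mode):
    """Pack payload as a single tar member; return (archive bytes, seconds)."""
    buf = io.BytesIO()
    start = time.perf_counter()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        info = tarfile.TarInfo(name="chunk.bin")  # hypothetical member name
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue(), time.perf_counter() - start


def unpack(archive):
    """Read the single member back; mode "r:*" auto-detects compression."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        member = tar.getmembers()[0]
        return tar.extractfile(member).read()


# Assumption: random bytes stand in for container data that compresses poorly.
payload = os.urandom(4 * 1024 * 1024)
plain, plain_secs = pack(payload, "w")        # uncompressed tar
gzipped, gzip_secs = pack(payload, "w:gz")    # gzip-compressed tar

if __name__ == "__main__":
    print(f"tar:    {len(plain)} bytes in {plain_secs:.3f}s")
    print(f"tar.gz: {len(gzipped)} bytes in {gzip_secs:.3f}s")
```

Both archives unpack to the same bytes, so the only differences are archive size and the CPU time spent in the compressor, which is the trade-off the pull request weighs for closed container replication.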
