elek opened a new pull request #2170:
URL: https://github.com/apache/ozone/pull/2170


   ## What changes were proposed in this pull request?
   
   While measuring closed container replication I found that the 
biggest bottleneck is the read side. A 5 GB container is replicated in about 3 
minutes, of which about 2:30 is spent on the download.
   
   Closed containers are replicated via gRPC. The source side creates an 
`OutputStream` on the fly (`OnDemandContainerReplicationSource.java`) and 
streams the entire container content to the client as a "tar.gz" archive.
   
   It turned out that the compression (the .gz part) is quite expensive.
   
   To measure it, I created a CLI tool that exports containers to tar files 
(same logic as the replication, but saving to a file instead of streaming via 
gRPC).
   
   With compression enabled, creating the archive took about 2:30:
   
   ```
   2021-01-13 05:51:25,302 [main] INFO debug.ExportContainer: Preparation is done
   2021-01-13 05:53:53,472 [main] INFO debug.ExportContainer: Container is exported to /tmp/container-3.tar.gz
   ```
   
   But when I removed the compression in `TarContainerPacker.java`, the export 
was significantly faster (about 25 seconds instead of about 150 seconds):
   
   ```
   2021-01-13 06:11:46,254 [main] INFO debug.ExportContainer: Preparation is done
   2021-01-13 06:12:11,512 [main] INFO debug.ExportContainer: Container is exported to /tmp/container-3.tar
   ```
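
The durations quoted above can be checked directly from the log timestamps. The following is a small sketch that does the arithmetic with the JDK only; the class name `ExportTiming` and the method `elapsedMillis` are made up for this illustration and are not part of Ozone.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the Ozone code base.
class ExportTiming {

  // Matches the timestamp prefix of the log lines, e.g. "2021-01-13 05:51:25,302".
  private static final DateTimeFormatter LOG_TIME =
      DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

  // Returns the number of milliseconds between two log timestamps.
  static long elapsedMillis(String start, String end) {
    return Duration.between(
        LocalDateTime.parse(start, LOG_TIME),
        LocalDateTime.parse(end, LOG_TIME)).toMillis();
  }

  public static void main(String[] args) {
    // Timestamps copied from the two log excerpts in this description.
    long gzMillis = elapsedMillis("2021-01-13 05:51:25,302", "2021-01-13 05:53:53,472");
    long tarMillis = elapsedMillis("2021-01-13 06:11:46,254", "2021-01-13 06:12:11,512");

    System.out.println("tar.gz export: " + gzMillis + " ms");  // 148170 ms
    System.out.println("tar export:    " + tarMillis + " ms"); // 25258 ms
    System.out.printf(Locale.ROOT, "speedup: %.1fx%n", (double) gzMillis / tarMillis); // 5.9x
  }
}
```

For this single container the uncompressed export is roughly 5.9 times faster, which is in line with the rounded "25 sec instead of 150 sec" figure.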
   
   As a result I suggest turning off the compression for closed container 
replication.
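
To illustrate the idea, here is a minimal sketch of treating the gzip layer as an optional wrapper around the output stream. It uses only the JDK and plain byte arrays instead of a tar archiver, so it is not the actual change made in `TarContainerPacker.java`; the class `OptionalGzip` and its methods are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not part of the Ozone code base.
class OptionalGzip {

  // Adds the gzip layer only when compression is requested.
  static OutputStream wrapOutput(OutputStream raw, boolean compress) throws IOException {
    return compress ? new GZIPOutputStream(raw) : raw;
  }

  // The reading side must make the same choice as the writing side.
  static InputStream wrapInput(InputStream raw, boolean compressed) throws IOException {
    return compressed ? new GZIPInputStream(raw) : raw;
  }

  // Writes the payload through the chosen encoding and reads it back.
  static byte[] roundTrip(byte[] payload, boolean compress) {
    try {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      try (OutputStream out = wrapOutput(sink, compress)) {
        out.write(payload);
      }
      ByteArrayOutputStream result = new ByteArrayOutputStream();
      try (InputStream in =
               wrapInput(new ByteArrayInputStream(sink.toByteArray()), compress)) {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
          result.write(buffer, 0, read);
        }
      }
      return result.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] payload = "container chunk data".getBytes(StandardCharsets.UTF_8);
    // Both variants return the original bytes; only the wire format differs.
    System.out.println(Arrays.equals(roundTrip(payload, true), payload));
    System.out.println(Arrays.equals(roundTrip(payload, false), payload));
  }
}
```

The point of the sketch is that the sender and the receiver have to agree on whether the stream is compressed, so a change like this touches both the packing and the unpacking side.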
   
   More details: 
https://github.com/elek/ozone-notes/tree/master/20210113-closed-container-replication
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-4687
   
   ## How was this patch tested?
   
   Tested in a real Kubernetes cluster: 
   
    * data was generated with the freon data generator 
    * containers were replicated with the freon container replicator (the 
elapsed time was read from the log)
   

