Marton Elek created HDDS-4687:
---------------------------------
Summary: Disable compression for closed-container replication
Key: HDDS-4687
URL: https://issues.apache.org/jira/browse/HDDS-4687
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Reporter: Marton Elek
Assignee: Marton Elek
While measuring closed-container replication I found that the biggest
bottleneck is the read side. A 5 GB container is replicated in ~3 minutes, but
~2:30 of that is the download.
Closed containers are replicated via gRPC. The source side creates an
OutputStream on the fly (OnDemandContainerReplicationSource.java) and streams
the entire container content to the client as a "tar.gz" archive.
It turned out that the compression (the .gz part) is quite expensive.
I created a CLI tool to export containers to tar files (same logic as the
replication, but saving to a file instead of streaming via gRPC).
Creating the archive took about 2:30:
{code}
2021-01-13 05:51:25,302 [main] INFO debug.ExportContainer: Preparation is done
2021-01-13 05:53:53,472 [main] INFO debug.ExportContainer: Container is exported to /tmp/container-3.tar.gz
{code}
But when I removed the compression in TarContainerPacker.java, the export was
significantly faster (25 sec instead of 150 sec):
{code}
2021-01-13 06:11:46,254 [main] INFO debug.ExportContainer: Preparation is done
2021-01-13 06:12:11,512 [main] INFO debug.ExportContainer: Container is exported to /tmp/container-3.tar
{code}
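To get a rough feel for the gzip overhead outside of Ozone, the comparison can
be sketched with plain JDK streams. This is only an illustration, not the code
in TarContainerPacker.java: the class name CompressionCost, the pack() helper
and the 16 MB random payload are assumptions made for the example, and the
timings depend heavily on how compressible the real container data is.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Ozone: writes a byte array to an
// in-memory sink, optionally through a gzip stream.
final class CompressionCost {

  // Returns the bytes that end up in the sink. With compress == false the
  // data is written as-is, which stands in for a plain ".tar" export.
  static byte[] pack(byte[] data, boolean compress) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (OutputStream out = compress ? new GZIPOutputStream(sink) : sink) {
      out.write(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    // Synthetic payload (assumption): 16 MB of pseudo-random bytes.
    byte[] data = new byte[16 * 1024 * 1024];
    new Random(42).nextBytes(data);

    for (boolean compress : new boolean[] {false, true}) {
      long start = System.nanoTime();
      byte[] packed = pack(data, compress);
      long millis = (System.nanoTime() - start) / 1_000_000;
      System.out.printf("compress=%b size=%d bytes time=%d ms%n",
          compress, packed.length, millis);
    }
  }
}
```

Running main() prints the packed size and the elapsed time for both modes, so
the difference caused by the gzip layer alone can be compared on a given
machine.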
As a result I suggest turning off the compression for closed container
replication.