Marton Elek created HDDS-4687:
---------------------------------

             Summary: Disable compression for closed-container replication
                 Key: HDDS-4687
                 URL: https://issues.apache.org/jira/browse/HDDS-4687
             Project: Hadoop Distributed Data Store
          Issue Type: Improvement
            Reporter: Marton Elek
            Assignee: Marton Elek


While measuring closed-container replication I found that the biggest 
bottleneck is the read side. A 5 GB container is replicated in ~3 minutes, and 
~2:30 of that is the download.

Closed containers are replicated via gRPC. The source side creates an 
OutputStream on the fly (OnDemandContainerReplicationSource.java) and streams 
the full container content to the client as a "tar.gz" archive.

It turned out that the compression (the .gz part) is quite expensive:

I created a CLI tool to export containers to tar files (same logic as the 
replication but without streaming via GRPC, just saving to a file).

Creating the archive took about 2:30:

{code}
2021-01-13 05:51:25,302 [main] INFO debug.ExportContainer: Preparation is done
2021-01-13 05:53:53,472 [main] INFO debug.ExportContainer: Container is exported to /tmp/container-3.tar.gz
{code}

But when I removed the compression in TarContainerPacker.java, the export was 
significantly faster (25 sec instead of 150 sec):

{code}
2021-01-13 06:11:46,254 [main] INFO debug.ExportContainer: Preparation is done
2021-01-13 06:12:11,512 [main] INFO debug.ExportContainer: Container is exported to /tmp/container-3.tar
{code}
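
The comparison can be reproduced in miniature with a rough sketch like the 
one below. It is not the TarContainerPacker code: it uses only JDK streams, 
an in-memory byte array stands in for the container data, and the class and 
method names (PackerSketch, wrap, export) are hypothetical. The only point is 
that the same copy loop runs with or without a GZIPOutputStream wrapper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not part of Ozone: compares a compressed and an
// uncompressed export of the same payload.
class PackerSketch {

  // Wraps the destination in gzip only when compression is requested.
  static OutputStream wrap(OutputStream raw, boolean compress) throws IOException {
    return compress ? new GZIPOutputStream(raw) : raw;
  }

  // Copies the payload to dest and returns the elapsed time in nanoseconds.
  // Note: dest is closed when the copy finishes.
  static long export(byte[] payload, OutputStream dest, boolean compress) {
    long start = System.nanoTime();
    try (InputStream in = new ByteArrayInputStream(payload);
         OutputStream out = wrap(dest, compress)) {
      byte[] buffer = new byte[64 * 1024];
      int read;
      while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return System.nanoTime() - start;
  }

  public static void main(String[] args) {
    // Assumption: 32 MB of random bytes as a stand-in for container data.
    byte[] payload = new byte[32 * 1024 * 1024];
    new Random(42).nextBytes(payload);

    long gzipNanos = export(payload, new ByteArrayOutputStream(), true);
    long plainNanos = export(payload, new ByteArrayOutputStream(), false);

    System.out.printf("with gzip:    %d ms%n", gzipNanos / 1_000_000);
    System.out.printf("without gzip: %d ms%n", plainNanos / 1_000_000);
  }
}
```

Absolute numbers depend on the machine and on how compressible the data is, 
so this only illustrates the effect and does not replace the measurement 
above.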

As a result I suggest turning off the compression for closed container 
replication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

