[ https://issues.apache.org/jira/browse/HDDS-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264729#comment-17264729 ]

Marton Elek edited comment on HDDS-4687 at 1/14/21, 9:57 AM:
-------------------------------------------------------------

Testing is almost free (thanks to your patch), and it is good to double-check our
expectations. I just applied your patch and restarted the export.

But after ten minutes I gave up:

{code}
2021-01-14 01:46:44,504 [main] INFO debug.ExportContainer: Preparation is done
{code}

Only about a third of the file had been written in that time:

{code}
 date && ls -lah container-6.tar.gz 
Thu Jan 14 01:55:32 PST 2021
-rw-r--r-- 1 root root 1.5G Jan 14 01:55 container-6.tar.gz
{code}
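To put a rate on these numbers, the sketch below uses the two timestamps (the "Preparation is done" log line and the `date` output) and the 1.5G size from `ls -lah`, which is rounded and counted in GiB. It assumes "a third of the file" means the whole archive would need roughly three times as long. Note that this measures compressed output written to disk, not the rate at which container data is read. The class name `ExportRate` is hypothetical.

{code:java}
import java.time.Duration;
import java.time.LocalTime;
import java.util.Locale;

public class ExportRate {
    // Seconds between two wall-clock timestamps taken on the same day.
    static long elapsedSeconds(LocalTime start, LocalTime end) {
        return Duration.between(start, end).getSeconds();
    }

    // Average write rate in MiB/s for a file of `gib` GiB written in `seconds`.
    static double mibPerSecond(double gib, long seconds) {
        return gib * 1024.0 / seconds;
    }

    public static void main(String[] args) {
        // Timestamps from the "Preparation is done" log line and the `date` output.
        long seconds = elapsedSeconds(LocalTime.parse("01:46:44"), LocalTime.parse("01:55:32"));
        double rate = mibPerSecond(1.5, seconds);
        // Assumption: a third is written so far, so the whole archive needs ~3x the time.
        long projectedMinutes = 3 * seconds / 60;
        System.out.println(String.format(Locale.ROOT,
                "elapsed=%ds rate=%.1f MiB/s projected=~%d min", seconds, rate, projectedMinutes));
        // prints: elapsed=528s rate=2.9 MiB/s projected=~26 min
    }
}
{code}

The projection is only as good as the "a third" estimate, but it shows the order of magnitude: tens of minutes for a single container.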

The machine had 56 cores, but only one of them was busy with the compression.
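The single busy core fits how stream compression usually works in Java. Assuming the packer's gzip stream behaves like `java.util.zip.GZIPOutputStream`, all deflate work happens inside `write()`/`close()` on the thread that writes the archive, so extra cores do not help. A rough way to see the cost in isolation is to push the same bytes through a plain stream and a gzip stream on one thread. The class name `GzipCost`, the 32 MiB synthetic payload and the 64 KiB chunk size are illustrative choices, not taken from Ozone. A single cold run also includes JIT warm-up, so treat the timings as indicative only.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class GzipCost {
    static final int CHUNK = 64 * 1024;

    // Writes `data` to `out` in CHUNK-sized pieces on the calling thread,
    // closes the stream and returns the elapsed wall-clock time in ns.
    static long timeWrite(byte[] data, OutputStream out) {
        long start = System.nanoTime();
        try (OutputStream o = out) {
            for (int off = 0; off < data.length; off += CHUNK) {
                o.write(data, off, Math.min(CHUNK, data.length - off));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return System.nanoTime() - start;
    }

    // Same write, but through gzip. Deflate runs inside write()/close(),
    // i.e. on this one thread, no matter how many cores the host has.
    static long timeGzipWrite(byte[] data, ByteArrayOutputStream sink) {
        try {
            return timeWrite(data, new GZIPOutputStream(sink, CHUNK));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Synthetic, low-entropy payload (32 MiB); not real container data.
        byte[] data = new byte[32 << 20];
        Random rnd = new Random(42);
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) rnd.nextInt(16);
        }
        long plainNs = timeWrite(data, new ByteArrayOutputStream());
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        long gzipNs = timeGzipWrite(data, gz);
        System.out.println(String.format(Locale.ROOT,
                "plain: %d ms, gzip: %d ms, %d -> %d bytes",
                plainNs / 1_000_000, gzipNs / 1_000_000, data.length, gz.size()));
    }
}
{code}

Both sinks are in memory, so the difference between the two timings is mostly the deflate cost, without any disk or network I/O mixed in.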


> Disable compression for closed-container replication
> ----------------------------------------------------
>
>                 Key: HDDS-4687
>                 URL: https://issues.apache.org/jira/browse/HDDS-4687
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: Marton Elek
>            Assignee: Marton Elek
>            Priority: Critical
>         Attachments: HDDS-4687.patch
>
>
> While measuring closed-container replication, I found that the biggest
> bottleneck is the read side. A 5 GB container is replicated in under ~3
> minutes, but ~2:30 of that is spent on the download.
> Closed containers are replicated via gRPC. The source side creates an
> OutputStream on the fly (OnDemandContainerReplicationSource.java) and streams
> the whole container content to the client as a "tar.gz" archive.
> It turned out that the compression (the .gz part) is quite expensive.
> I created a CLI tool to export containers to tar files (same logic as the
> replication, but without streaming via gRPC, just saving to a file).
> Creating the archive took the same ~2:30:
> {code}
> 2021-01-13 05:51:25,302 [main] INFO debug.ExportContainer: Preparation is done
> 2021-01-13 05:53:53,472 [main] INFO debug.ExportContainer: Container is exported to /tmp/container-3.tar.gz
> {code}
> But when I removed the compression in TarContainerPacker.java, the export was
> significantly faster (25 s instead of 150 s):
> {code}
> 2021-01-13 06:11:46,254 [main] INFO debug.ExportContainer: Preparation is done
> 2021-01-13 06:12:11,512 [main] INFO debug.ExportContainer: Container is exported to /tmp/container-3.tar
> {code}
> As a result, I suggest turning off compression for closed-container
> replication.
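The proposal boils down to making the gzip layer optional around the archive stream. Below is a minimal sketch, assuming a boolean switch on the sending side and a receiver that has to cope with peers that still send tar.gz (for example during a rolling upgrade). `ReplicationStreams`, `wrap`, `unwrap` and `roundTrip` are hypothetical names, not the actual TarContainerPacker or OnDemandContainerReplicationSource API.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ReplicationStreams {
    // Sender side: wrap the archive stream in gzip only when asked to.
    static OutputStream wrap(OutputStream out, boolean compress) throws IOException {
        return compress ? new GZIPOutputStream(out) : out;
    }

    // Receiver side: sniff the two-byte gzip magic (0x1f 0x8b) so that both a
    // plain tar stream and a tar.gz stream are accepted.
    static InputStream unwrap(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 2);
        byte[] head = new byte[2];
        int n = pb.readNBytes(head, 0, 2);
        if (n > 0) {
            pb.unread(head, 0, n); // put the peeked bytes back
        }
        boolean gzip = n == 2 && head[0] == (byte) 0x1f && head[1] == (byte) 0x8b;
        return gzip ? new GZIPInputStream(pb) : pb;
    }

    // Sends a payload through wrap() into memory and reads it back via unwrap().
    static byte[] roundTrip(byte[] payload, boolean compress) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            try (OutputStream out = wrap(wire, compress)) {
                out.write(payload);
            }
            try (InputStream in = unwrap(new ByteArrayInputStream(wire.toByteArray()))) {
                return in.readAllBytes();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "container chunk data".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(roundTrip(payload, false), StandardCharsets.UTF_8));
        System.out.println(new String(roundTrip(payload, true), StandardCharsets.UTF_8));
    }
}
{code}

Sniffing the magic bytes avoids a protocol change. A plain tar stream starts with an entry name, so it is very unlikely to begin with 0x1f 0x8b, but an explicit flag in the replication request would be the stricter option.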



--
This message was sent by Atlassian Jira
(v8.3.4#803005)