[ https://issues.apache.org/jira/browse/HDDS-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirill Sizov updated HDDS-9228:
-------------------------------
    Description: 
h3. TL;DR:
*S3G writes all of its responses byte by byte.*

h3. Details
This issue was discovered during a performance test run.

h4. Cluster configuration
3 master nodes, 5 datanodes.
Each machine has 96 CPU cores.
S3G instances run on the master nodes (3 gateways in total).

h4. Test preparation
Before the test, we uploaded 300,000 files of 20 MB each to Ozone.

h4. Test configuration
We ran two tests:
1. pure writes, no concurrent reads
2. pure reads, no concurrent writes

h4. Load generator
3 load generator nodes, each running 50 threads (150 concurrent threads in total).

h4. Ozone configuration
The buckets were created with erasure coding RS-3-2-1024k (3 data chunks, 2 parity chunks, 1024 KB chunk size).

h3. Results
We found that writes were 3 times faster than reads, and that reads drove CPU usage to about 70%.

Thread dumps and JFR recordings showed the following stack traces for the HTTP threads:

Stack trace:
{noformat}
"qtp2079179914-1055393" Id=1055393 RUNNABLE
        at org.glassfish.jersey.servlet.internal.ResponseWriter$NonCloseableOutputStreamWrapper.write(ResponseWriter.java:291)
        at org.glassfish.jersey.message.internal.CommittingOutputStream.write(CommittingOutputStream.java:215)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:77)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:125)
        at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$UnCloseableOutputStream.write(WriterInterceptorExecutor.java:276)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1310)
        at org.apache.commons.io.IOUtils.copy(IOUtils.java:978)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1282)
        at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.lambda$get$0(ObjectEndpoint.java:382)
{noformat}

JFR:

{noformat}
Count   Percentage  Stack Trace
431146  39 %        void org.eclipse.jetty.server.HttpOutput.write(int)
431145  39 %        void org.glassfish.jersey.servlet.internal.ResponseWriter$NonCloseableOutputStreamWrapper.write(int)
431145  39 %        void org.glassfish.jersey.message.internal.CommittingOutputStream.write(int)
431145  39 %        void java.io.FilterOutputStream.write(int)
431145  39 %        void java.io.FilterOutputStream.write(byte[], int, int)
431145  39 %        void org.glassfish.jersey.message.internal.WriterInterceptorExecutor$UnCloseableOutputStream.write(byte[], int, int)
431145  39 %        long org.apache.commons.io.IOUtils.copyLarge(InputStream, OutputStream, byte[])
{noformat}

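We can clearly see the transition {{FilterOutputStream.write(byte[], int, int) -> FilterOutputStream.write(int)}}: every incoming array is written one byte at a time instead of as a whole. This is the JDK default: unless a subclass overrides it, {{FilterOutputStream.write(byte[], int, int)}} loops over the array and calls {{write(int)}} for each byte, so every single byte then travels separately through Jersey's {{CommittingOutputStream}} and Jetty's {{HttpOutput.write(int)}}.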

The {{FilterOutputStream}} is created in {{org.apache.hadoop.ozone.s3.TracingFilter}}, which overrides only {{close()}}:

{code}
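    OutputStream out = responseContext.getEntityStream();
    if (out != null) {
      // Only close() is overridden below, so write(byte[], int, int) is inherited
      // from FilterOutputStream and degrades to one write(int) call per byte.
      responseContext.setEntityStream(new FilterOutputStream(out) {
        @Override
        public void close() throws IOException {
          super.close();
          // Close the tracing scope and span once the response stream is closed.
          finishAndClose(scope, span);
        }
      });
    }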
{code}
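The fallback can be reproduced with the JDK alone. The following sketch wraps a hypothetical {{CountingOutputStream}} (not part of Ozone, Jersey or Jetty) in an anonymous {{FilterOutputStream}} that, like the {{TracingFilter}} wrapper, overrides only {{close()}}, and then performs a single 8 KB bulk write. The class names and the buffer size are illustrative assumptions; the point is only to count how the wrapped stream is called.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FilterOutputStreamDemo {

  /** Hypothetical sink that only counts how it is called and discards the data. */
  static class CountingOutputStream extends OutputStream {
    long singleByteWrites;
    long arrayWrites;

    @Override
    public void write(int b) {
      singleByteWrites++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
      arrayWrites++;
    }
  }

  /** Wraps the sink the same way TracingFilter does: only close() is overridden. */
  static OutputStream wrapLikeTracingFilter(OutputStream out) {
    return new FilterOutputStream(out) {
      @Override
      public void close() throws IOException {
        super.close();
        // TracingFilter calls finishAndClose(scope, span) here; omitted in this sketch.
      }
    };
  }

  public static void main(String[] args) throws IOException {
    CountingOutputStream sink = new CountingOutputStream();
    try (OutputStream wrapped = wrapLikeTracingFilter(sink)) {
      // One bulk write, like the write(buffer, 0, n) that IOUtils.copyLarge issues per chunk read.
      wrapped.write(new byte[8192], 0, 8192);
    }
    System.out.println("write(int) calls: " + sink.singleByteWrites);
    System.out.println("write(byte[], int, int) calls: " + sink.arrayWrites);
  }
}
```

The sink receives 8192 {{write(int)}} calls and no array writes. For a 20 MB object, the same fallback means roughly twenty million single-byte calls per GET, each passing through the Jersey and Jetty frames listed above.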

Removing this filter, or fixing the wrapper's {{FilterOutputStream.write(byte[], int, int)}} method so that it writes the whole array at once, resolves the performance issue: throughput improves 5x and CPU usage drops to around 12%.
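A minimal sketch of the second option, assuming the fix is applied in the wrapper rather than by removing the filter: the anonymous class becomes a named {{FilterOutputStream}} subclass that overrides {{write(byte[], int, int)}} and forwards whole arrays to the wrapped stream. {{TracingOutputStream}} and its {{Runnable onClose}} parameter are hypothetical names chosen for illustration; {{onClose}} stands in for {{finishAndClose(scope, span)}}, whose signature is not shown in this issue. This is not the committed Ozone fix.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical replacement for the anonymous FilterOutputStream in TracingFilter.
 * The onClose callback stands in for finishAndClose(scope, span).
 */
public class TracingOutputStream extends FilterOutputStream {
  private final Runnable onClose;

  public TracingOutputStream(OutputStream out, Runnable onClose) {
    super(out);
    this.onClose = onClose;
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    // Forward the whole chunk in a single call instead of inheriting
    // FilterOutputStream's loop over write(int).
    out.write(b, off, len);
  }

  @Override
  public void close() throws IOException {
    try {
      super.close();
    } finally {
      // Run the tracing callback even if closing the wrapped stream fails.
      onClose.run();
    }
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (OutputStream out = new TracingOutputStream(sink, () -> System.out.println("span finished"))) {
      out.write("hello".getBytes(StandardCharsets.UTF_8));
    }
    System.out.println(sink.size() + " bytes written");
  }
}
```

In {{TracingFilter}} this could be wired in as {{responseContext.setEntityStream(new TracingOutputStream(out, () -> finishAndClose(scope, span)))}}, assuming {{finishAndClose}} throws no checked exception. The {{try}}/{{finally}} in {{close()}} is a small deviation from the original snippet, which skips {{finishAndClose}} if {{super.close()}} throws.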



  was:
h3. TL;DR:
*S3G writes all of its responses byte by byte.*

h3. Details
This issue was discovered during a performance test run.

h4. Cluster configuration
3 master nodes, 5 datanodes.
Each machine has 96 CPU cores.
S3G instances run on the master nodes (3 gateways in total).

h4. Test preparation
Before the test, we uploaded 300,000 files of 20 MB each to Ozone.

h4. Test configuration
We ran two tests:
1. pure writes, no concurrent reads
2. pure reads, no concurrent writes

h4. Load generator
3 load generator nodes, each running 50 threads.

h4. Ozone configuration
The buckets were created with erasure coding RS-3-2-1024k.

h3. Results
We found that writes were 3 times faster than reads, and that reads drove CPU usage to about 70%.

Thread dumps and JFR recordings showed the following stack traces for the HTTP threads:

Stack trace:
{noformat}
"qtp2079179914-1055393" Id=1055393 RUNNABLE
        at org.glassfish.jersey.servlet.internal.ResponseWriter$NonCloseableOutputStreamWrapper.write(ResponseWriter.java:291)
        at org.glassfish.jersey.message.internal.CommittingOutputStream.write(CommittingOutputStream.java:215)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:77)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:125)
        at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$UnCloseableOutputStream.write(WriterInterceptorExecutor.java:276)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1310)
        at org.apache.commons.io.IOUtils.copy(IOUtils.java:978)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1282)
        at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.lambda$get$0(ObjectEndpoint.java:382)
{noformat}

JFR:

{noformat}
Count   Percentage  Stack Trace
431146  39 %        void org.eclipse.jetty.server.HttpOutput.write(int)
431145  39 %        void org.glassfish.jersey.servlet.internal.ResponseWriter$NonCloseableOutputStreamWrapper.write(int)
431145  39 %        void org.glassfish.jersey.message.internal.CommittingOutputStream.write(int)
431145  39 %        void java.io.FilterOutputStream.write(int)
431145  39 %        void java.io.FilterOutputStream.write(byte[], int, int)
431145  39 %        void org.glassfish.jersey.message.internal.WriterInterceptorExecutor$UnCloseableOutputStream.write(byte[], int, int)
431145  39 %        long org.apache.commons.io.IOUtils.copyLarge(InputStream, OutputStream, byte[])
{noformat}

We can clearly see the transition {{FilterOutputStream.write(byte[], int, int) -> FilterOutputStream.write(int)}}, meaning that every incoming array is written as single bytes rather than as a whole.

The place in the code that creates {{FilterOutputStream}} is 
{{org.apache.hadoop.ozone.s3.TracingFilter}}:

{code}
    OutputStream out = responseContext.getEntityStream();
    if (out != null) {
      responseContext.setEntityStream(new FilterOutputStream(out) {
        @Override
        public void close() throws IOException {
          super.close();
          finishAndClose(scope, span);
        }
      });
    }
{code}

Removing this filter or fixing the {{FilterOutputStream.write(byte[], int, int)}} method resolves the performance issue, and we see 5x better throughput.




> Poor S3G read performance
> -------------------------
>
>                 Key: HDDS-9228
>                 URL: https://issues.apache.org/jira/browse/HDDS-9228
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: S3
>    Affects Versions: 1.4.0
>            Reporter: Kirill Sizov
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
