huxinqiu commented on PR #5151: URL: https://github.com/apache/hadoop/pull/5151#issuecomment-1321736995
> @huxinqiu Thank you very much for your contribution!
>
> We need to discuss something:
>
> 1. It seems that the benefit is avoiding the declaration of the `ResponseBuffer` variable, which carries an initial 1024-byte array; the original internal calculation code is then moved directly outside.
>
> Modified code:
>
> ```java
> int computedSize = connectionContextHeader.getSerializedSize();
> computedSize += CodedOutputStream.computeUInt32SizeNoTag(computedSize);
> int messageSize = message.getSerializedSize();
> computedSize += messageSize;
> computedSize += CodedOutputStream.computeUInt32SizeNoTag(messageSize);
> byte[] dataLengthBuffer = new byte[4];
> dataLengthBuffer[0] = (byte) ((computedSize >>> 24) & 0xFF);
> dataLengthBuffer[1] = (byte) ((computedSize >>> 16) & 0xFF);
> dataLengthBuffer[2] = (byte) ((computedSize >>> 8) & 0xFF);
> dataLengthBuffer[3] = (byte) (computedSize & 0xFF);
> ```
>
> The original calculation code, in `connectionContextHeader.writeDelimitedTo(buf)`, looks like this:
>
> ```java
> int serialized = this.getSerializedSize();
> int bufferSize = CodedOutputStream.computePreferredBufferSize(
>     CodedOutputStream.computeRawVarint32Size(serialized) + serialized);
> CodedOutputStream codedOutput = CodedOutputStream.newInstance(output, bufferSize);
> codedOutput.writeRawVarint32(serialized);
> this.writeTo(codedOutput);
> codedOutput.flush();
> ```
>
> `ResponseBuffer#setSize`:
>
> ```java
> @Override
> public int size() {
>   return count - FRAMING_BYTES;
> }
>
> void setSize(int size) {
>   buf[0] = (byte) ((size >>> 24) & 0xFF);
>   buf[1] = (byte) ((size >>> 16) & 0xFF);
>   buf[2] = (byte) ((size >>> 8) & 0xFF);
>   buf[3] = (byte) ((size >>> 0) & 0xFF);
> }
> ```
>
> 2. Code duplication: the following calculation logic appears 3 times, in the `RpcProtobufRequestWithHeader` constructor, `SaslRpcClient#sendSaslMessage`, and `Client#writeConnectionContext`:
>
> ```java
> this.dataLengthBuffer = new byte[4];
> dataLengthBuffer[0] = (byte) ((computedSize >>> 24) & 0xFF);
> dataLengthBuffer[1] = (byte) ((computedSize >>> 16) & 0xFF);
> dataLengthBuffer[2] = (byte) ((computedSize >>> 8) & 0xFF);
> dataLengthBuffer[3] = (byte) (computedSize & 0xFF);
> this.header = header;
> this.rpcRequest = rpcRequest;
> ```

@slfan1989

1. Yes. `IpcStreams#out` is a `BufferedOutputStream`, which already holds an internal byte array, and protobuf's `CodedOutputStream` likewise keeps an internal byte-array cache to optimize writing, so we don't need to aggregate the data length, header, and rpcRequest into a `ResponseBuffer`, which is really just a byte array. The only extra cost in the `RpcRequestSender` thread is the protobuf serialization; since a request is usually only a few hundred bytes, serialization takes just tens of microseconds. So I think it is better to first calculate the request size and then write the data length, header, and rpcRequest to the `BufferedOutputStream` one by one, avoiding the allocation of a 1024-byte array for each request.

2. I'll fix this code duplication afterwards.
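To make the size arithmetic above concrete, here is a minimal, self-contained sketch in plain Java. `computeUInt32SizeNoTag` reimplements the varint-size calculation that protobuf's `CodedOutputStream` performs, and `lengthPrefix` reproduces the big-endian framing done by `ResponseBuffer#setSize`; the class name `FramingSketch` and the method grouping are illustrative only, not existing Hadoop or protobuf APIs.

```java
public class FramingSketch {
    // Number of bytes needed to encode a non-negative int as an unsigned
    // varint, mirroring protobuf's size calculation: 7 payload bits per byte.
    public static int computeUInt32SizeNoTag(int value) {
        if ((value & (~0 << 7)) == 0)  return 1;
        if ((value & (~0 << 14)) == 0) return 2;
        if ((value & (~0 << 21)) == 0) return 3;
        if ((value & (~0 << 28)) == 0) return 4;
        return 5;
    }

    // Total payload size for two delimited messages: each message is
    // preceded by a varint encoding of its own serialized size.
    public static int framedSize(int headerSize, int messageSize) {
        return computeUInt32SizeNoTag(headerSize) + headerSize
             + computeUInt32SizeNoTag(messageSize) + messageSize;
    }

    // 4-byte big-endian length prefix, as built by ResponseBuffer#setSize.
    public static byte[] lengthPrefix(int size) {
        return new byte[] {
            (byte) (size >>> 24),
            (byte) (size >>> 16),
            (byte) (size >>> 8),
            (byte) size
        };
    }

    public static void main(String[] args) {
        // e.g. a 100-byte header (1-byte varint) + 300-byte request (2-byte varint)
        int total = framedSize(100, 300); // 1 + 100 + 2 + 300 = 403
        System.out.println(total);
        System.out.println(lengthPrefix(total).length);
    }
}
```

With this framing, the sender can write the 4-byte prefix, then the two delimited messages, straight into the buffered stream, which is exactly the allocation-free path argued for above.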
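As for the duplication in point 2, one possible shape for a shared helper is sketched below. The names `LengthFramer`, `writeFrameLength`, and `frameLength` are hypothetical, not existing Hadoop code; the point is that `java.io.DataOutputStream#writeInt` already emits exactly the 4 big-endian bytes that the three call sites currently build by hand with shifts.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class LengthFramer {
    // Hypothetical shared helper: write the 4-byte big-endian frame length.
    // DataOutputStream#writeInt is specified to write big-endian, matching
    // the manual (size >>> 24), (size >>> 16), (size >>> 8), size shifts.
    public static void writeFrameLength(OutputStream out, int size)
            throws IOException {
        new DataOutputStream(out).writeInt(size);
    }

    // Convenience form returning the prefix as a byte array.
    public static byte[] frameLength(int size) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try {
            writeFrameLength(bos, size);
        } catch (IOException e) {
            throw new AssertionError(e); // cannot happen for in-memory stream
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] b = frameLength(0x01020304);
        // bytes come out big-endian: 1, 2, 3, 4
        System.out.println(b[0] + " " + b[1] + " " + b[2] + " " + b[3]);
    }
}
```

The three duplicated sites listed above could then call the one helper before streaming the header and request.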