[jira] [Created] (HDFS-9663) Optimize some RPC call using lighter weight construct than DatanodeInfo

Kai Zheng (JIRA) Tue, 19 Jan 2016 04:48:14 -0800

Kai Zheng created HDFS-9663:
-------------------------------

             Summary: Optimize some RPC call using lighter weight construct than DatanodeInfo
                 Key: HDFS-9663
                 URL: https://issues.apache.org/jira/browse/HDFS-9663
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Kai Zheng
            Assignee: Kai Zheng



While working on HDFS-8430 and adding an RPC to DataTransferProtocol, I 
noticed that the very heavy {{DatanodeInfo}} or {{DatanodeInfoWithStorage}} 
construct is used to represent a datanode, in most cases just to connect to 
it. It is very fat and contains much more information than is needed. See 
how it is defined:
{code}
public class DatanodeInfo extends DatanodeID implements Node {
  private long capacity;
  private long dfsUsed;
  private long remaining;
  private long blockPoolUsed;
  private long cacheCapacity;
  private long cacheUsed;
  private long lastUpdate;
  private long lastUpdateMonotonic;
  private int xceiverCount;
  private String location = NetworkTopology.DEFAULT_RACK;
  private String softwareVersion;
  private List<String> dependentHostNames = new LinkedList<>();
  private String upgradeDomain;
...
{code}
On both the client and datanode sides, for RPC calls like 
{{DataTransferProtocol#writeBlock}}, the information contained in 
{{DatanodeID}} looks almost sufficient.

As a quick hack, I used a lightweight construct, {{SimpleDatanodeInfo}}, that 
simply extends {{DatanodeID}} (no fields added, but any field that turns out 
to be needed can be added) and changed the 
{{DataTransferProtocol#writeBlock}} call to use it. I manually ran many of the 
relevant tests and they passed. To see how much network traffic is saved, I 
did a simple test with the following code in {{Sender}}:
{code}
  private static void send(final DataOutputStream out, final Op opcode,
      final Message proto) throws IOException {
    LOG.trace("Sending DataTransferOp {}: {}",
        proto.getClass().getSimpleName(), proto);
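    // Temporary instrumentation: DataOutputStream#size() is the number of
    // bytes written so far, so the difference is this op's size on the wire.
    int before = out.size();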
    op(out, opcode);
    proto.writeDelimitedTo(out);
    int after = out.size();
    System.out.println("XXXXXXXXXXXXXXXXX sent=" + (after - before));
    out.flush();
  }
{code}
I ran the test {{TestWriteRead#testWriteAndRead}}; in most cases the change 
saves about 100 bytes per call. The saving is modest here because only 3 
datanodes are sent, but in situations like {{BlockECRecoveryCommand}}, where 
there can be 6 + 3 datanodes to send as targets and sources, the saving will 
be significant.
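
To illustrate why the saving grows with the number of datanodes in a call, 
here is a minimal, self-contained sketch. It is not the HDFS code: the two 
record layouts and the class name {{WireSizeSketch}} are hypothetical 
stand-ins, and it uses plain {{DataOutputStream}} encoding rather than 
protobuf, so the byte counts are not comparable to the measurement above. It 
only reuses the same before/after {{out.size()}} technique.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch only: these records are hypothetical stand-ins, not the
// HDFS classes, and the encoding is plain DataOutputStream, not protobuf.
class WireSizeSketch {

  // Identity-only form: roughly the kind of data needed to connect to a node.
  static void writeLight(DataOutputStream out, int i) throws IOException {
    out.writeUTF("10.0.0." + i);   // IP address (placeholder)
    out.writeUTF("dn-" + i);       // host name (placeholder)
    out.writeInt(50010);           // transfer port (placeholder value)
  }

  // Heavy form: identity plus usage statistics the receiver does not need.
  static void writeHeavy(DataOutputStream out, int i) throws IOException {
    writeLight(out, i);
    for (int f = 0; f < 8; f++) {
      out.writeLong(0L);           // the 8 long counters (capacity, dfsUsed, ...)
    }
    out.writeInt(0);               // xceiverCount
    out.writeUTF("/default-rack"); // location
  }

  // Bytes written for `nodes` datanodes, measured as the difference of
  // out.size() before and after writing.
  static int measure(boolean heavy, int nodes) {
    try {
      DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
      int before = out.size();
      for (int i = 0; i < nodes; i++) {
        if (heavy) {
          writeHeavy(out, i);
        } else {
          writeLight(out, i);
        }
      }
      out.flush();
      return out.size() - before;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // 3 nodes as in a write pipeline, 9 as in the 6 + 3 case.
    for (int nodes : new int[] {3, 9}) {
      int saved = measure(true, nodes) - measure(false, nodes);
      System.out.println(nodes + " datanodes: saved=" + saved + " bytes");
    }
  }
}
```

Because the extra fields add a fixed overhead per datanode in this sketch, the 
saving scales linearly with the number of datanodes carried in one call. 
Whether the real protobuf messages behave as linearly is an assumption that 
would need to be measured.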

Hence, I suggest using a more lightweight construct to represent a datanode in 
RPC calls where possible, or other ideas for avoiding unnecessary data on the 
wire. This seems worthwhile since, as noted, there was some discussion in 
HDFS-8999 about saving datanode bandwidth.






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
