[
https://issues.apache.org/jira/browse/HDFS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985209#action_12985209
]
Matt Foley commented on HDFS-1583:
----------------------------------
A patch with an alternative solution is attached to issue #HADOOP-6949. This
is a generalized fix for any protocol that pumps arrays of primitives (long[],
byte[], etc.) through the RPC and ObjectWritable mechanisms, including block
reports and journaling.
It has the following benefits:
1. It is fully backward compatible with any persisted data that may be
out there, because it can still successfully receive/read the old format.
2. It makes minimal changes to the ObjectWritable module, and doesn't
require any changes to protocols or APIs.
3. It is consistent with current practice regarding *Writable classes.
4. It automatically benefits all protocols that send arrays of
primitives through the RPC and ObjectWritable mechanisms, without changes to
the protocol APIs.
5. It has a fully optimized wire format, with almost no object overhead,
unlike solutions based on ArrayWritable.
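For illustration, here is a minimal sketch of what an optimized wire format for a long[] looks like: one length header followed by raw values, with no per-element object or class-name overhead. This is not the actual HADOOP-6949 patch; the class and method names are invented for the example.

```java
import java.io.*;

public class PrimitiveArrayWire {
    // Packed encoding: a 4-byte length header, then raw 8-byte values.
    static void writeLongArray(DataOutput out, long[] arr) throws IOException {
        out.writeInt(arr.length);
        for (long v : arr) out.writeLong(v);
    }

    static long[] readLongArray(DataInput in) throws IOException {
        int len = in.readInt();
        long[] arr = new long[len];
        for (int i = 0; i < len; i++) arr[i] = in.readLong();
        return arr;
    }

    public static void main(String[] args) throws IOException {
        long[] blocks = {1L, 2L, 3L};
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeLongArray(new DataOutputStream(buf), blocks);
        // 4-byte length + 3 * 8-byte values = 28 bytes, no per-element metadata
        System.out.println(buf.size()); // 28
        long[] back = readLongArray(
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(back[2]); // 3
    }
}
```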
That said, while it is backward compatible with old stored data, it is
admittedly not backward compatible with older versions of Client software: A
NEWER RPC RECEIVER can correctly receive data from an OLDER SENDER, but an
OLDER RECEIVER will not correctly receive arrays of primitives sent from a
NEWER SENDER. I haven't been able to think of a solution without getting into
protocol negotiation schemes.
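To illustrate the one-way compatibility, here is a hypothetical marker-based scheme (the real patch's encoding may differ): the new sender emits a sentinel that no old sender would ever produce as an array length, so a new receiver can branch on it and accept both formats, while an old receiver would misinterpret the sentinel and fail.

```java
import java.io.*;

public class CompatRead {
    // Hypothetical sentinel: an old sender never writes a negative array length.
    static final int NEW_FORMAT = -1;

    // New-style sender: sentinel, then length, then raw longs.
    static void writeNew(DataOutput out, long[] a) throws IOException {
        out.writeInt(NEW_FORMAT);
        out.writeInt(a.length);
        for (long v : a) out.writeLong(v);
    }

    // Old-style sender: length, then raw longs (stand-in for the legacy encoding).
    static void writeOld(DataOutput out, long[] a) throws IOException {
        out.writeInt(a.length);
        for (long v : a) out.writeLong(v);
    }

    // A NEWER receiver branches on the first int, so it accepts both senders.
    // An OLDER receiver would read the sentinel as an array length and fail.
    static long[] readNew(DataInput in) throws IOException {
        int first = in.readInt();
        int len = (first == NEW_FORMAT) ? in.readInt() : first;
        long[] a = new long[len];
        for (int i = 0; i < len; i++) a[i] = in.readLong();
        return a;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream oldBuf = new ByteArrayOutputStream();
        writeOld(new DataOutputStream(oldBuf), new long[]{7L, 8L});
        // The new reader handles the old format transparently.
        long[] a = readNew(new DataInputStream(
            new ByteArrayInputStream(oldBuf.toByteArray())));
        System.out.println(a.length); // 2
    }
}
```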
I understand that Liyin's proposed solution, by adding new signatures to the
journaling protocol, could have the benefit of backward compatibility with old
Clients. If that is critical, then we must go that way. But that way only
benefits the specific APIs that get modified, and adds significant maintenance
complexity in the long run. I would also point out that Liyin's current patch
does NOT provide that backward compatibility, because it replaces the protocol
signatures rather than extending them. If the current patch is acceptable,
meaning we don't need Client API backward compatibility, then I strongly
recommend the more general solution of HADOOP-6949 instead.
As far as performance goes, I measured the RPC overhead of sending a
50,000-block Block Report in the form of a long[150000] array. (That's the
problem that got me working on this.) The current RPC round-trip overhead on a
system with no resource contention is about 180 msec. With my proposed patch,
and no change to the calling code, this drops to about 50 msec, a reduction of
over 70%. The size reduction is comparable to what Liyin measured, for
similar reasons.
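As a rough model of where the savings come from (the overhead figures are assumed for illustration, not the actual ObjectWritable encoding): if each element of the array carried a per-element type tag, a long[150000] report would grow well past its raw 1.2 MB payload.

```java
import java.io.*;

public class OverheadSketch {
    // Hypothetical per-element encoding: each value carries a type tag,
    // modeling (loosely) what element-by-element object serialization costs.
    static int perElementSize(long[] a) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(a.length);
        for (long v : a) {
            out.writeUTF("long"); // 6 bytes per element of pure overhead
            out.writeLong(v);
        }
        return buf.size();
    }

    // Packed encoding: length header plus raw values only.
    static int packedSize(long[] a) {
        return 4 + 8 * a.length;
    }

    public static void main(String[] args) throws IOException {
        long[] report = new long[150000];
        System.out.println(perElementSize(report)); // 2100004
        System.out.println(packedSize(report));     // 1200004
    }
}
```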
> Improve backup-node sync performance by wrapping RPC parameters
> ---------------------------------------------------------------
>
> Key: HDFS-1583
> URL: https://issues.apache.org/jira/browse/HDFS-1583
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Reporter: Liyin Liang
> Assignee: Liyin Liang
> Fix For: 0.23.0
>
> Attachments: HDFS-1583-1.patch, HDFS-1583-2.patch, test-rpc.diff
>
>
> The journal edit records are sent by the active name-node to the backup-node
> with RPC:
> {code:java}
> public void journal(NamenodeRegistration registration,
>                     int jAction,
>                     int length,
>                     byte[] records) throws IOException;
> {code}
> During the name-node throughput benchmark, the size of the byte array _records_
> is around *8000* bytes. At that size, serialization and deserialization become
> time-consuming. I wrote a simple application to test RPC with a byte-array
> parameter: when the size reaches 8000, each RPC call needs about 6 ms, while
> the name-node syncs 8 KB to local disk in only 0.3~0.4 ms.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.