Sanjay Radia wrote:
Will the RPC over HTTP be transparent so that we can replace it with a
different layer if needed?

Yes.

My worry was the separation of data and checksums; someone had mentioned
that one could do this over 2 RPCs - that is not transparent.

That was suggested as a possibility if we did not want to use RPC for data, but rather raw HTTP, e.g., with a separate URL per block. The zerocopy support built into most HTTP servers only supports entire responses from a single file, so if we wanted to take advantage of these zerocopy implementations we would not use RPC for block access, but could still use HTTP and hence share security, etc. Using raw HTTP for block access might also perform better, since it can rely on TCP flow control rather than RPC call/response. In my microbenchmarks, RPC call/response was fast enough to easily saturate disks and networks, so that performance advantage might be moot, although RPC call/response for file data may use more CPU than we'd like. With our own transport implementation we could get RPC call/response to use zerocopy for file data.
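To make the zerocopy point concrete, here is a minimal sketch in Python of sending a byte range of a block file over an already-connected stream socket. The name send_block and its parameters are hypothetical, chosen only for illustration; this is not the transport discussed in this thread. It assumes a platform where socket.sendfile() can delegate to os.sendfile(); where it cannot, Python falls back to ordinary send() calls and no zerocopy takes place.

```python
import socket


def send_block(sock: socket.socket, block_path: str,
               offset: int = 0, count=None) -> int:
    """Send bytes from a block file over a connected stream socket.

    Hypothetical helper: returns the number of bytes sent.
    """
    with open(block_path, "rb") as block_file:
        # socket.sendfile() uses os.sendfile() where the platform supports
        # it, so the kernel moves file pages to the socket without copying
        # them through a user-space buffer. Otherwise it falls back to
        # plain send() calls.
        return sock.sendfile(block_file, offset=offset, count=count)
```

A server built this way hands the kernel a file and a range, which is roughly the constraint described above: the payload has to come straight from one file, so headers, checksums, or RPC framing would need to be sent separately.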

I assume that we are
going to create a branch that moves the data transfer protocols to RPC,
test the performance, and, if it is good, commit and move to RPC?

Yes. We obviously cannot change the file data transfer protocol without benchmarking. Ideally file data transfer can share as much as possible with other protocols. The most optimistic approach would be to use HTTP-based RPC call/response, so we ought to benchmark that. This was the purpose of my recently-reported microbenchmarks.

We also need to determine whether both TCP flow-control and zerocopy are critical to file data transfer performance. If both are indeed critical, and HTTP proves sufficient for everything else, then we should consider using non-RPC HTTP for file data transfer, since it supports both zerocopy and TCP-based flow control, and the implementation of security, etc. could be shared. But, on the other hand, if HTTP is deemed inappropriate for security and we develop our own RPC transport that permits zerocopy, and TCP flow-control over entire blocks is not required, then we might use RPC for file data. What I'm hoping we can avoid is, as today, using different transports for different protocols, re-implementing security, connection pooling, async request processing, etc. for each, requiring separate configuration and ports for each, etc. But even that might be required. We don't know yet.

I think starting with HTTP as a hypothesis permits us to make progress without a lot of up-front investment.

Doug
