Feng Jiang wrote:
As for the IPC (it was called RPC until about one year ago) implementation, I think it has a performance problem. I don't know why the Listener has to read the data, prepare the Call instance, and then put the Call instance into a queue. Reading may take a long time, during which other Calls are blocked. I think that is the reason why you have to catch the OutOfMemoryError.
I don't completely understand your concern. The queue is used to limit the number of requests that will be processed simultaneously by worker threads. A single thread reads requests off the socket and queues them for worker threads to process and write results. Unless we break requests and responses into chunks and re-assemble them, the reading and writing of individual requests and responses must block other requests and responses on that connection.
Previously there was a server thread per connection, reading requests. (Responses are written directly by worker threads.) Recently this has been re-written so that a single thread uses nio selectors to read requests from all connections. Calls are still queued for worker threads to process and reply.
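For illustration, the structure is roughly the following (a sketch, not the actual code; the Call class, the buffer size, and the framing are simplified assumptions):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.BlockingQueue;

// One listener thread multiplexes all connections with a Selector and
// queues complete requests ("calls") for worker threads to process.
public class SelectorListenerSketch implements Runnable {

  static class Call {
    final SocketChannel connection;
    final byte[] request;
    Call(SocketChannel connection, byte[] request) {
      this.connection = connection;
      this.request = request;
    }
  }

  private final Selector selector;
  private final ServerSocketChannel acceptor;
  private final BlockingQueue<Call> callQueue;

  SelectorListenerSketch(int port, BlockingQueue<Call> callQueue)
      throws IOException {
    this.callQueue = callQueue;
    selector = Selector.open();
    acceptor = ServerSocketChannel.open();
    acceptor.configureBlocking(false);
    acceptor.socket().bind(new InetSocketAddress(port));
    acceptor.register(selector, SelectionKey.OP_ACCEPT);
  }

  public void run() {
    try {
      while (true) {
        selector.select();                        // block until a channel is ready
        Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
        while (keys.hasNext()) {
          SelectionKey key = keys.next();
          keys.remove();
          if (key.isAcceptable()) {
            SocketChannel connection = acceptor.accept();
            connection.configureBlocking(false);
            connection.register(selector, SelectionKey.OP_READ);
          } else if (key.isReadable()) {
            SocketChannel connection = (SocketChannel) key.channel();
            ByteBuffer buffer = ByteBuffer.allocate(8192);
            int n = connection.read(buffer);
            if (n < 0) {                          // connection closed
              key.cancel();
              connection.close();
            } else if (n > 0) {
              buffer.flip();
              byte[] data = new byte[buffer.remaining()];
              buffer.get(data);
              // A real server accumulates per-connection buffers until a
              // complete request has arrived; here one read is treated as
              // one complete call.
              callQueue.put(new Call(connection, data));
            }
          }
        }
      }
    } catch (IOException | InterruptedException e) {
      // sketch: a real server would log and recover
    }
  }
}

Worker threads simply take Calls off callQueue, process them, and write responses directly back to the connection.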
I once implemented another RPC Server and Client based on your code, simply using Java's ThreadPool; its performance was about 10-20% higher than that implementation from a year ago. The main algorithm is:
1. The server has a thread pool; initially it is small, and it grows as needed.
2. The listener accepts a connection, then puts the connection into the thread pool.
3. The threads in the pool handle everything for the connection.
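If I understand correctly, that design is roughly the following (a sketch; the length-prefixed framing and the process() handler are my assumptions, not your code):

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Thread-per-connection: each pool thread owns one connection end-to-end,
// reading requests, processing them, and writing responses.
public class PooledServerSketch {

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newCachedThreadPool(); // small at first, grows as needed
    try (ServerSocket listener = new ServerSocket(9000)) {
      while (true) {
        Socket connection = listener.accept();   // accept a connection
        pool.execute(() -> handle(connection));  // hand it to the pool
      }
    }
  }

  // One pool thread handles everything for the connection.
  static void handle(Socket connection) {
    try (Socket c = connection;
         DataInputStream in = new DataInputStream(c.getInputStream());
         DataOutputStream out = new DataOutputStream(c.getOutputStream())) {
      while (true) {
        int length = in.readInt();           // length-prefixed framing (assumed)
        byte[] request = new byte[length];
        in.readFully(request);
        byte[] response = process(request);  // hypothetical request handler
        out.writeInt(response.length);
        out.write(response);
        out.flush();                         // responses go back strictly in order
      }
    } catch (Exception e) {
      // connection closed or failed; the thread returns to the pool
    }
  }

  static byte[] process(byte[] request) {
    return request;                          // placeholder: echo the request back
  }
}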
This does not permit a slow request from a given client to be overtaken by a fast request from that client, i.e., it does not permit out-of-order responses. The IPC protocol was originally designed for search, where some queries take significantly longer than others. All front-ends send each query to all backends. In this case we do not want queries which are fast to process to be stuck behind those which are slow to process.
Permitting out-of-order responses does increase the base latency a bit, since a thread switch is required between reading the request and writing the response. Perhaps that accounts for the 10-20% difference you saw. It is possible to combine the approaches. Worker threads could use synchronization to directly listen on the socket and process requests as soon as they are complete. That would avoid the thread switch, but I think it would be complicated to develop.
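Roughly, such a combined design might look like this (hypothetical; the CallSource interface and the lock discipline are invented for the sketch, and a real version would need much more careful selector handling):

import java.nio.channels.SocketChannel;

// Worker threads take turns holding a lock while reading from the shared
// socket/selector, so the thread that assembles a request also processes
// it, avoiding the listener-to-worker thread switch.
public class DirectReadWorker implements Runnable {

  interface CallSource {
    Call nextCall() throws Exception;  // blocks in select()/read() until a full request arrives
  }

  static class Call {
    SocketChannel connection;
    byte[] request;
  }

  private final Object readLock;       // serializes readers: one thread selects at a time
  private final CallSource source;

  DirectReadWorker(Object readLock, CallSource source) {
    this.readLock = readLock;
    this.source = source;
  }

  public void run() {
    try {
      while (true) {
        Call call;
        synchronized (readLock) {      // become the reader
          call = source.nextCall();
        }                              // release the lock so another worker can read
        byte[] response = process(call.request); // no thread switch before processing
        respond(call.connection, response);
      }
    } catch (Exception e) {
      Thread.currentThread().interrupt();
    }
  }

  byte[] process(byte[] request) { return request; }           // placeholder
  void respond(SocketChannel connection, byte[] response) { }  // write length-prefixed response
}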
This way, it saves memory, and the thread pool's scheduling should be better than our own.
We are using a thread pool for worker threads. The difference is that worker threads call wait() to wait for a request, the listener thread calls notify() to tell a worker thread that a request is ready, and the scheduler must then switch from running the listener to the worker.
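In simplified form, that hand-off looks like this (a sketch, not the actual class; the bound on the queue is what limits the number of requests processed simultaneously):

import java.util.LinkedList;
import java.util.Queue;

// Producer/consumer hand-off between the listener and worker threads.
public class CallQueueSketch {

  private final Queue<Runnable> calls = new LinkedList<>();
  private final int maxSize;

  CallQueueSketch(int maxSize) {
    this.maxSize = maxSize;
  }

  // Listener thread: enqueue a call and wake a waiting worker.
  public synchronized void put(Runnable call) throws InterruptedException {
    while (calls.size() >= maxSize) {
      wait();        // queue full: limits simultaneous requests
    }
    calls.add(call);
    notifyAll();     // the scheduler must now switch to a worker
  }

  // Worker thread: sleep until the listener has queued a call.
  public synchronized Runnable take() throws InterruptedException {
    while (calls.isEmpty()) {
      wait();
    }
    Runnable call = calls.remove();
    notifyAll();
    return call;
  }
}

That wait()/notify() switch between put() and take() is the extra latency mentioned above.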
Doug