[ https://issues.apache.org/jira/browse/IGNITE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yakov Zhdanov updated IGNITE-3220:
----------------------------------
    Description: 
Steps to reproduce (a minimal reproduction sketch follows the list):
# start 1 server node and 1 client node in a single JVM. Do cache.put(randomKey) 
from 16 threads. Get N ops/sec.
# start 1 server node and 2 client nodes in a single JVM. Do cache.put(randomKey) 
from 16 threads, picking one of the 2 clients at random. Get 2*N ops/sec.
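A minimal reproduction sketch, assuming default discovery and cache settings; 
the grid names, cache name, key range and run duration are illustrative only:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PutBenchmark {
    public static void main(String[] args) throws Exception {
        // Server and client in the same JVM, as in step 1.
        Ignition.start(new IgniteConfiguration().setGridName("server"));

        Ignite client = Ignition.start(new IgniteConfiguration()
            .setGridName("client-1")
            .setClientMode(true));

        IgniteCache<Integer, Integer> cache = client.getOrCreateCache("test");

        AtomicLong ops = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(16);

        for (int i = 0; i < 16; i++) {
            pool.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    // Random keys, as in the reproduction steps.
                    cache.put(ThreadLocalRandom.current().nextInt(1_000_000), 1);
                    ops.incrementAndGet();
                }
            });
        }

        Thread.sleep(60_000); // measure for one minute

        System.out.println("Throughput: " + ops.get() / 60 + " ops/sec");

        pool.shutdownNow();
        Ignition.stopAll(true);
    }
}
{code}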

One possible reason for this is the I/O approach. Currently, all direct 
marshallable messages are marshalled and unmarshalled in the single NIO thread 
that is in charge of the connection the message goes to or comes from.

A possible fix:
# move direct marshalling to user & system threads to make it parallel
## after all user objects are marshalled, a direct marshallable message should 
be able to provide the size of the resulting message
## communication should allocate a rather big direct byte buffer per connection
## a user or system thread that wants to send a message should request a chunk 
(256 or 512 bytes) to write the direct message to
## a thread can request another chunk from communication (communication should 
try to allocate it next to the already allocated one); if the chunk cannot be 
expanded, the thread may allocate a buffer locally and finish marshalling into it
## the set of buffers can be written to the socket channel with 
{{java.nio.channels.SocketChannel#write(java.nio.ByteBuffer[], int, int)}}, as 
in the first sketch below
## the amount of data written to the socket should be aligned to some value, 
e.g. 256 or 512 bytes. Free space should be sent as well.
## each chunk (i.e. 256 or 512 bytes) written to the socket should have the 
local thread ID at its very beginning
# move direct UN-marshalling out of NIO threads to make it parallel
## data is read in chunks; first, the thread ID of the chunk should be 
analyzed, and the chunk should be submitted to a striped thread pool for 
unmarshalling (second sketch below)
## after a message is unmarshalled, it gets processed the same way as now.
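A minimal sketch of the write side, under stated assumptions: 
{{ChunkedConnectionBuffer}}, the 512-byte chunk size and the 8-byte thread-ID 
header are hypothetical, not existing Ignite classes or formats. A 
per-connection direct buffer hands out fixed-size chunks stamped with the 
writer's thread ID, and the chunks are flushed in a single gathering write:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

final class ChunkedConnectionBuffer {
    static final int CHUNK = 512; // fixed chunk size from the proposal
    // The first 8 bytes of every chunk hold the writer's thread ID.

    private final ByteBuffer pool = ByteBuffer.allocateDirect(64 * CHUNK);

    /** Hands a writer thread a chunk stamped with its thread ID. */
    synchronized ByteBuffer requestChunk() {
        if (pool.remaining() < CHUNK)
            return null; // caller falls back to a locally allocated buffer

        ByteBuffer chunk = pool.slice();
        chunk.limit(CHUNK);
        pool.position(pool.position() + CHUNK);

        chunk.putLong(Thread.currentThread().getId()); // chunk header

        return chunk;
    }

    /** Pads each chunk to the fixed boundary, then writes all of them at once. */
    static void flush(SocketChannel ch, ByteBuffer[] chunks) throws IOException {
        for (ByteBuffer chunk : chunks) {
            while (chunk.position() < CHUNK) // free space is sent as well
                chunk.put((byte)0);

            chunk.flip();
        }

        long total = (long)chunks.length * CHUNK;
        long written = 0;

        while (written < total)
            written += ch.write(chunks, 0, chunks.length); // gathering write
    }
}
{code}

And a minimal sketch of the read side: the NIO thread only routes each 
received chunk to a stripe keyed by the thread ID from the chunk header, so 
chunks from the same sender thread are unmarshalled in order while different 
senders proceed in parallel. {{StripedUnmarshaller}} is hypothetical as well:

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class StripedUnmarshaller {
    private final ExecutorService[] stripes;

    StripedUnmarshaller(int parallelism) {
        stripes = new ExecutorService[parallelism];

        for (int i = 0; i < parallelism; i++)
            stripes[i] = Executors.newSingleThreadExecutor();
    }

    /** Called by the NIO thread per received chunk; it never unmarshals itself. */
    void onChunk(ByteBuffer chunk) {
        // Header written by the sender: the writer's thread ID (positive).
        long writerThreadId = chunk.getLong();

        // The same writer always maps to the same stripe, preserving order.
        int stripe = (int)(writerThreadId % stripes.length);

        stripes[stripe].execute(() -> unmarshalAndProcess(chunk));
    }

    private void unmarshalAndProcess(ByteBuffer chunk) {
        // Direct-message unmarshalling and the existing processing path go here.
    }
}
{code}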

NOTE: As a further idea, we can try marshalling user objects directly into the 
per-connection buffer and switching the thread to process another message until 
the chunk is flushed (similar to the job continuation approach).

  was:
There is an I/O bottleneck when a client performs many transactions involving 
puts and gets in a highly concurrent manner, using socket transport to a single 
server node deployed on a powerful multicore machine.
In this case throughput dramatically decreases (up to 30 times compared to 
running the test in a single JVM) because everything is delayed by the server 
I/O thread.

The current workaround for such a scenario is using more clients (or servers).
We should add a configuration parameter such as maxConnectionsPerClient, 
allowing a client to connect to the server over several simultaneous 
connections, decreasing the I/O bottleneck (a hypothetical usage sketch 
follows).
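A hypothetical usage sketch of the proposed parameter; 
{{setMaxConnectionsPerClient}} does not exist in {{TcpCommunicationSpi}} today, 
so the call is shown commented out for illustration only:

{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class MultiConnectionClient {
    public static void main(String[] args) {
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();

        // Proposed, NOT yet implemented: open several connections to the
        // server so writes are spread across multiple I/O threads.
        // commSpi.setMaxConnectionsPerClient(4);

        Ignition.start(new IgniteConfiguration()
            .setClientMode(true)
            .setCommunicationSpi(commSpi));
    }
}
{code}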


> I/O bottleneck on server/client cluster configuration
> -----------------------------------------------------
>
>                 Key: IGNITE-3220
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3220
>             Project: Ignite
>          Issue Type: Bug
>          Components: clients
>            Reporter: Alexei Scherbakov
>            Assignee: Yakov Zhdanov
>            Priority: Critical
>              Labels: performance
>             Fix For: 1.7
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
