Hi Igniters,

 

While diagnosing a slow query, I noticed a potential issue in the Ignite
codebase.  When a SQL query is to be executed remotely, the
IgniteH2Indexing#send() method is called with a Collection<ClusterNode> as
one of its parameters.  This collection is iterated sequentially, and
ctx.io().sendGeneric() is called synchronously for each node, so the total
delay is the sum of the per-node send times.  This is inefficient if

a)      this is the first execution of a query, and thus TCP connections
have to be established, and

b)      the cost of establishing a TCP connection is high,

and, optionally,

c)      there are a large number of nodes in the cluster.

 

In my current situation, developers want to run test queries from code
running locally, connected via VPN to their UAT server environment.  Opening
each TCP connection takes several seconds, as you can see from this Ignite
log file snippet:

2017-05-22 18:29:48,908 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/7.1.14.242:56924,
rmtAddr=/10.132.80.3:47100]

2017-05-22 18:29:52,294 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/7.1.14.242:56923,
rmtAddr=/10.132.80.30:47102]

2017-05-22 18:29:58,659 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/7.1.14.242:56971,
rmtAddr=/10.132.80.23:47101]

2017-05-22 18:30:03,183 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/7.1.14.242:56972,
rmtAddr=/10.132.80.21:47100]

2017-05-22 18:30:06,039 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/7.1.14.242:56973,
rmtAddr=/10.132.80.21:47103]

2017-05-22 18:30:10,828 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/7.1.14.242:57020,
rmtAddr=/10.132.80.20:47100]

2017-05-22 18:30:13,060 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/7.1.14.242:57021,
rmtAddr=/10.132.80.29:47103]

2017-05-22 18:30:22,144 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/7.1.14.242:57022,
rmtAddr=/10.132.80.22:47103]

2017-05-22 18:30:26,513 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/7.1.14.242:57024,
rmtAddr=/10.132.80.20:47101]

2017-05-22 18:30:28,526 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/7.1.14.242:57025,
rmtAddr=/10.132.80.30:47103]

 

For comparison, here is the same code executed inside the UAT environment
(so not using the VPN):

2017-05-22 18:22:18,102 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/10.175.11.38:53288,
rmtAddr=/10.175.11.58:47100]

2017-05-22 18:22:18,105 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/10.175.11.38:45890,
rmtAddr=/10.175.11.54:47101]

2017-05-22 18:22:18,108 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/127.0.0.1:47582,
rmtAddr=/127.0.0.1:47100]

2017-05-22 18:22:18,111 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/127.0.0.1:45240,
rmtAddr=/127.0.0.1:47103]

2017-05-22 18:22:18,114 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/10.175.11.38:46280,
rmtAddr=/10.175.11.15:47100]

2017-05-22 18:22:18,118 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/10.132.80.21:51476,
rmtAddr=/10.132.80.29:47103]

2017-05-22 18:22:18,120 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/10.132.80.21:56274,
rmtAddr=pocfd-master1/10.132.80.22:47103]

2017-05-22 18:22:18,124 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/10.132.80.21:53558,
rmtAddr=pocfd-ignite1/10.132.80.20:47101]

2017-05-22 18:22:18,127 INFO [TcpCommunicationSpi] - Established outgoing
communication connection [locAddr=/10.132.80.21:56216,
rmtAddr=/10.132.80.30:47103]

 

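Over the VPN, roughly 40 seconds elapse between the first and the last
connection being established; inside UAT, the figure is 25 ms.  This is a
design flaw in the Ignite code, as we are relying on the client's network
behaving in a particular way (i.e., connection establishment being very
fast).  We should instead try to mask this potential slowness by
establishing connections in parallel, and waiting on the results.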
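
To illustrate the idea, here is a minimal standalone sketch.  It does not
use any Ignite API: ParallelSendSketch, sendSequential, sendParallel and the
200 ms Thread.sleep() standing in for a slow connection are all hypothetical
names and values chosen for the example, and where such a change would
actually belong in the Ignite code is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class ParallelSendSketch {
    /** Sends to each node in turn: total time is the sum of the per-node times. */
    static <N> void sendSequential(List<N> nodes, Consumer<N> send) {
        for (N node : nodes)
            send.accept(node);
    }

    /**
     * Submits every send up front, then waits for all of them.
     * With enough threads, total time approaches the slowest single send.
     * join() throws CompletionException if any send failed.
     */
    static <N> void sendParallel(List<N> nodes, Consumer<N> send, ExecutorService pool) {
        List<CompletableFuture<Void>> futs = new ArrayList<>();
        for (N node : nodes)
            futs.add(CompletableFuture.runAsync(() -> send.accept(node), pool));
        CompletableFuture.allOf(futs.toArray(new CompletableFuture[0])).join();
    }

    /** Returns the wall-clock time taken by the given action, in milliseconds. */
    static long timeMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Hypothetical node names; a real caller would pass cluster nodes.
        List<String> nodes = Arrays.asList("node-1", "node-2", "node-3", "node-4", "node-5");

        // Stand-in for a send whose first call has to open a slow connection.
        Consumer<String> slowSend = node -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
        try {
            long seq = timeMillis(() -> sendSequential(nodes, slowSend));
            long par = timeMillis(() -> sendParallel(nodes, slowSend, pool));
            System.out.println("sequential: " + seq + " ms, parallel: " + par + " ms");
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch sizes the pool to the number of nodes purely for simplicity; a
real change would presumably need a bounded pool and some decision about how
a failure on one node should affect the others.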

 

I would like to hear others' thoughts and comments before we open a JIRA to
look at this.

 

Regards

Mike
