Hi Igniters,

Whilst diagnosing a problem with a slow query, I became aware of a potential issue in the Ignite codebase. When executing a SQL query that is to run remotely, the IgniteH2Indexing#send() method is called with a Collection<ClusterNode> as one of its parameters. This collection is iterated sequentially, and ctx.io().sendGeneric() is called synchronously for each node. This is inefficient if:

a) This is the first execution of a query, and thus TCP connections have to be established
b) The cost of establishing a TCP connection is high

And optionally:

c) There are a large number of nodes in the cluster

In my current situation, developers want to run test queries from code running locally, connected via VPN to their UAT server environment. Opening each TCP connection takes several seconds, as you can see from this Ignite log file snippet:

2017-05-22 18:29:48,908 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56924, rmtAddr=/10.132.80.3:47100]
2017-05-22 18:29:52,294 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56923, rmtAddr=/10.132.80.30:47102]
2017-05-22 18:29:58,659 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56971, rmtAddr=/10.132.80.23:47101]
2017-05-22 18:30:03,183 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56972, rmtAddr=/10.132.80.21:47100]
2017-05-22 18:30:06,039 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56973, rmtAddr=/10.132.80.21:47103]
2017-05-22 18:30:10,828 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57020, rmtAddr=/10.132.80.20:47100]
2017-05-22 18:30:13,060 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57021, rmtAddr=/10.132.80.29:47103]
2017-05-22 18:30:22,144 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57022, rmtAddr=/10.132.80.22:47103]
2017-05-22 18:30:26,513 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57024, rmtAddr=/10.132.80.20:47101]
2017-05-22 18:30:28,526 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57025, rmtAddr=/10.132.80.30:47103]

Compare this with the same code executed inside the UAT environment (so not using the VPN):

2017-05-22 18:22:18,102 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:53288, rmtAddr=/10.175.11.58:47100]
2017-05-22 18:22:18,105 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:45890, rmtAddr=/10.175.11.54:47101]
2017-05-22 18:22:18,108 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/127.0.0.1:47582, rmtAddr=/127.0.0.1:47100]
2017-05-22 18:22:18,111 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/127.0.0.1:45240, rmtAddr=/127.0.0.1:47103]
2017-05-22 18:22:18,114 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:46280, rmtAddr=/10.175.11.15:47100]
2017-05-22 18:22:18,118 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:51476, rmtAddr=/10.132.80.29:47103]
2017-05-22 18:22:18,120 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:56274, rmtAddr=pocfd-master1/10.132.80.22:47103]
2017-05-22 18:22:18,124 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:53558, rmtAddr=pocfd-ignite1/10.132.80.20:47101]
2017-05-22 18:22:18,127 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:56216, rmtAddr=/10.132.80.30:47103]

Over the VPN, the ten connections take about 40 seconds in total (18:29:48 to 18:30:28); inside UAT, nine connections are established within 25 milliseconds.

This is a design flaw in the Ignite code, as we are relying on the client's network behaving in a particular way (i.e., connection establishment being very fast). We should instead try to mask this potential slowness by establishing connections in parallel and waiting on the results.

I would like to hear others' thoughts and comments before we open a JIRA to look at this.

Regards,
Mike
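
P.S. To make the proposal concrete, here is a rough, standalone sketch of the "send in parallel, then wait" idea. It is not Ignite code: the class name, sendAll(), and the sendToNode callback are all hypothetical stand-ins for the loop in IgniteH2Indexing#send() and the per-node ctx.io().sendGeneric() call, and the 200 ms sleep merely simulates a slow connection. How this would fit Ignite's own thread pools and error handling is an open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

class ParallelSendSketch {
    /**
     * Submits one send task per node and blocks until every task has finished.
     * sendToNode is a hypothetical stand-in for the per-node synchronous send.
     */
    static <N> void sendAll(List<N> nodes, Consumer<N> sendToNode, ExecutorService pool) {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (N node : nodes) {
            futures.add(CompletableFuture.runAsync(() -> sendToNode.accept(node), pool));
        }
        // Waits for all sends; throws CompletionException if any send failed.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-1", "node-2", "node-3", "node-4", "node-5");
        ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
        try {
            long start = System.nanoTime();
            sendAll(nodes, node -> {
                try {
                    Thread.sleep(200); // simulated slow connection setup
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                }
            }, pool);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // Sequential sends would take roughly nodes.size() * 200 ms here.
            System.out.println("Parallel send took " + elapsedMs + " ms");
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, the wait on first execution is bounded by roughly the slowest single connection rather than the sum of all of them, provided the pool has enough threads to start every send at once.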