Michael, I see your point. It should not be too hard to start asynchronously establishing connections to all the needed nodes.
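Roughly, something along these lines (a sketch only: connectAsync() is a hypothetical helper, since today TcpCommunicationSpi opens connections lazily and synchronously on first send):

    import java.util.Collection;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.stream.Collectors;

    final class ConnectionWarmup {
        /** Placeholder for org.apache.ignite.cluster.ClusterNode. */
        interface ClusterNode { }

        /** Hypothetical async facade over the communication SPI;
         *  no such method exists today. */
        interface CommunicationIo {
            CompletableFuture<Void> connectAsync(ClusterNode node);
        }

        /** Kick off all connection attempts at once, then wait for
         *  them together. */
        static void warmUp(CommunicationIo io, Collection<ClusterNode> nodes) {
            List<CompletableFuture<Void>> futs = nodes.stream()
                .map(io::connectAsync)
                .collect(Collectors.toList());

            CompletableFuture.allOf(futs.toArray(new CompletableFuture[0])).join();
        }
    }

The total wait then becomes the latency of the slowest single connection rather than the sum over all nodes.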
I've created an issue in Jira:
https://issues.apache.org/jira/browse/IGNITE-5277

Sergi

2017-05-23 11:56 GMT+03:00 Michael Griggs <michael.gri...@gridgain.com>:

> Hi Val
>
> This is precisely my point: it's only a minor optimization until the
> point when establishing each connection takes 3-4 seconds and we
> establish 32 of them in sequence. At that point it becomes a serious
> issue: the customer cannot run SQL queries from their development
> machines without them timing out once out of every two or three runs.
> These kinds of problems undermine confidence in Ignite.
>
> Mike
>
> -----Original Message-----
> From: Valentin Kulichenko [mailto:valentin.kuliche...@gmail.com]
> Sent: 22 May 2017 19:15
> To: dev@ignite.apache.org
> Subject: Re: Inefficient approach to executing remote SQL queries
>
> Hi Mike,
>
> Generally, establishing connections in parallel could make sense, but
> note that in most cases this would be a minor optimization, because:
>
> - Under load, connections are established once and then reused. If
>   you observe disconnections during the application's lifetime under
>   load, that should probably be addressed first.
> - Actual communication is asynchronous; we use NIO for this. If a
>   connection already exists, sendGeneric() basically just puts a
>   message into a queue.
>
> -Val
>
> On Mon, May 22, 2017 at 7:04 PM, Michael Griggs
> <michael.gri...@gridgain.com> wrote:
>
> > Hi Igniters,
> >
> > Whilst diagnosing a problem with a slow query, I became aware of a
> > potential issue in the Ignite codebase. When executing a SQL query
> > that is to run remotely, the IgniteH2Indexing#send() method is
> > called with a Collection<ClusterNode> as one of its parameters.
> > This collection is iterated sequentially, and ctx.io().sendGeneric()
> > is called synchronously for each node. This is inefficient if:
> >
> > a) this is the first execution of a query, and thus TCP connections
> > have to be established;
> >
> > b) the cost of establishing a TCP connection is high;
> >
> > and, optionally,
> >
> > c) there are a large number of nodes in the cluster.
> >
> > In my current situation, developers want to run test queries from
> > their code running locally, but connected via VPN to their UAT
> > server environment.
> > The cost of opening a TCP connection runs to multiple seconds, as
> > you can see from this Ignite log file snippet:
> >
> > 2017-05-22 18:29:48,908 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56924, rmtAddr=/10.132.80.3:47100]
> > 2017-05-22 18:29:52,294 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56923, rmtAddr=/10.132.80.30:47102]
> > 2017-05-22 18:29:58,659 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56971, rmtAddr=/10.132.80.23:47101]
> > 2017-05-22 18:30:03,183 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56972, rmtAddr=/10.132.80.21:47100]
> > 2017-05-22 18:30:06,039 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56973, rmtAddr=/10.132.80.21:47103]
> > 2017-05-22 18:30:10,828 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57020, rmtAddr=/10.132.80.20:47100]
> > 2017-05-22 18:30:13,060 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57021, rmtAddr=/10.132.80.29:47103]
> > 2017-05-22 18:30:22,144 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57022, rmtAddr=/10.132.80.22:47103]
> > 2017-05-22 18:30:26,513 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57024, rmtAddr=/10.132.80.20:47101]
> > 2017-05-22 18:30:28,526 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57025, rmtAddr=/10.132.80.30:47103]
> >
> > Compare the same code executed inside the UAT environment (i.e.,
> > not over the VPN):
> >
> > 2017-05-22 18:22:18,102 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:53288, rmtAddr=/10.175.11.58:47100]
> > 2017-05-22 18:22:18,105 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:45890, rmtAddr=/10.175.11.54:47101]
> > 2017-05-22 18:22:18,108 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/127.0.0.1:47582, rmtAddr=/127.0.0.1:47100]
> > 2017-05-22 18:22:18,111 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/127.0.0.1:45240, rmtAddr=/127.0.0.1:47103]
> > 2017-05-22 18:22:18,114 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:46280, rmtAddr=/10.175.11.15:47100]
> > 2017-05-22 18:22:18,118 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:51476, rmtAddr=/10.132.80.29:47103]
> > 2017-05-22 18:22:18,120 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:56274, rmtAddr=pocfd-master1/10.132.80.22:47103]
> > 2017-05-22 18:22:18,124 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:53558, rmtAddr=pocfd-ignite1/10.132.80.20:47101]
> > 2017-05-22 18:22:18,127 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:56216, rmtAddr=/10.132.80.30:47103]
> > This is a design flaw in the Ignite code, as we are relying on the
> > client's network behaving in a particular way (i.e., ports opening
> > very quickly). We should instead try to mask this potential slowness
> > by establishing connections in parallel and waiting on the results.
> >
> > I would like to hear others' thoughts and comments before we open a
> > JIRA to look at this.
> >
> > Regards
> >
> > Mike
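For reference, the sequential pattern Michael describes boils down to roughly the following (a paraphrase with placeholder types, not the actual Ignite source; the real ctx.io().sendGeneric() takes more parameters):

    import java.util.Collection;

    final class SendLoopSketch {
        /** Placeholder for org.apache.ignite.cluster.ClusterNode. */
        interface ClusterNode { }

        /** Simplified stand-in for ctx.io(); the real sendGeneric()
         *  may block while a fresh TCP connection is established. */
        interface GridIo {
            void sendGeneric(ClusterNode node, Object msg);
        }

        /** With n nodes and ~c seconds per fresh connection, this loop
         *  costs up to n * c seconds on the first query execution. */
        static void send(GridIo io, Collection<ClusterNode> nodes, Object msg) {
            for (ClusterNode node : nodes)
                io.sendGeneric(node, msg); // synchronous, one node at a time
        }
    }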