Hi Val,

This is precisely my point: it is only a minor optimization until establishing each connection takes 3-4 seconds and we establish 32 of them in sequence. At that point it becomes a serious issue: the customer cannot run SQL queries from their development machines without them timing out once in every two or three runs. This kind of problem undermines confidence in Ignite.

Mike

-----Original Message-----
From: Valentin Kulichenko [mailto:valentin.kuliche...@gmail.com]
Sent: 22 May 2017 19:15
To: dev@ignite.apache.org
Subject: Re: Inefficient approach to executing remote SQL queries

Hi Mike,

Generally, establishing connections in parallel could make sense, but note that in most cases this would be a minor optimization, because:

- Under load, connections are established once and then reused. If you observe disconnections during the application lifetime under load, then that should probably be addressed first.
- Actual communication is asynchronous; we use NIO for this. If a connection already exists, sendGeneric() basically just puts a message into a queue.

-Val

On Mon, May 22, 2017 at 7:04 PM, Michael Griggs <michael.gri...@gridgain.com> wrote:

> Hi Igniters,
>
> Whilst diagnosing a problem with a slow query, I became aware of a potential issue in the Ignite codebase. When executing a SQL query that is to run remotely, the IgniteH2Indexing#send() method is called, with a Collection<ClusterNode> as one of its parameters. This collection is iterated sequentially, and ctx.io().sendGeneric() is called synchronously for each node. This is inefficient if
>
> a) This is the first execution of a query, and thus TCP connections have to be established
>
> b) The cost of establishing a TCP connection is high
>
> And optionally
>
> c) There are a large number of nodes in the cluster
>
> In my current situation, developers want to run test queries from their code running locally, but connected via VPN to their UAT server environment.
> The cost of opening a TCP connection is several seconds, as you can see from this Ignite log file snippet:
>
> 2017-05-22 18:29:48,908 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56924, rmtAddr=/10.132.80.3:47100]
> 2017-05-22 18:29:52,294 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56923, rmtAddr=/10.132.80.30:47102]
> 2017-05-22 18:29:58,659 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56971, rmtAddr=/10.132.80.23:47101]
> 2017-05-22 18:30:03,183 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56972, rmtAddr=/10.132.80.21:47100]
> 2017-05-22 18:30:06,039 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56973, rmtAddr=/10.132.80.21:47103]
> 2017-05-22 18:30:10,828 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57020, rmtAddr=/10.132.80.20:47100]
> 2017-05-22 18:30:13,060 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57021, rmtAddr=/10.132.80.29:47103]
> 2017-05-22 18:30:22,144 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57022, rmtAddr=/10.132.80.22:47103]
> 2017-05-22 18:30:26,513 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57024, rmtAddr=/10.132.80.20:47101]
> 2017-05-22 18:30:28,526 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57025, rmtAddr=/10.132.80.30:47103]
>
> Compare this with the same code executed inside the UAT environment (so not using the VPN):
>
> 2017-05-22 18:22:18,102 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:53288, rmtAddr=/10.175.11.58:47100]
> 2017-05-22 18:22:18,105 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:45890, rmtAddr=/10.175.11.54:47101]
> 2017-05-22 18:22:18,108 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/127.0.0.1:47582, rmtAddr=/127.0.0.1:47100]
> 2017-05-22 18:22:18,111 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/127.0.0.1:45240, rmtAddr=/127.0.0.1:47103]
> 2017-05-22 18:22:18,114 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:46280, rmtAddr=/10.175.11.15:47100]
> 2017-05-22 18:22:18,118 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:51476, rmtAddr=/10.132.80.29:47103]
> 2017-05-22 18:22:18,120 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:56274, rmtAddr=pocfd-master1/10.132.80.22:47103]
> 2017-05-22 18:22:18,124 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:53558, rmtAddr=pocfd-ignite1/10.132.80.20:47101]
> 2017-05-22 18:22:18,127 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:56216, rmtAddr=/10.132.80.30:47103]
>
> This is a design flaw in the Ignite code, as we are relying on the client's network behaving in a particular way (i.e., port opening being very fast). We should instead try to mask this potential slowness by establishing connections in parallel, and waiting on the results.
>
> I would like to hear others' thoughts and comments before we open a JIRA to look at this.
>
> Regards
>
> Mike
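
For illustration, the difference between the sequential loop described above and the proposed "establish in parallel, then wait on the results" approach could be sketched roughly as follows. This is not Ignite code: ParallelConnectSketch, connectAndSend(), sendSequential() and sendParallel() are hypothetical names, nodes are plain strings, and the connection cost is simulated with Thread.sleep(). It only shows why the sequential total grows with the number of nodes while the parallel total stays close to the cost of the slowest single connection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative only: none of these names are Ignite APIs. */
final class ParallelConnectSketch {
    /** Hypothetical stand-in for a blocking first send that has to open a TCP connection. */
    static String connectAndSend(String node, long connectMillis) {
        try {
            Thread.sleep(connectMillis); // simulated connection-establishment cost
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while connecting to " + node, e);
        }
        return node;
    }

    /** Sequential variant: total time is roughly nodes.size() * connectMillis. */
    static List<String> sendSequential(List<String> nodes, long connectMillis) {
        List<String> done = new ArrayList<>();
        for (String node : nodes)
            done.add(connectAndSend(node, connectMillis));
        return done;
    }

    /** Parallel variant: start every send first, then wait on all of the results. */
    static List<String> sendParallel(List<String> nodes, long connectMillis) {
        if (nodes.isEmpty())
            return new ArrayList<>();
        // One thread per node keeps the sketch simple; a real fix would bound this.
        ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (String node : nodes)
                futures.add(CompletableFuture.supplyAsync(
                    () -> connectAndSend(node, connectMillis), pool));
            List<String> done = new ArrayList<>();
            for (CompletableFuture<String> f : futures)
                done.add(f.join()); // blocks until that node's send has completed
            return done;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-1", "node-2", "node-3", "node-4");
        long t0 = System.nanoTime();
        sendSequential(nodes, 100);
        long t1 = System.nanoTime();
        sendParallel(nodes, 100);
        long t2 = System.nanoTime();
        System.out.println("sequential ms: " + (t1 - t0) / 1_000_000);
        System.out.println("parallel ms:   " + (t2 - t1) / 1_000_000);
    }
}
```

With 32 nodes at 3-4 seconds each, the sequential variant would spend on the order of 96-128 seconds just opening connections, whereas the parallel variant would be bounded by roughly the slowest single connection. Whether this maps cleanly onto IgniteH2Indexing#send() and sendGeneric() is an open question for the JIRA.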