The problem here is with the initial opening of connections. For a client that connects and disconnects quickly and frequently, a connection time of 30 seconds or more is not workable.
Mike

On 23 May 2017 6:51 pm, "Dmitriy Setrakyan" <dsetrak...@apache.org> wrote:

> Why do we turn off the connections, once established? Why not keep them
> open, until an endpoint explicitly closes them?
>
> On Tue, May 23, 2017 at 2:16 AM, Sergi Vladykin <sergi.vlady...@gmail.com>
> wrote:
>
> > Michael,
> >
> > I see your point. I think it must not be too hard to start asynchronously
> > establishing connections to all the needed nodes.
> >
> > I've created respective issue in Jira:
> > https://issues.apache.org/jira/browse/IGNITE-5277
> >
> > Sergi
> >
> > 2017-05-23 11:56 GMT+03:00 Michael Griggs <michael.gri...@gridgain.com>:
> >
> > > Hi Val
> > >
> > > This is precisely my point: it's only a minor optimization until the
> > > point when establishing each connection takes 3-4 seconds, and we
> > > establish 32 of them in sequence. At that point it becomes a serious
> > > issue: the customer cannot run SQL queries from their development
> > > machines without them timing out once out of every two or three runs.
> > > These kind of problems undermine confidence in Ignite.
> > >
> > > Mike
> > >
> > >
> > > -----Original Message-----
> > > From: Valentin Kulichenko [mailto:valentin.kuliche...@gmail.com]
> > > Sent: 22 May 2017 19:15
> > > To: dev@ignite.apache.org
> > > Subject: Re: Inefficient approach to executing remote SQL queries
> > >
> > > Hi Mike,
> > >
> > > Generally, establishing connections in parallel could make sense, but
> > > note that in most this would be a minor optimization, because:
> > >
> > >    - Under load connections are established once and then reused. If
> > >    you observe disconnections during application lifetime under load,
> > >    then probably this should be addressed first.
> > >    - Actual communication is asynchronous, we use NIO for this. If
> > >    connection already exists, sendGeneric() basically just puts a
> > >    message into a queue.
> > >
> > > -Val
> > >
> > > On Mon, May 22, 2017 at 7:04 PM, Michael Griggs
> > > <michael.gri...@gridgain.com> wrote:
> > >
> > > > Hi Igniters,
> > > >
> > > > Whilst diagnosing a problem with a slow query, I became aware of a
> > > > potential issue in the Ignite codebase. When executing a SQL query
> > > > that is to run remotely, the IgniteH2Indexing#send() method is
> > > > called, with a Collection<ClusterNode> as one of its parameters. This
> > > > collection is iterated sequentially, and ctx.io().sendGeneric() is
> > > > called synchronously for each node. This is inefficient if
> > > >
> > > > a) This is the first execution of a query, and thus TCP connections
> > > > have to be established
> > > >
> > > > b) The cost of establishing a TCP connection is high
> > > >
> > > > And optionally
> > > >
> > > > c) There are a large number of nodes in the cluster
> > > >
> > > > In my current situation, developers want to run test queries from
> > > > their code running locally, but connected via VPN to their UAT server
> > > > environment. The cost of opening a TCP connection is in the multiple
> > > > seconds, as you can see from this Ignite log file snippet:
> > > >
> > > > 2017-05-22 18:29:48,908 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56924, rmtAddr=/10.132.80.3:47100]
> > > > 2017-05-22 18:29:52,294 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56923, rmtAddr=/10.132.80.30:47102]
> > > > 2017-05-22 18:29:58,659 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56971, rmtAddr=/10.132.80.23:47101]
> > > > 2017-05-22 18:30:03,183 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56972, rmtAddr=/10.132.80.21:47100]
> > > > 2017-05-22 18:30:06,039 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:56973, rmtAddr=/10.132.80.21:47103]
> > > > 2017-05-22 18:30:10,828 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57020, rmtAddr=/10.132.80.20:47100]
> > > > 2017-05-22 18:30:13,060 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57021, rmtAddr=/10.132.80.29:47103]
> > > > 2017-05-22 18:30:22,144 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57022, rmtAddr=/10.132.80.22:47103]
> > > > 2017-05-22 18:30:26,513 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57024, rmtAddr=/10.132.80.20:47101]
> > > > 2017-05-22 18:30:28,526 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/7.1.14.242:57025, rmtAddr=/10.132.80.30:47103]
> > > >
> > > > Comparing the same code that is executed inside of the UAT
> > > > environment (so not using the VPN):
> > > >
> > > > 2017-05-22 18:22:18,102 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:53288, rmtAddr=/10.175.11.58:47100]
> > > > 2017-05-22 18:22:18,105 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:45890, rmtAddr=/10.175.11.54:47101]
> > > > 2017-05-22 18:22:18,108 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/127.0.0.1:47582, rmtAddr=/127.0.0.1:47100]
> > > > 2017-05-22 18:22:18,111 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/127.0.0.1:45240, rmtAddr=/127.0.0.1:47103]
> > > > 2017-05-22 18:22:18,114 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.175.11.38:46280, rmtAddr=/10.175.11.15:47100]
> > > > 2017-05-22 18:22:18,118 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:51476, rmtAddr=/10.132.80.29:47103]
> > > > 2017-05-22 18:22:18,120 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:56274, rmtAddr=pocfd-master1/10.132.80.22:47103]
> > > > 2017-05-22 18:22:18,124 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:53558, rmtAddr=pocfd-ignite1/10.132.80.20:47101]
> > > > 2017-05-22 18:22:18,127 INFO [TcpCommunicationSpi] - Established outgoing communication connection [locAddr=/10.132.80.21:56216, rmtAddr=/10.132.80.30:47103]
> > > >
> > > > This is a design flaw in the Ignite code, as we are relying on the
> > > > client's network behaving in a particular way (i.e., port opening
> > > > being very fast). We should instead try to mask this potential
> > > > slowness by establishing connections in parallel, and waiting on the
> > > > results.
> > > >
> > > > I would like to hear others thoughts and comment before we open a
> > > > JIRA to look at this.
> > > >
> > > > Regards
> > > >
> > > > Mike
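
For illustration, here is a rough sketch of the difference between the sequential loop described in the original message and the proposed "establish in parallel, then wait on the results" approach. It is not Ignite code: ParallelConnect, openConnection(), connectSequentially() and connectInParallel() are hypothetical names, and the handshake is simulated with a sleep rather than a real socket. With the figures quoted in the thread (32 connections at 3-4 seconds each), a sequential loop would take roughly 96-128 seconds, whereas starting all handshakes at once should take about as long as the slowest single one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class ParallelConnect {
    /** Hypothetical stand-in for a slow TCP handshake; this is NOT an Ignite API. */
    static String openConnection(String addr, long delayMs) {
        try {
            Thread.sleep(delayMs); // simulates the multi-second handshake over a VPN
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while connecting to " + addr, e);
        }
        return "connected:" + addr;
    }

    /** One handshake after another: total time is roughly the sum of all delays. */
    static List<String> connectSequentially(List<String> addrs, long delayMs) {
        return addrs.stream()
            .map(a -> openConnection(a, delayMs))
            .collect(Collectors.toList());
    }

    /** All handshakes started up front, then awaited: total time is roughly the slowest delay. */
    static List<String> connectInParallel(List<String> addrs, long delayMs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, addrs.size()));
        try {
            // Submit every handshake before waiting on any of them.
            List<CompletableFuture<String>> futs = addrs.stream()
                .map(a -> CompletableFuture.supplyAsync(() -> openConnection(a, delayMs), pool))
                .collect(Collectors.toList());
            // join() blocks until each result is ready; input order is preserved.
            return futs.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Addresses taken from the log snippet in the thread, used only as labels here.
        List<String> addrs = Arrays.asList(
            "10.132.80.3:47100", "10.132.80.30:47102", "10.132.80.23:47101", "10.132.80.21:47100");

        long t0 = System.nanoTime();
        connectSequentially(addrs, 200);
        long seqMs = (System.nanoTime() - t0) / 1_000_000;

        t0 = System.nanoTime();
        connectInParallel(addrs, 200);
        long parMs = (System.nanoTime() - t0) / 1_000_000;

        System.out.println("sequential: " + seqMs + " ms, parallel: " + parMs + " ms");
    }
}
```

One thread per target node keeps the sketch simple. A real implementation would more likely use non-blocking connects or a bounded pool, and would need to decide how a failed or timed-out connection to one node affects the others.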