Hello Nikolay,

Your proposal sounds reasonable. However, I would suggest waiting until partition awareness is supported in the Java thin client. With that feature, the client can send requests directly to the node that holds the data. At present, all communication goes through a proxy (the node the client is connected to), which is bad for performance.
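To illustrate the proxy vs. direct routing difference above, here is a minimal, self-contained sketch of partition-aware routing. It is not the Ignite thin client API. The class `PartitionAwareRouting`, its methods, the partition count and the node addresses are all hypothetical. The key-to-partition function is a simplified `floorMod` of the hash code, not Ignite's actual affinity function.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: proxy routing vs. partition-aware routing (not Ignite API). */
public class PartitionAwareRouting {
    private final Map<Integer, String> primaryByPartition; // client-side copy of partition -> primary node
    private final int partitions;
    private final String proxyNode; // the single node a proxying client is connected to

    public PartitionAwareRouting(Map<Integer, String> primaryByPartition, int partitions, String proxyNode) {
        this.primaryByPartition = primaryByPartition;
        this.partitions = partitions;
        this.proxyNode = proxyNode;
    }

    // Simplified key-to-partition mapping (NOT Ignite's real affinity function).
    public static int partition(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Proxy mode: every request goes to the same node, which forwards it to the data owner if needed.
    public String routeViaProxy(Object key) {
        return proxyNode;
    }

    // Partition-aware mode: go straight to the partition's primary node,
    // falling back to the proxy connection if the mapping is unknown (e.g. after a topology change).
    public String routeDirect(Object key) {
        return primaryByPartition.getOrDefault(partition(key, partitions), proxyNode);
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-a:10800", "node-b:10800", "node-c:10800");
        int partitions = 8;
        // Hypothetical assignment: partitions spread round-robin over the nodes.
        Map<Integer, String> primaries = new HashMap<>();
        for (int p = 0; p < partitions; p++) {
            primaries.put(p, nodes.get(p % nodes.size()));
        }
        PartitionAwareRouting router = new PartitionAwareRouting(primaries, partitions, nodes.get(0));
        for (int key = 0; key < 4; key++) {
            System.out.printf("key=%d partition=%d proxy=%s direct=%s%n",
                key, partition(key, partitions), router.routeViaProxy(key), router.routeDirect(key));
        }
    }
}
```

With the round-robin assignment in `main`, keys 0 to 3 go directly to node-a, node-b, node-c and node-a. The proxying client sends all four to node-a, so keys 1 and 2 would need an extra forwarding hop.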
Vladimir, how hard would it be to support partition awareness in the Java thin client? Perhaps Nikolay could take it over.

--
Denis

On Sat, Oct 20, 2018 at 2:09 PM Nikolay Izhikov <nizhi...@apache.org> wrote:

> Hello, Igniters.
>
> Currently, the Spark Data Frame integration is implemented via a client node connection.
> Whenever we need to retrieve data from Ignite into a Spark worker (or master), we start a client node.
>
> This has several major disadvantages:
>
> 1. We have to copy the whole Ignite distribution to each Spark worker [1].
> 2. We have to copy the whole Ignite distribution to the Spark master to get the catalogue working.
> 3. We need the same absolute path to the Ignite configuration file on every worker and must provide it during data frame construction [2].
> 4. We have to additionally configure the Spark workers' classpath to include the Ignite libraries.
>
> For now, almost all operations we need in the Spark Data Frame integration are supported by the Java Thin Client:
> * obtain the list of caches;
> * get a cache configuration;
> * execute an SQL query;
> * stream data to a table: not supported by the thin client for now, but can be implemented using simple SQL INSERT statements.
>
> Advantages of using the Java Thin Client in the Spark integration (all of them are known advantages of the Java Thin Client):
> 1. Easy to configure: only the IP addresses of the server nodes are required.
> 2. Easy to deploy: only one additional jar is required. No server-side (Ignite worker) configuration is required.
>
> I propose to implement the Spark Data Frame integration through the Java Thin Client.
>
> Thoughts?
>
> [1] https://apacheignite-fs.readme.io/docs/installation-deployment
> [2] https://apacheignite-fs.readme.io/docs/ignite-data-frame#section-ignite-dataframe-options
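The SQL INSERT workaround for streaming mentioned above could look roughly like the following sketch, which groups rows into multi-row parameterized INSERT statements. `InsertBatcher`, `SqlBatch` and `buildInserts` are hypothetical names. The sketch assumes the target SQL dialect accepts multi-row `VALUES` lists. Each resulting statement and its arguments would then be executed through the thin client's SQL API (for example, an `SqlFieldsQuery` with `setArgs`). That call is not shown here, so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: turn rows into batched, parameterized multi-row INSERT statements. */
public class InsertBatcher {
    /** One SQL statement plus its positional arguments, in placeholder order. */
    public static final class SqlBatch {
        public final String sql;
        public final List<Object> args;

        SqlBatch(String sql, List<Object> args) {
            this.sql = sql;
            this.args = args;
        }
    }

    public static List<SqlBatch> buildInserts(String table, List<String> columns,
                                              List<List<Object>> rows, int batchSize) {
        if (columns.isEmpty() || batchSize <= 0) {
            throw new IllegalArgumentException("need at least one column and a positive batch size");
        }
        String colList = String.join(", ", columns);
        // "(?, ?, ...)" with one placeholder per column.
        String tuple = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        List<SqlBatch> batches = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += batchSize) {
            List<List<Object>> chunk = rows.subList(from, Math.min(from + batchSize, rows.size()));
            List<Object> args = new ArrayList<>();
            for (List<Object> row : chunk) {
                if (row.size() != columns.size()) {
                    throw new IllegalArgumentException("row width does not match column count");
                }
                args.addAll(row); // flatten row values in column order
            }
            String sql = "INSERT INTO " + table + " (" + colList + ") VALUES "
                + String.join(", ", Collections.nCopies(chunk.size(), tuple));
            batches.add(new SqlBatch(sql, args));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = List.of(
            List.<Object>of(1, "Alice"), List.<Object>of(2, "Bob"), List.<Object>of(3, "Carol"));
        for (SqlBatch b : buildInserts("Person", List.of("id", "name"), rows, 2)) {
            System.out.println(b.sql + " " + b.args);
        }
    }
}
```

Batching several rows per statement reduces round trips compared with one INSERT per row. The batch size is a tuning knob. Without partition awareness, every batch still goes through the single node the client is connected to.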