Hello, Stephen. I suggest thin client deployment as a second option alongside the existing integration that uses a client node.

> I’m thinking specifically about better support for Spark Streaming, where the
> lack of continuous query support in thin clients removes a significant
> optimisation option.

That's very interesting. Can you share your thoughts? What can be improved in the Spark integration?

On Mon, 22 Oct 2018 at 10:22 +0100, Stephen Darlington wrote:
> Are you suggesting making the Thin Client deployment an option or a
> replacement for the thick client? If the latter, do we risk making future
> desirable changes more difficult (or impossible)? I’m thinking specifically
> about better support for Spark Streaming, where the lack of continuous query
> support in thin clients removes a significant optimisation option. I’m sure
> there are other use cases.
>
> Regards,
> Stephen
>
> > On 21 Oct 2018, at 09:08, Nikolay Izhikov <nizhi...@apache.org> wrote:
> >
> > Valentin.
> >
> > It seems you made several assumptions which are not always true, from my
> > point of view:
> >
> > 1. "We have access to the Spark cluster installation to perform deployment
> > steps" - this is not true in a cloud or enterprise environment.
> >
> > 2. "The Spark cluster is used only for the Ignite integration."
> > From what I know, the computational resources of a big Spark cluster are
> > shared by many business divisions, and it is not convenient to perform
> > deployment steps on such a cluster.
> >
> > 3. "When Ignite + Spark are used in real production, it's OK to have a
> > reasonable deployment overhead."
> > What about a developer who wants to play with this integration,
> > and wants to do it quickly to see how it works on real-life examples?
> > Can we make his life much easier?
> >
> > > First of all, they will exist with the thin client too.
> >
> > Spark has the ability to deploy jars on workers and add them to the
> > application tasks' classpath.
> > For 2.6 we must deploy 11 additional jars to start using Ignite.
> > Please, see my example at the bottom of the documentation page [1].
> >
> > Do cache-api-1.0.0.jar and h2-1.4.195.jar seem like obvious dependencies
> > of the Ignite integration to you? And to our users? :)
> >
> > Actually, the list of dependencies will change in 2.7 - a new version of
> > JCache, a new version of H2 - so users will have to change it in code or
> > perform additional deployment steps.
> >
> > That is overkill for me.
> >
> > On the other hand, the thin client requires only 1 jar.
> > Moreover, the thin client protocol is backward compatible,
> > so the thin client will work correctly when the Ignite cluster is updated
> > from 2.6 to 2.7.
> > So, with a Spark integration via the thin client, we will be able to update
> > the Ignite cluster and the Spark integration separately.
> > For now, we have to do it in one big step.
> >
> > What do you think?
> >
> > [1] https://apacheignite-fs.readme.io/docs/installation-deployment
> >
> > On Sat, 20 Oct 2018 at 18:33 -0700, Valentin Kulichenko wrote:
> > > Guys,
> > >
> > > From my experience, Ignite and Spark clusters typically run in the same
> > > environment, which makes a client node the more preferable option, mainly
> > > because of performance. BTW, I doubt partition awareness in the thin client
> > > will help either, because in data frames we only run SQL queries and I
> > > believe the thin client will execute them through a proxy anyway. But
> > > correct me if I’m wrong.
> > >
> > > Either way, it sounds like we just have usability issues with the
> > > Ignite/Spark integration. Why don’t we concentrate on fixing them then?
> > > For example, #3 can be fixed by loading the XML content on the master and
> > > then distributing it to the workers, instead of loading it on every worker
> > > independently. Then there are certain procedures like deploying JARs, etc.
> > > First of all, they will exist with the thin client too. Second of all, I’m
> > > sure there are ways to simplify these procedures and make the integration
> > > easier. My opinion is that working on such improvements is going to add
> > > more value than another implementation based on the thin client.
> > >
> > > -Val
> > >
> > > On Sat, Oct 20, 2018 at 4:03 PM Denis Magda <dma...@apache.org> wrote:
> > >
> > > > Hello Nikolay,
> > > >
> > > > Your proposal sounds reasonable. However, I would suggest that we wait
> > > > until partition awareness is supported in the Java thin client first.
> > > > With that feature, the client can connect to any node directly, while
> > > > presently all the communication goes through a proxy (the node the
> > > > client is connected to). All of that is bad for performance.
> > > >
> > > > Vladimir, how hard would it be to support partition awareness in the
> > > > Java client? Probably Nikolay can take it over.
> > > >
> > > > --
> > > > Denis
> > > >
> > > > On Sat, Oct 20, 2018 at 2:09 PM Nikolay Izhikov <nizhi...@apache.org>
> > > > wrote:
> > > >
> > > > > Hello, Igniters.
> > > > >
> > > > > Currently, the Spark Data Frame integration is implemented via a
> > > > > client node connection. Whenever we need to retrieve some data into a
> > > > > Spark worker (or master) from Ignite, we start a client node.
> > > > >
> > > > > It has several major disadvantages:
> > > > >
> > > > > 1. We have to copy the whole Ignite distribution onto each Spark
> > > > > worker [1].
> > > > > 2. We have to copy the whole Ignite distribution onto the Spark
> > > > > master to make the catalogue work.
> > > > > 3. We have to have the same absolute path to the Ignite configuration
> > > > > file on every worker and provide it during data frame construction [2].
> > > > > 4. We have to additionally configure the Spark workers' classpath to
> > > > > include the Ignite libraries.
> > > > >
> > > > > For now, almost all the operations we need in the Spark Data Frame
> > > > > integration are supported by the Java Thin Client:
> > > > > * obtain the list of caches;
> > > > > * get a cache configuration;
> > > > > * execute a SQL query;
> > > > > * stream data to a table - not supported by the thin client for now,
> > > > > but it can be implemented using simple SQL INSERT statements.
> > > > >
> > > > > Advantages of using the Java Thin Client in the Spark integration
> > > > > (they are all known Java Thin Client advantages):
> > > > > 1. Easy to configure: only the IP addresses of the server nodes are
> > > > > required.
> > > > > 2. Easy to deploy: only 1 additional jar is required. No server-side
> > > > > (Ignite worker) configuration is required.
> > > > >
> > > > > I propose to implement the Spark Data Frame integration through the
> > > > > Java Thin Client.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > [1] https://apacheignite-fs.readme.io/docs/installation-deployment
> > > > > [2] https://apacheignite-fs.readme.io/docs/ignite-data-frame#section-ignite-dataframe-options
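[Editor's note] The four operations the thread says the integration needs can all be sketched against the Java Thin Client API. This is a minimal illustration, not the actual Spark integration code: it assumes a running Ignite cluster listening on the default thin-client port 10800, and a hypothetical SQL table/cache named `Person`. It will not run without such a cluster.

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.client.ClientCache;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;

public class ThinClientSketch {
    public static void main(String[] args) throws Exception {
        // "Easy to configure": only server addresses are needed;
        // no local Ignite node (and no local Ignite distribution) is started.
        ClientConfiguration cfg = new ClientConfiguration()
            .setAddresses("127.0.0.1:10800");

        try (IgniteClient client = Ignition.startClient(cfg)) {
            // 1. Obtain the list of caches (what the Spark catalogue needs).
            System.out.println(client.cacheNames());

            // 2. Get a cache configuration.
            ClientCache<?, ?> cache = client.cache("SQL_PUBLIC_PERSON");
            System.out.println(cache.getConfiguration().getName());

            // 3. Execute a SQL query (what the Data Frame reader needs).
            client.query(new SqlFieldsQuery("SELECT id, name FROM Person"))
                  .getAll()
                  .forEach(System.out::println);

            // 4. "Stream" data with plain SQL INSERTs, as the proposal
            //    suggests, until native streaming exists in the thin client.
            client.query(new SqlFieldsQuery(
                    "INSERT INTO Person(id, name) VALUES(?, ?)")
                    .setArgs(1L, "John")).getAll();
        }
    }
}
```

The cache name `SQL_PUBLIC_PERSON` follows Ignite's convention for caches backing SQL tables created via DDL, but the exact name depends on how the table was created.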
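[Editor's note] For contrast, the deployment overhead of the current client-node approach versus the proposed thin-client one could look roughly like the following. This is a hypothetical sketch: the paths, the full jar list, and the exact Spark options are illustrative (only cache-api-1.0.0.jar and h2-1.4.195.jar are named in the thread).

```shell
# Today (client node): every worker needs the Ignite libs on its classpath
# and the same absolute path to the XML config file.
spark-submit \
  --jars ignite-core.jar,ignite-spark.jar,ignite-spring.jar,cache-api-1.0.0.jar,h2-1.4.195.jar,... \
  --conf spark.executor.extraClassPath=/opt/ignite/libs/* \
  my-app.jar /same/absolute/path/ignite-config.xml

# Proposed (thin client): one jar, plus server node addresses.
spark-submit \
  --jars ignite-core.jar \
  my-app.jar 10.0.0.1:10800,10.0.0.2:10800
```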