Hello, Valentin.

> What I don't agree with is that replacing thick client with thin client is a
> way to fix usability issues.

I think it will fix some of them.

> will potentially compromise the performance

As I mentioned earlier, I want to provide an easy way to play with the integration.
For maximum performance one should use client nodes.

> What is the difference between thin and thick client from this point of view?

We need only 1 jar file.
The only config option we need is a list of IP addresses.

> I'm not arguing there are usability issues with thick client.
> I'm just suggesting to fix those issues first, before we jump reworking the
> implementation.
> My suggestion is to look at usability issues and try to fix them without
> getting rid of thick client.

I agree, let's do it!
Can you create some tickets?
I'm ready to look at it and contribute a fix.

On Tue, 23/10/2018 at 19:31 -0700, Valentin Kulichenko wrote:
> Nikolay,
>
> Please see my comments below. Actually, I haven't made most of the
> assumptions that you mentioned, and I generally agree with you. What I
> don't agree with is that replacing thick client with thin client is a way
> to fix usability issues. Thin client is not going to be issue-free either,
> but will potentially compromise the performance, as well as functionality
> (like streaming, as Stephen mentioned). My suggestion is to look at
> usability issues and try to fix them without getting rid of thick client.
>
> -Val
>
> On Sun, Oct 21, 2018 at 1:08 AM Nikolay Izhikov <nizhi...@apache.org> wrote:
>
> > Valentin.
> >
> > It seems you made several assumptions which, from my point of view, are
> > not always true:
> >
> > 1. "We have access to Spark cluster installation to perform deployment
> > steps" - this is not true in a cloud or enterprise environment.
> >
>
> Can you please elaborate on this? What is the difference between thin and
> thick client from this point of view? I understand that the latter would
> generally be more complicated, but how would one use thin client without
> deploying a JAR?
>
> >
> > 2. "Spark cluster is used only for Ignite integration".
> > From what I know, the computational resources of a big Spark cluster are
> > divided among many business divisions.
> > And it is not convenient to perform some deployment steps on this cluster.
> >
>
> Same as #1. Regardless of how we use the Spark cluster, we need to deploy a
> JAR in case of thin client, no?
>
> >
> > 3. "When Ignite + Spark are used in real production it's OK to have
> > reasonable deployment overhead"
> > What about a developer who wants to play with this integration?
> > And wants to do it quickly to see how it works in real life examples.
> > Can we make their life much easier?
> >
>
> We can and we should :) I'm not arguing there are usability issues with
> thick client. I'm just suggesting to fix those issues first, before we jump
> reworking the implementation.
>
> >
> > > First of all, they will exist with thin client either.
> >
> > Spark has the ability to deploy jars on workers and add them to the
> > application tasks classpath.
> > For 2.6 we must deploy 11 additional jars to start using Ignite.
> > Please, see my example at the bottom of the documentation page [1]
> >
> > Do cache-api-1.0.0.jar and h2-1.4.195.jar seem like obvious
> > dependencies for Ignite integration to you?
> > And to our users? :)
> >
>
> No, this is not obvious. Absolutely, this is a usability issue and we
> should think how to make user's life easier.
>
> >
> > Actually, the list of dependencies will change in 2.7: new version of
> > jcache, new version of h2.
> > So the user should change it in code or perform additional deployment steps.
> >
> > It's overkill for me.
> >
> > On the other hand - thin client requires only 1 jar.
> > Moreover, the thin client protocol has backward compatibility.
> > So thin client will perform correctly when the Ignite cluster is updated
> > from 2.6 to 2.7.
> > So, with Spark integration via thin client we will be able to update
> > Ignite cluster and Spark integration separately.
> > For now, we should do it in one big step.
> >
> > What do you think?
> >
> > [1] https://apacheignite-fs.readme.io/docs/installation-deployment
> >
> > On Sat, 20/10/2018 at 18:33 -0700, Valentin Kulichenko wrote:
> > > Guys,
> > >
> > > From my experience, Ignite and Spark clusters typically run in the same
> > > environment, which makes client node a more preferable option. Mainly,
> > > because of performance. BTW, I doubt partition-awareness on thin client
> > > will help either, because in dataframes we only run SQL queries and I
> > > believe thin client will execute them through a proxy anyway. But correct
> > > me if I’m wrong.
> > >
> > > Either way, it sounds like we just have usability issues with Ignite/Spark
> > > integration. Why don’t we concentrate on fixing them then? For example, #3
> > > can be fixed by loading XML content on master and then distributing it to
> > > workers, instead of loading on every worker independently. Then there are
> > > certain procedures like deploying JARs, etc. First of all, they will exist
> > > with thin client either. Second of all, I’m sure there are ways to simplify
> > > these procedures and make integration easier. My opinion is that working on
> > > such improvements is going to add more value than another implementation
> > > based on thin client.
> > >
> > > -Val
> > >
> > > On Sat, Oct 20, 2018 at 4:03 PM Denis Magda <dma...@apache.org> wrote:
> > >
> > > > Hello Nikolay,
> > > >
> > > > Your proposal sounds reasonable. However, I would suggest that we wait
> > > > until partition-awareness is supported for Java thin client first. With
> > > > that feature, the client can connect to any node directly while presently
> > > > all the communication goes through a proxy (a node the client is
> > > > connected to).
> > > > All of that is bad for performance.
> > > >
> > > >
> > > > Vladimir, how hard would it be to support the partition-awareness for Java
> > > > client?
> > > > Probably, Nikolay can take over.
> > > >
> > > > --
> > > > Denis
> > > >
> > > >
> > > > On Sat, Oct 20, 2018 at 2:09 PM Nikolay Izhikov <nizhi...@apache.org>
> > > > wrote:
> > > >
> > > > > Hello, Igniters.
> > > > >
> > > > > Currently, Spark Data Frame integration is implemented via a client node
> > > > > connection.
> > > > > Whenever we need to retrieve some data into a Spark worker (or master)
> > > > > from Ignite, we start a client node.
> > > > >
> > > > > It has several major disadvantages:
> > > > >
> > > > >         1. We should copy the whole Ignite distribution onto each Spark
> > > > > worker [1]
> > > > >         2. We should copy the whole Ignite distribution onto the Spark
> > > > > master to get the catalogue to work.
> > > > >         3. We should have the same absolute path to the Ignite
> > > > > configuration file on every worker and provide it during data frame
> > > > > construction [2]
> > > > >         4. We should additionally configure the Spark workers' classpath
> > > > > to include Ignite libraries.
> > > > >
> > > > > For now, almost all operations we need to do in Spark Data Frame
> > > > > integration are supported by Java Thin Client.
> > > > >         * obtain the list of caches.
> > > > >         * get cache configuration.
> > > > >         * execute SQL query.
> > > > >         * stream data to the table - not supported by the thin client
> > > > > for now, but can be implemented using simple SQL INSERT statements.
> > > > >
> > > > > Advantages of using Java Thin Client in Spark integration (they are all
> > > > > known Java Thin Client advantages):
> > > > >         1. Easy to configure: only IP addresses of server nodes are
> > > > > required.
> > > > >         2. Easy to deploy: only 1 additional jar required. No server
> > > > > side (Ignite worker) configuration required.
> > > > >
> > > > > I propose to implement Spark Data Frame integration through Java Thin
> > > > > Client.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > [1] https://apacheignite-fs.readme.io/docs/installation-deployment
> > > > > [2] https://apacheignite-fs.readme.io/docs/ignite-data-frame#section-ignite-dataframe-options
> > > > >