Re: [DISCUSSION] Spark Data Frame through Thin Client

Denis Magda Sat, 20 Oct 2018 16:04:02 -0700

Hello Nikolay,

Your proposal sounds reasonable. However, I would suggest that we first wait
until partition awareness is supported by the Java thin client. With that
feature, the client can connect to any node directly, while presently all
communication goes through a proxy (the node the client is connected to),
which adds an extra network hop. That is bad for performance.
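
To make the extra hop concrete, here is a rough toy model, not Ignite code.
The class name, the hash-modulo "affinity" function, the round-robin
partition-to-node mapping and the node names are all assumptions made up for
illustration; Ignite's real affinity function works differently. It only
counts request legs for a key lookup in the two modes.

```java
import java.util.List;

// Toy model only: every name and mapping here is hypothetical.
class RoutingSketch {
    // Assumed affinity function: key hash modulo the partition count.
    public static int partition(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Assumed partition-to-node mapping: round-robin over the node list.
    public static String primaryNode(int partition, List<String> nodes) {
        return nodes.get(partition % nodes.size());
    }

    // Proxy mode: one leg to the proxy, plus a second leg
    // whenever the proxy does not own the key itself.
    public static int hopsViaProxy(Object key, String proxy,
                                   List<String> nodes, int partitions) {
        String owner = primaryNode(partition(key, partitions), nodes);
        return owner.equals(proxy) ? 1 : 2;
    }

    // Partition-aware mode: the client computes the owner locally
    // and contacts it directly, so there is always a single leg.
    public static int hopsPartitionAware(Object key,
                                         List<String> nodes, int partitions) {
        return 1;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-a", "node-b", "node-c");
        int partitions = 1024;
        int proxyHops = 0;
        int directHops = 0;
        for (int key = 0; key < 9; key++) {
            proxyHops += hopsViaProxy(key, "node-a", nodes, partitions);
            directHops += hopsPartitionAware(key, nodes, partitions);
        }
        // prints "via proxy: 15, partition-aware: 9"
        System.out.println("via proxy: " + proxyHops
                + ", partition-aware: " + directHops);
    }
}
```

In this toy setup only a third of the keys live on the proxy node, so the
remaining lookups pay for a second leg.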



Vladimir, how hard would it be to support partition awareness in the Java
client? Perhaps Nikolay can take it over.

--
Denis


On Sat, Oct 20, 2018 at 2:09 PM Nikolay Izhikov <nizhi...@apache.org> wrote:

> Hello, Igniters.
>
> Currently, the Spark Data Frame integration is implemented via a client node
> connection.
> Whenever we need to retrieve data from Ignite into a Spark worker (or
> master), we start a client node.
>
> It has several major disadvantages:
>
>         1. We have to copy the whole Ignite distribution onto each Spark
> worker [1]
>         2. We have to copy the whole Ignite distribution onto the Spark
> master to get the catalogue working.
>         3. We have to keep the same absolute path to the Ignite configuration
> file on every worker and provide it during data frame construction [2]
>         4. We have to additionally configure the Spark workers' classpath to
> include the Ignite libraries.
>
> For now, almost all operations we need in the Spark Data Frame
> integration are supported by the Java Thin Client:
>         * obtain the list of caches.
>         * get cache configuration.
>         * execute SQL query.
>         * stream data to the table - not supported by the thin client for
> now, but can be implemented using simple SQL INSERT statements.
>
> Advantages of using the Java Thin Client in the Spark integration (they all
> follow from the general Java Thin Client advantages):
>         1. Easy to configure: only IP addresses of server nodes are
> required.
>         2. Easy to deploy: only 1 additional jar is required. No server-side
> (Ignite worker) configuration is required.
>
> I propose to implement Spark Data Frame integration through Java Thin
> Client.
>
> Thoughts?
>
> [1] https://apacheignite-fs.readme.io/docs/installation-deployment
> [2]
> https://apacheignite-fs.readme.io/docs/ignite-data-frame#section-ignite-dataframe-options
>
