[ 
https://issues.apache.org/jira/browse/IGNITE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289162#comment-16289162
 ] 

Nikolay Izhikov commented on IGNITE-3084:
-----------------------------------------

> add {{CONFIG}} to allow providing {{IgniteConfiguration}} object. 

We can't do that because the parameters are a {{Map[String, String]}}; no 
objects are allowed.
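For illustration, this is what the restriction looks like from the Spark side. The data source name and option keys below ({{ignite}}, {{CONFIG_FILE}}, {{TABLE}}) are the ones discussed in this thread, so treat this as a sketch, not the final API:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: every value passed through .option() is a String, so an
// IgniteConfiguration object cannot travel through this API -- only a
// path to a configuration file (or another string parameter) can.
val spark = SparkSession.builder().appName("ignite-df").getOrCreate()

val df = spark.read
  .format("ignite")                                // assumed data source name
  .option("CONFIG_FILE", "/opt/ignite/config.xml") // a String path, not an object
  .option("TABLE", "person")                       // a String table name
  .load()
```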

> I think we should leave only {{IGNITE}}, {{CONFIG_FILE}}, {{TABLE}} options 
> .... {{GRID}}, {{TCP_IP_ADDRESSES}} and {{PEER_CLASS_LOADING}} should be 
> removed. 

Done.

Please note some details about the usage of the {{CONFIG_FILE}} option.
For now it is required to have the configuration file on *each Spark worker 
node at the same path* (please see {{IgniteContext}} and the {{cfgF}} closure).
We can't just ship the configuration from the master node to the worker nodes 
because {{IgniteConfiguration}} is not serializable.
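To illustrate why the file must exist locally on every node: {{IgniteContext}} ships a configuration *closure* (see {{cfgF}}) rather than the configuration object itself, so only the path is serialized and the XML is re-read on each node. A rough, simplified sketch of that pattern (not the actual Ignite source):

```scala
// Simplified sketch of the cfgF pattern, not the real IgniteContext code:
// instead of serializing IgniteConfiguration, only a String path travels
// to the workers, and the configuration is rebuilt from the local file.
class IgniteContextSketch(cfgPath: String) extends Serializable {
  // cfgF-style closure: serializable because it captures only the path.
  val cfgF: () => String = () => cfgPath

  // Evaluated on a worker: fails unless the file exists at the same
  // path on that node's local filesystem.
  def loadLocalConfig(): String =
    scala.io.Source.fromFile(cfgF()).mkString
}
```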

I think that requiring files to be copied to the local filesystem of every 
cluster node is a huge disadvantage.
Does that make sense to you?
Have I missed something?

I tried to provide a simpler approach for connecting to an existing Ignite 
cluster from a Spark worker node, and came up with the {{TCP_IP_ADDRESSES}} 
parameter.

With it, we can just add the Ignite jars to the task classpath via 
{{sparkContext.addJar}} and use any existing Spark cluster to execute jobs 
with Ignite DataFrames.
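In short, the usage looks roughly like the following. The option key and data source name are taken from this discussion, so treat them as assumptions rather than a published API:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ignite-df-example").getOrCreate()

// Make Ignite available on the executors without pre-installing anything
// there: the jars are shipped with the tasks.
spark.sparkContext.addJar("/path/to/ignite-core.jar")
spark.sparkContext.addJar("/path/to/ignite-spark.jar")

// Connect to an existing Ignite cluster by address list only; no
// configuration file has to be present on the worker nodes.
val persons = spark.read
  .format("ignite")  // assumed data source name
  .option("TCP_IP_ADDRESSES", "10.0.0.1:47500..47509,10.0.0.2:47500..47509")
  .option("TABLE", "person")
  .load()
```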
Please, see my example:

https://github.com/nizhikov/ignite-spark-df-example/blob/master/src/main/scala/org/apache/ignite/scalar/examples/spark/StandaloneClustersExample.scala

> Spark Data Frames Support in Apache Ignite
> ------------------------------------------
>
>                 Key: IGNITE-3084
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3084
>             Project: Ignite
>          Issue Type: Task
>          Components: spark
>    Affects Versions: 1.5.0.final
>            Reporter: Vladimir Ozerov
>            Assignee: Nikolay Izhikov
>            Priority: Critical
>              Labels: bigdata, important
>             Fix For: 2.4
>
>
> Apache Spark already benefits from integration with Apache Ignite. The latter 
> provides shared RDDs, an implementation of Spark RDD, that help Spark to 
> share a state between Spark workers and execute SQL queries much faster. The 
> next logical step is to enable support for modern Spark Data Frames API in a 
> similar way.
> As a contributor, you will be fully in charge of the integration of Spark 
> Data Frame API and Apache Ignite.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
