[ 
https://issues.apache.org/jira/browse/IGNITE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304288#comment-16304288
 ] 

Nikolay Izhikov commented on IGNITE-3084:
-----------------------------------------

{quote}onApplicationEnd method - makes sense. But it sounds like it should be 
on IgniteContext level, what do you think?{quote}

OK. The listener has been moved to IgniteContext.
I think of IgniteContext as part of the public API, so I didn't want to change its 
behavior. That's the reason I didn't implement the listener inside IgniteContext in 
previous revisions.
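
For reference, the registration might look roughly like this (a sketch only; the class shape and {{close()}} body are illustrative of the approach, not the exact patch):

{code:scala}
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

// Sketch: IgniteContext registers a Spark listener so Ignite resources are
// released when the Spark application ends. Names are illustrative.
class IgniteContext(@transient val sparkContext: SparkContext /*, ... */) {
    sparkContext.addSparkListener(new SparkListener {
        override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit =
            close() // stop Ignite instances held by this context
    })

    def close(): Unit = {
        // e.g. Ignition.stop(...) for the instances this context started
    }
}
{code}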

{quote}IgniteCacheRelation - let's remove it for now and discuss on dev@ as a 
separate task.{quote}

Done.

{quote}Also let's rename IgniteDataFrameOptions to IgniteDataFrameSettings, and 
inside it:{quote}

Done.

{quote}Remove GRID option for now. It's a bit confusing in the current 
implementation and I'm not sure how to make it more usable. We can always come 
back to this in future if needed.{quote}

Can't do it, because {{OPTION_GRID}} is used internally by the catalog for now.
When Spark resolves existing Ignite tables, we need to specify how a table is 
stored and the properties required to access it. The properties are stored in a 
{{Map\[String, String\]}}.

Does that make sense to you? Can we replace {{gridName}} with something more 
appropriate?

Please look at the code for more details:

{{IgniteExternalCatalog#getTableOption}}, line 111.

{code:scala}
                    storage = CatalogStorageFormat(
                        locationUri = None,
                        inputFormat = Some(FORMAT_IGNITE),
                        outputFormat = Some(FORMAT_IGNITE),
                        serde = None,
                        compressed = false,
                        properties = Map(
                            OPTION_GRID → gridName,
                            OPTION_TABLE → tableName)
                    ),
{code}
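
On the read path, the data source then recovers the grid and table from those same properties (a sketch; the method and parameter names here are illustrative, not the actual relation-provider code):

{code:scala}
// Sketch of the read path: Spark hands the stored catalog properties back
// to the data source, which resolves the grid and table by name.
def createRelation(params: Map[String, String]): Unit = {
    val gridName = params.getOrElse(OPTION_GRID,
        throw new IllegalArgumentException("grid name is required"))
    val tableName = params.getOrElse(OPTION_TABLE,
        throw new IllegalArgumentException("table name is required"))
    // ... look up the Ignite instance by gridName and query tableName ...
}
{code}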

{quote}Can we move {{IgniteSparkSession}} to org.apache.ignite.spark{quote}

No, we can't, because many of the methods used inside {{IgniteSparkSession}} are 
package-private to {{org.apache.spark.sql}}. For example:
* SQLContext constructor \[1\]: IgniteSparkSession#63
* SharedState class \[2\]: IgniteSparkSession#66
* Dataset object \[3\]: IgniteSparkSession#103
* Etc…

\[1\] 
https://github.com/apache/spark/blob/v2.2.0/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L58
\[2\] 
https://github.com/apache/spark/blob/v2.2.0/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L42
\[3\] 
https://github.com/apache/spark/blob/v2.2.0/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L59
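
To illustrate why the package matters (a minimal standalone sketch, not the actual Spark classes): a member declared {{private\[sql\]}} is visible only to code compiled inside {{org.apache.spark.sql}}, so moving the caller to {{org.apache.ignite.spark}} breaks compilation.

{code:scala}
package org.apache.spark.sql {
    class Internal {
        private[sql] def helper(): Int = 42 // visible only within org.apache.spark.sql
    }

    object SamePackage {
        def ok(): Int = new Internal().helper() // compiles: same package
    }
}

package org.apache.ignite.spark {
    object OtherPackage {
        // new org.apache.spark.sql.Internal().helper()
        // would NOT compile: helper() is not accessible from this package
    }
}
{code}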

> Spark Data Frames Support in Apache Ignite
> ------------------------------------------
>
>                 Key: IGNITE-3084
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3084
>             Project: Ignite
>          Issue Type: Task
>          Components: spark
>    Affects Versions: 1.5.0.final
>            Reporter: Vladimir Ozerov
>            Assignee: Nikolay Izhikov
>            Priority: Critical
>              Labels: bigdata, important
>             Fix For: 2.4
>
>
> Apache Spark already benefits from integration with Apache Ignite. The latter 
> provides shared RDDs, an implementation of Spark RDD, that help Spark to 
> share a state between Spark workers and execute SQL queries much faster. The 
> next logical step is to enable support for modern Spark Data Frames API in a 
> similar way.
> As a contributor, you will be fully in charge of the integration of Spark 
> Data Frame API and Apache Ignite.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
