[jira] [Commented] (IGNITE-3084) Spark Data Frames Support in Apache Ignite

Nikolay Izhikov (JIRA) Tue, 28 Nov 2017 02:34:07 -0800

    [ 
https://issues.apache.org/jira/browse/IGNITE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268523#comment-16268523
 ]


Nikolay Izhikov commented on IGNITE-3084:
-----------------------------------------

We can't implement IgniteCatalog for Spark 2.1.
So I propose updating the Spark dependencies of the {{spark}} module to 2.2.0 
as part of this task.

1. To set up IgniteCatalog we need to override the `SharedState.externalCatalog` 
val, so that Spark can look up Ignite tables.
2. An overridden externalCatalog is still null while the SharedState instance 
is being initialized, because of Scala's initialization order: 
[https://docs.scala-lang.org/tutorials/FAQ/initialization-order.html]
3. In 2.1.2, externalCatalog is used in an internal initializer block of 
SharedState - 
[SharedState.scala|https://github.com/apache/spark/blob/v2.1.2/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L96]
4. In 2.2.0, SharedState.scala was fixed in a way that allows externalCatalog 
to be overridden - 
[SharedState-2.2.0|https://github.com/apache/spark/blob/v2.2.0/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L93]

{code:scala}
  {
    val defaultDbDefinition = CatalogDatabase(
      SessionCatalog.DEFAULT_DATABASE, "default database", warehousePath, Map())
    if (!externalCatalog.databaseExists(SessionCatalog.DEFAULT_DATABASE)) { // <-- Problem is here! externalCatalog == null if we override it.
      externalCatalog.createDatabase(defaultDbDefinition, ignoreIfExists = true)
    }
  }
{code}
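
The initialization-order problem from points 2 and 3 can be reproduced without Spark. The sketch below is only an illustration: {{Catalog}}, {{InMemoryCatalog}}, {{EagerState}}, {{LazyState}} and {{sawNullCatalog}} are hypothetical stand-ins, not Spark or Ignite classes. The {{lazy val}} variant shows one way a base class can tolerate an override. The comment above does not say that this is the exact change made in 2.2.0, so treat that part as an assumption.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark classes.
trait Catalog {
  def databaseExists(name: String): Boolean
}

class InMemoryCatalog extends Catalog {
  def databaseExists(name: String): Boolean = false
}

// Shape of the problem: a strict val that is read while the base class
// constructor is still running.
class EagerState {
  val externalCatalog: Catalog = new InMemoryCatalog

  // Evaluated during EagerState's constructor. The read goes through the
  // accessor, so a subclass override is consulted before its field is set.
  val sawNullCatalog: Boolean = externalCatalog == null
}

class OverridingEagerState extends EagerState {
  // Assigned only after EagerState's constructor has finished.
  override val externalCatalog: Catalog = new InMemoryCatalog
}

// One possible remedy (an assumption, see the note above): a lazy val is
// initialized on first access, so the override is ready when it is read.
class LazyState {
  lazy val externalCatalog: Catalog = new InMemoryCatalog

  val sawNullCatalog: Boolean = externalCatalog == null
}

class OverridingLazyState extends LazyState {
  override lazy val externalCatalog: Catalog = new InMemoryCatalog
}
```

With these definitions, {{new OverridingEagerState().sawNullCatalog}} is true, which corresponds to the NullPointerException on {{externalCatalog.databaseExists}} in the snippet above, while {{new OverridingLazyState().sawNullCatalog}} is false.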

> Spark Data Frames Support in Apache Ignite
> ------------------------------------------
>
>                 Key: IGNITE-3084
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3084
>             Project: Ignite
>          Issue Type: Task
>          Components: spark
>    Affects Versions: 1.5.0.final
>            Reporter: Vladimir Ozerov
>            Assignee: Nikolay Izhikov
>              Labels: bigdata
>             Fix For: 2.4
>
>
> Apache Spark already benefits from integration with Apache Ignite. The latter 
> provides shared RDDs, an implementation of Spark RDD, that help Spark 
> share state between Spark workers and execute SQL queries much faster. The 
> next logical step is to enable support for the modern Spark Data Frames API 
> in a similar way.
> As a contributor, you will be fully in charge of the integration of Spark 
> Data Frame API and Apache Ignite.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
