[jira] [Commented] (IGNITE-3084) Spark Data Frames Support in Apache Ignite

Valentin Kulichenko (JIRA) Tue, 12 Dec 2017 16:28:52 -0800

    [ https://issues.apache.org/jira/browse/IGNITE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288527#comment-16288527 ]


Valentin Kulichenko commented on IGNITE-3084:
---------------------------------------------

[~NIzhikov], I'm reviewing the changes and have several questions and comments 
(there will probably be more, but I wanted to keep you updated).
# {{IgniteRelationProvider}} looks a bit overcomplicated. I think we should 
keep only the {{IGNITE}}, {{CONFIG_FILE}} and {{TABLE}} options and also add 
{{CONFIG}} to allow providing an {{IgniteConfiguration}} object. {{GRID}}, 
{{TCP_IP_ADDRESSES}} and {{PEER_CLASS_LOADING}} should be removed. The others 
relate to the non-SQL use case, which I'm not sure about; let's discuss it 
separately.
# {{IgniteRelationProvider}} always creates {{IgniteContext}} in embedded mode 
({{standalone=false}}). Let's set this flag to {{true}} instead; embedded mode 
will be deprecated in 2.4 anyway.
# What is the purpose of implementing {{onApplicationEnd}}? When exactly is it 
invoked? If it's needed, should it be on {{IgniteContext}} level instead?
# {{IgniteSQLRelation}} lines 74-76 are not used, can they be removed?
# {{IgniteSQLRelation#sqlCacheName}} looks incorrect because it assumes that 
the cache was created via DDL. This is not always the case.
# What is the purpose of {{IgniteSQLRelation#calcPartitions}} method? Can you 
please explain what it does and how it works?
# Why are {{*Relation}}, {{IgniteRelationProvider}} and {{package.scala}} in 
the public package? It seems that users never access them directly; if that's 
the case, let's move them to the {{impl}} package.
# I'm not sure I understand the value of custom catalog implementation. Can you 
please elaborate what exactly it provides to a user?
# {{IgniteCacheRelation}} is questionable. The main problem is that it works 
with classes which are not always available. Also, what if the schema is 
dynamic? How are we going to support that? I think it's better to support data 
frames only via Ignite SQL, unless we come up with a cleaner solution. Let me 
know what you think.
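A minimal sketch of the option handling proposed in item 1, with the {{standalone=true}} default from item 2. It covers {{CONFIG_FILE}}, {{CONFIG}} and {{TABLE}} and leaves {{IGNITE}} out. The option key strings, the type names and the {{FakeIgniteConfiguration}} stand-in are hypothetical, not Ignite's actual API. Because Spark's {{DataFrameReader.option}} only carries primitive values (stored as strings), the sketch assumes a {{CONFIG}} object reaches the provider through a separate parameter rather than through the options map.

```scala
// Hypothetical sketch of a trimmed-down option set for IgniteRelationProvider.
// Key strings and type names are illustrative assumptions, not Ignite's real API.
object IgniteOptionsSketch {
  // Options the review proposes to keep (string values are assumed).
  val ConfigFile = "config_file"
  val Table = "table"

  // Options the review proposes to remove (string values are assumed).
  val Removed: Set[String] = Set("grid", "tcp_ip_addresses", "peer_class_loading")

  // Stand-in for org.apache.ignite.configuration.IgniteConfiguration,
  // so the sketch compiles without Ignite on the classpath.
  final case class FakeIgniteConfiguration(igniteInstanceName: String)

  // Where the node configuration comes from: a Spring XML file or an object.
  sealed trait ConfigSource
  final case class FromFile(path: String) extends ConfigSource
  final case class FromObject(cfg: FakeIgniteConfiguration) extends ConfigSource

  final case class RelationOptions(config: ConfigSource, table: String, standalone: Boolean)

  /** Validates user options; `configObject` models the proposed CONFIG option,
    * which cannot travel through Spark's string-valued options map. */
  def resolve(
      options: Map[String, String],
      configObject: Option[FakeIgniteConfiguration]
  ): Either[String, RelationOptions] = {
    // Treat option keys case-insensitively.
    val opts = options.map { case (k, v) => k.toLowerCase -> v }

    // Reject the options the review proposes to drop.
    val unsupported = opts.keySet intersect Removed
    if (unsupported.nonEmpty)
      Left(s"Unsupported options: ${unsupported.toSeq.sorted.mkString(", ")}")
    else {
      // Exactly one configuration source must be given.
      val source: Either[String, ConfigSource] = (opts.get(ConfigFile), configObject) match {
        case (Some(path), None) => Right(FromFile(path))
        case (None, Some(cfg))  => Right(FromObject(cfg))
        case (Some(_), Some(_)) => Left("Specify either CONFIG_FILE or CONFIG, not both")
        case (None, None)       => Left("One of CONFIG_FILE or CONFIG is required")
      }
      for {
        src   <- source
        table <- opts.get(Table).toRight("TABLE option is required")
        // Item 2: connect to an existing cluster instead of embedding server nodes.
      } yield RelationOptions(src, table, standalone = true)
    }
  }
}
```

For example, {{resolve(Map("TABLE" -> "person", "CONFIG_FILE" -> "ignite.xml"), None)}} yields a {{FromFile("ignite.xml")}} source with {{standalone = true}}, while passing {{GRID}} is rejected outright. Making the file and the object mutually exclusive avoids any ambiguity about which configuration wins.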

> Spark Data Frames Support in Apache Ignite
> ------------------------------------------
>
>                 Key: IGNITE-3084
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3084
>             Project: Ignite
>          Issue Type: Task
>          Components: spark
>    Affects Versions: 1.5.0.final
>            Reporter: Vladimir Ozerov
>            Assignee: Nikolay Izhikov
>            Priority: Critical
>              Labels: bigdata, important
>             Fix For: 2.4
>
>
> Apache Spark already benefits from integration with Apache Ignite. The latter 
> provides shared RDDs, an implementation of Spark RDD, that help Spark 
> share state between Spark workers and execute SQL queries much faster. The 
> next logical step is to enable support for the modern Spark Data Frames API 
> in a similar way.
> As a contributor, you will be fully in charge of integrating the Spark 
> Data Frame API with Apache Ignite.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

