[
https://issues.apache.org/jira/browse/IGNITE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290627#comment-16290627
]
Nikolay Izhikov commented on IGNITE-3084:
-----------------------------------------
6.
> What is the purpose of IgniteSQLRelation#calcPartitions method? Can you
> please explain what it does and how it works?
Spark works with partitioned data sources.
I have implemented the following algorithm:
1. REPLICATED cache - the partition count is 1 because all data exists on each node.
2. PARTITIONED cache:
1 Spark partition == the set of Ignite partitions with the same primary node.
So each Spark partition (if the topology hasn't changed) reads data from a
single Ignite node.
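A minimal sketch of the grouping described above, not the actual IgniteSQLRelation#calcPartitions code. The function name calc_spark_partitions and the primary_node_by_partition mapping (Ignite partition id -> primary node id) are hypothetical and used only for illustration; in a real implementation that mapping would come from the Ignite affinity function.

```python
from collections import defaultdict
from typing import Dict, List


def calc_spark_partitions(
    cache_mode: str,
    primary_node_by_partition: Dict[int, str],
) -> List[List[int]]:
    """Group Ignite partitions into Spark partitions.

    Each inner list holds the Ignite partition ids served by one
    Spark partition. Inputs are hypothetical, for illustration only.
    """
    if cache_mode == "REPLICATED":
        # Every node holds all data, so a single Spark partition
        # covering every Ignite partition is enough.
        return [sorted(primary_node_by_partition)]

    # PARTITIONED: one Spark partition per primary node.
    groups: Dict[str, List[int]] = defaultdict(list)
    for part in sorted(primary_node_by_partition):
        groups[primary_node_by_partition[part]].append(part)

    # Sort by node id so the result is deterministic.
    return [groups[node] for node in sorted(groups)]


if __name__ == "__main__":
    mapping = {0: "node-A", 1: "node-B", 2: "node-A", 3: "node-B"}
    print(calc_spark_partitions("PARTITIONED", mapping))
    print(calc_spark_partitions("REPLICATED", mapping))
```

With two nodes and four Ignite partitions, the PARTITIONED case yields two Spark partitions, one per primary node, so each Spark task reads from one node as long as the topology is stable.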
Does this make sense to you? Maybe there is a better approach?
> Spark Data Frames Support in Apache Ignite
> ------------------------------------------
>
> Key: IGNITE-3084
> URL: https://issues.apache.org/jira/browse/IGNITE-3084
> Project: Ignite
> Issue Type: Task
> Components: spark
> Affects Versions: 1.5.0.final
> Reporter: Vladimir Ozerov
> Assignee: Nikolay Izhikov
> Priority: Critical
> Labels: bigdata, important
> Fix For: 2.4
>
>
> Apache Spark already benefits from integration with Apache Ignite. The latter
> provides shared RDDs, an implementation of Spark RDD, that help Spark to
> share a state between Spark workers and execute SQL queries much faster. The
> next logical step is to enable support for modern Spark Data Frames API in a
> similar way.
> As a contributor, you will be fully in charge of the integration of Spark
> Data Frame API and Apache Ignite.