[
https://issues.apache.org/jira/browse/IGNITE-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Zinoviev updated IGNITE-9108:
------------------------------------
Fix Version/s: 2.9
> Spark DataFrames With Cache Key and Value Objects
> -------------------------------------------------
>
> Key: IGNITE-9108
> URL: https://issues.apache.org/jira/browse/IGNITE-9108
> Project: Ignite
> Issue Type: New Feature
> Components: spark
> Affects Versions: 2.9
> Reporter: Stuart Macdonald
> Assignee: Alexey Zinoviev
> Priority: Major
> Fix For: 2.9
>
>
> Add support for _key and _val columns within Ignite-provided Spark
> DataFrames, which represent the cache key and value objects similar to the
> current _key/_val column semantics in Ignite SQL.
>
> If the cache key or value objects are standard SQL types (e.g. String, Int,
> etc.) they will be represented as such in the DataFrame schema; otherwise they
> are represented as Binary types encoded as either: 1. Ignite BinaryObjects,
> in which case we'd need to supply a Spark Encoder implementation for
> BinaryObjects, e.g.:
>
> {code:java}
> IgniteSparkSession session = ...
> Dataset<Row> dataFrame = ...
> Dataset<MyValClass> valDataSet =
> dataFrame.select("_val").as(session.binaryObjectEncoder(MyValClass.class));
> {code}
> Or 2. Kryo-serialised versions of the objects, e.g.:
>
> {code:java}
> Dataset<Row> dataFrame = ...
> Dataset<MyValClass> dataSet =
> dataFrame.select("_val").as(Encoders.kryo(MyValClass.class));
> {code}
> Option 1 would probably be more efficient, but option 2 would be more
> idiomatic Spark.
>
> The rationale behind this is the same as the Ignite SQL _key and _val
> columns: to allow access to the full cache objects from a SQL context.
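> For context, the existing Ignite SQL semantics being mirrored look roughly
> like this (the Person table and its columns are illustrative, not part of
> this issue):
>
> {code:sql}
> -- _key and _val expose the full cache key and value objects
> -- alongside the individually mapped query columns.
> SELECT _key, _val FROM Person WHERE name = 'John';
> {code}
>
> The proposal extends the same convention to the Spark DataFrame schema, so
> a select on "_key" or "_val" yields the whole cache object rather than a
> single mapped field.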
--
This message was sent by Atlassian Jira
(v8.3.4#803005)