Spark DataFrames With Cache Key and Value Objects

Stuart Macdonald Fri, 27 Jul 2018 01:38:46 -0700

Ignite Dev Community,

Within Ignite-supplied Spark DataFrames, I’d like to propose adding support
for _key and _val columns, which represent the cache key and value objects,
similar to the current _key/_val column semantics in Ignite SQL.


If the cache key or value objects are standard SQL types (e.g. String, Int,
etc.), they will be represented as such in the DataFrame schema. Otherwise
they will be represented as Binary types, encoded as either:

1. Ignite BinaryObjects, in which case we’d need to supply a Spark Encoder
   implementation for BinaryObjects, or
2. Kryo-serialised versions of the objects.

Option 1 would probably be more efficient, but option 2 would be more
idiomatic Spark.
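
To make the mapping rule concrete, here is a rough sketch in plain Python
(no Spark or Ignite dependency). It only models the decision described
above: SQL-typed keys and values keep their own column type, and anything
else becomes a binary column. The type table, the names column_type, encode
and key_val_schema, the Person class, and the JSON encoding (a stand-in for
either BinaryObject or Kryo bytes) are all assumptions made for
illustration, not part of the proposal.

```python
import json

# Assumed table of "standard SQL types"; the proposal only names String and Int.
SQL_TYPES = {str: "string", int: "long", float: "double", bool: "boolean"}


def column_type(obj_type):
    """Column type for a cache key or value class: its SQL type, else binary."""
    return SQL_TYPES.get(obj_type, "binary")


def encode(obj):
    """Pass SQL-typed objects through; serialise everything else to bytes."""
    if type(obj) in SQL_TYPES:
        return obj
    # Stand-in encoding only: a real implementation would emit
    # BinaryObject or Kryo bytes here.
    return json.dumps(vars(obj)).encode("utf-8")


def key_val_schema(key_type, val_type):
    """Schema entries for the two proposed columns."""
    return [("_key", column_type(key_type)), ("_val", column_type(val_type))]


class Person:
    """Hypothetical user-defined cache value class."""

    def __init__(self, name, age):
        self.name = name
        self.age = age
```

With these assumptions, a cache keyed by an integer and holding Person
values would expose _key as a long column and _val as a binary column.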

This feature would be controlled with an optional parameter in the Ignite
data source, defaulting to the current implementation which doesn’t supply
_key or _val columns. The rationale behind this is the same as the Ignite
SQL _key and _val columns: to allow access to the full cache objects from a
SQL context.
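
As a sketch of the opt-in behaviour, the plain Python below models how a
data source might decide which columns to expose. The option name
"includeKeyVal" and the function dataframe_columns are hypothetical; the
proposal does not name the parameter. The only behaviour taken from the
text above is that the columns are absent unless the option is set.

```python
# Hypothetical option name; the proposal does not specify one.
OPTION_INCLUDE_KEY_VAL = "includeKeyVal"


def dataframe_columns(table_columns, options=None):
    """Return the exposed column names, adding _key/_val only on request."""
    options = options or {}
    # Data source options usually arrive as strings, so compare textually.
    flag = str(options.get(OPTION_INCLUDE_KEY_VAL, "false")).lower()
    if flag == "true":
        return ["_key", "_val"] + list(table_columns)
    # Default: current behaviour, no extra columns.
    return list(table_columns)
```

Existing jobs that pass no option would see the same schema as today, so
the change would be backwards compatible under this assumption.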

Can I ask for feedback on this proposal please?

I’d be happy to contribute this feature if we agree on the concept.

Stuart.
