Hello, guys.
I’m working on IGNITE-3084 [1] “Spark Data Frames Support in Apache Ignite”
and have a proposal to discuss.
I want to provide a consistent way to query Ignite key-value caches from
the Spark SQL engine.
To implement it, I have to determine the Java classes of the key and the
value. They are required to calculate the schema of a Spark Data Frame.
As far as I know, there is currently no meta information for key-value
caches in Ignite.
If a regular data source is used, a user can provide the key class and
value class through options. Example:
```
val df = spark.read
  .format(IGNITE)
  .option("config", CONFIG)
  .option("cache", CACHE_NAME)
  .option("keyClass", "java.lang.Long")
  .option("valueClass", "java.lang.String")
  .load()

df.printSchema()
df.createOrReplaceTempView("testCache")

val igniteDF = spark.sql(
  "SELECT key, value FROM testCache WHERE key >= 2 AND value like '%0'")
```
But if we use the Ignite implementation of the Spark catalog, we don’t
want to register existing caches by hand.
Anton Vinogradov proposes a syntax that I personally like very much:
*Let’s use the following table name for a key-value cache -
`cacheName[keyClass,valueClass]`*
Example:
```
val df3 = igniteSession.sql(
  "SELECT * FROM `testCache[java.lang.Integer,java.lang.String]` WHERE key % 2 = 0")
df3.printSchema()
df3.show()
```
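On the catalog side, such a table name would have to be split into the cache name, the key class, and the value class before the schema can be built. A minimal sketch of that parsing step in Java (the class name, method, and regex here are hypothetical illustrations, not actual IGNITE-3084 code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueTableName {
    // Matches the proposed syntax: cacheName[keyClass,valueClass].
    // Class names are assumed to be dot-separated Java identifiers.
    private static final Pattern KV_TABLE =
        Pattern.compile("^(.+)\\[([\\w.$]+),([\\w.$]+)\\]$");

    /**
     * @return {cacheName, keyClass, valueClass}, or null if the name
     *         does not use the key-value syntax (regular table lookup).
     */
    public static String[] parse(String tableName) {
        Matcher m = KV_TABLE.matcher(tableName);

        if (!m.matches())
            return null;

        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] parts = parse("testCache[java.lang.Integer,java.lang.String]");

        // Prints: testCache | java.lang.Integer | java.lang.String
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```

A nice property of this scheme is that a name without brackets falls through to the regular SQL-table lookup, so existing queries keep working unchanged.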
Thoughts?
[1] https://issues.apache.org/jira/browse/IGNITE-3084
--
Nikolay Izhikov
[email protected]