Hello, guys.
I’m working on IGNITE-3084 [1] “Spark Data Frames Support in Apache Ignite”
and have a proposal to discuss.
I want to provide a consistent way to query Ignite key-value caches from
the Spark SQL engine.
To implement it, I have to determine the Java classes of the key and the
value. They are required to calculate the schema of a Spark Data Frame.
As far as I know, there is currently no meta information for key-value
caches in Ignite.
If a regular data source is used, a user can provide the key class and
value class through options. Example:
```
val df = spark.read
  .format(IGNITE)
  .option("config", CONFIG)
  .option("cache", CACHE_NAME)
  .option("keyClass", "java.lang.Long")
  .option("valueClass", "java.lang.String")
  .load()

df.printSchema()
df.createOrReplaceTempView("testCache")

val igniteDF = spark.sql(
  "SELECT key, value FROM testCache WHERE key >= 2 AND value like '%0'")
```
But if we use the Ignite implementation of the Spark catalog, we don’t
want to register existing caches by hand.
Anton Vinogradov proposes a syntax that I personally like very much:
*Let’s use the following table name for a key-value cache -
`cacheName[keyClass,valueClass]`*
Example:
```
val df3 = igniteSession.sql(
  "SELECT * FROM `testCache[java.lang.Integer,java.lang.String]` WHERE key % 2 = 0")
df3.printSchema()
df3.show()
```
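On the catalog side, such a table name would have to be split into the cache name, the key class, and the value class before the schema can be built. A minimal sketch of that parsing step in Java (the class name, method, and regex here are hypothetical illustrations, not actual IGNITE-3084 code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueTableName {
    // Matches the proposed syntax: cacheName[keyClass,valueClass].
    // Class names are assumed to be dot-separated Java identifiers.
    private static final Pattern KV_TABLE =
        Pattern.compile("^(.+)\\[([\\w.$]+),([\\w.$]+)\\]$");

    /**
     * @return {cacheName, keyClass, valueClass}, or null if the name
     *         does not use the key-value syntax (regular table lookup).
     */
    public static String[] parse(String tableName) {
        Matcher m = KV_TABLE.matcher(tableName);

        if (!m.matches())
            return null;

        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] parts = parse("testCache[java.lang.Integer,java.lang.String]");

        // Prints: testCache | java.lang.Integer | java.lang.String
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```

A nice property of this scheme is that a name without brackets falls through to the regular SQL-table lookup, so existing queries keep working unchanged.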
Thoughts?
[1] https://issues.apache.org/jira/browse/IGNITE-3084
--
Nikolay Izhikov
[email protected]