Nikolay, I don't think we need this, especially with this kind of syntax, which is very confusing. The main use case for data frames is SQL, so let's concentrate on that. We should use Ignite's SQL engine capabilities as much as possible. If we see other use cases down the road, we can always support them.
-Val

On Thu, Oct 5, 2017 at 10:57 AM, Николай Ижиков <[email protected]> wrote:

> Hello, Valentin.
>
> I implemented the ability to run Spark SQL queries for both:
>
> 1. Ignite SQL tables. Internally, a table is described by a QueryEntity
> with meta information about the data.
> 2. Key-value caches - regular Ignite caches without meta information
> about the stored data.
>
> In the second case, we have to know which types the cache stores.
> So for this case, I propose the syntax I describe below.
>
> 2017-10-05 20:45 GMT+03:00 Valentin Kulichenko <
> [email protected]>:
>
> > Nikolay,
> >
> > I don't understand. Why do we require key and value types to be
> > provided in SQL? What is the issue you're trying to solve with this
> > syntax?
> >
> > -Val
> >
> > On Thu, Oct 5, 2017 at 7:05 AM, Николай Ижиков <[email protected]>
> > wrote:
> >
> > > Hello, guys.
> > >
> > > I'm working on IGNITE-3084 [1] "Spark Data Frames Support in Apache
> > > Ignite" and have a proposal to discuss.
> > >
> > > I want to provide a consistent way to query Ignite key-value caches
> > > from the Spark SQL engine.
> > >
> > > To implement it, I have to determine the Java classes of the key and
> > > the value. This is required for calculating the schema of a Spark
> > > Data Frame. As far as I know, there is no meta information for
> > > key-value caches in Ignite for now.
> > >
> > > If a regular data source is used, a user can provide the key class
> > > and value class through options.
> > > Example:
> > >
> > > ```
> > > val df = spark.read
> > >   .format(IGNITE)
> > >   .option("config", CONFIG)
> > >   .option("cache", CACHE_NAME)
> > >   .option("keyClass", "java.lang.Long")
> > >   .option("valueClass", "java.lang.String")
> > >   .load()
> > >
> > > df.printSchema()
> > >
> > > df.createOrReplaceTempView("testCache")
> > >
> > > val igniteDF = spark.sql(
> > >   "SELECT key, value FROM testCache WHERE key >= 2 AND value like '%0'")
> > > ```
> > >
> > > But if we use the Ignite implementation of the Spark catalog, we
> > > don't want to register existing caches by hand.
> > > Anton Vinogradov proposes a syntax that I personally like very much:
> > >
> > > *Let's use the following table name for a key-value cache:
> > > `cacheName[keyClass,valueClass]`*
> > >
> > > Example:
> > >
> > > ```
> > > val df3 = igniteSession.sql(
> > >   "SELECT * FROM `testCache[java.lang.Integer,java.lang.String]` WHERE key % 2 = 0")
> > >
> > > df3.printSchema()
> > >
> > > df3.show()
> > > ```
> > >
> > > Thoughts?
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-3084
> > >
> > > --
> > > Nikolay Izhikov
> > > [email protected]
>
> --
> Nikolay Izhikov
> [email protected]
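[Editor's note] For what it's worth, the proposed `` `cacheName[keyClass,valueClass]` `` convention is straightforward to parse on the catalog side. Below is a minimal sketch in plain Scala; the object name `TableNameParser`, the regex, and the `Option` result shape are illustrative assumptions of mine, not part of the actual patch:

```scala
// Sketch: splitting a table name of the form `cacheName[keyClass,valueClass]`
// into its three parts. Names and error handling here are hypothetical.
object TableNameParser {
  // Cache name: word characters; class names: dots, `$` (nested classes) allowed.
  private val KeyValueTable = """^(\w+)\[([\w.$]+),([\w.$]+)\]$""".r

  /** Returns Some((cacheName, keyClass, valueClass)) when the name follows
    * the proposed convention, None otherwise (a plain table name). */
  def parse(tableName: String): Option[(String, String, String)] =
    tableName match {
      case KeyValueTable(cache, keyCls, valueCls) => Some((cache, keyCls, valueCls))
      case _                                      => None
    }
}

// TableNameParser.parse("testCache[java.lang.Integer,java.lang.String]")
//   == Some(("testCache", "java.lang.Integer", "java.lang.String"))
```

A plain name such as `"testCache"` falls through to `None`, so the catalog could still resolve ordinary QueryEntity-backed tables unchanged.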
