Nikolay,

Can you please create a separate ticket for the strategy implementation then? Any idea how long it will take?

As for querying a partition, both SqlQuery and SqlFieldsQuery allow you to specify the set of partitions to work with (see the setPartitions method). I think that should be enough.

-Val

On Wed, Nov 29, 2017 at 3:39 AM, Vladimir Ozerov <[email protected]> wrote:

> Hi Nikolay,
>
> No, it is not possible to get this info from the public API, nor did we
> plan to expose it. See IGNITE-4509 and commit *fbf0e353* to get a better
> understanding of how this was implemented.
>
> Vladimir.
>
> On Wed, Nov 29, 2017 at 2:01 PM, Николай Ижиков <[email protected]> wrote:
>
> > Hello, Vladimir.
> >
> > > partition pruning is already implemented in Ignite, so there is no
> > > need to do this on your own.
> >
> > Spark works with partitioned data sets.
> > A custom Data Source (Ignite) is required to provide data partition
> > information to Spark.
> >
> > Can I get information about pruned partitions through some public API?
> > Is there a plan or ticket to implement such an API?
> >
> > 2017-11-29 10:34 GMT+03:00 Vladimir Ozerov <[email protected]>:
> >
> > > Nikolay,
> > >
> > > Regarding p3. - partition pruning is already implemented in Ignite,
> > > so there is no need to do this on your own.
> > >
> > > On Wed, Nov 29, 2017 at 3:23 AM, Valentin Kulichenko <[email protected]> wrote:
> > >
> > > > Nikolay,
> > > >
> > > > A custom strategy allows you to fully process the AST generated by
> > > > Spark and convert it to Ignite SQL, so there will be no execution
> > > > on the Spark side at all. This is what we are trying to achieve
> > > > here. Basically, one will be able to use the DataFrame API to
> > > > execute queries directly on Ignite. Does it make sense to you?
> > > >
> > > > I would recommend taking a look at the MemSQL implementation, which
> > > > does similar stuff: https://github.com/memsql/memsql-spark-connector
> > > >
> > > > Note that this approach will work only if all relations included in
> > > > the AST are Ignite tables. Otherwise, the strategy should return
> > > > null so that Spark falls back to its regular mode. Ignite will be
> > > > used as a regular data source in this case, and it's probably
> > > > possible to implement some optimizations here as well. However, I
> > > > never investigated this, and it seems like a separate discussion.
> > > >
> > > > -Val
> > > >
> > > > On Tue, Nov 28, 2017 at 9:54 AM, Николай Ижиков <[email protected]> wrote:
> > > >
> > > > > Hello, guys.
> > > > >
> > > > > I have implemented basic support of the Spark Data Frame API [1],
> > > > > [2] for Ignite.
> > > > > Spark provides an API for a custom strategy to optimize queries
> > > > > from Spark to the underlying data source (Ignite).
> > > > >
> > > > > The goals of the optimization (obvious, just to be on the same
> > > > > page):
> > > > > Minimize data transfer between Spark and Ignite.
> > > > > Speed up query execution.
> > > > >
> > > > > I see 3 ways to optimize queries:
> > > > >
> > > > > 1. *Join Reduce* If one makes a query that joins two or more
> > > > >    Ignite tables, we have to pass all join info to Ignite and
> > > > >    transfer to Spark only the result of the table join.
> > > > >    To implement it we have to extend the current implementation
> > > > >    with a new RelationProvider that can generate all kinds of
> > > > >    joins for two or more tables.
> > > > >    We should also add some tests.
> > > > >    The question is - how should the join result be partitioned?
> > > > >
> > > > > 2. *Order by* If one makes a query to an Ignite table with an
> > > > >    order by clause, we can execute the sorting on the Ignite side.
> > > > >    But it seems that currently Spark doesn't have any way to tell
> > > > >    that partitions are already sorted.
> > > > >
> > > > > 3. *Key filter* If one makes a query with `WHERE key = XXX` or
> > > > >    `WHERE key IN (X, Y, Z)`, we can reduce the number of
> > > > >    partitions and query only the partitions that store certain
> > > > >    key values.
> > > > >    Is this kind of optimization already built into Ignite, or
> > > > >    should I implement it myself?
> > > > >
> > > > > Maybe there is some other way to make queries run faster?
> > > > >
> > > > > [1] https://spark.apache.org/docs/latest/sql-programming-guide.html
> > > > > [2] https://github.com/apache/ignite/pull/2742
> >
> > --
> > Nikolay Izhikov
> > [email protected]
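
For reference, the key-filter idea discussed in this thread (map the keys from `WHERE key IN (...)` to partitions, then query only those partitions) can be sketched roughly as below. This is only an illustration under stated assumptions, not Ignite's implementation: the class KeyFilterPartitions, the helpers partitionOf and partitionsFor, the hash-modulo mapping and the partition count of 1024 are all hypothetical. On a real cluster the key-to-partition mapping would have to come from Ignite's own affinity function, and the resulting array is what one would pass to the setPartitions method mentioned above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: none of these names come from Ignite or Spark.
class KeyFilterPartitions {

    // Stand-in for a real affinity function (assumption: plain hash modulo).
    // Ignite's actual key-to-partition mapping is different and must be
    // obtained from the cluster, not re-implemented like this.
    static int partitionOf(Object key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Collects the sorted, de-duplicated partitions that may hold any of the keys.
    static int[] partitionsFor(List<?> keys, int partitionCount) {
        TreeSet<Integer> parts = new TreeSet<>();
        for (Object key : keys) {
            parts.add(partitionOf(key, partitionCount));
        }
        return parts.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Keys as they might appear in `WHERE key IN (1, 1025, 7)`;
        // 1024 partitions is an arbitrary value chosen for the example.
        int[] parts = partitionsFor(Arrays.asList(1, 1025, 7), 1024);

        // With this toy mapping, keys 1 and 1025 share partition 1.
        System.out.println(Arrays.toString(parts)); // prints [1, 7]

        // The array would then be handed to the query object, e.g. via the
        // setPartitions method on SqlQuery / SqlFieldsQuery named in the thread.
    }
}
```

The point of the sketch is only that three keys can collapse to two partitions, so a data source that knows the mapping can skip every other partition for such a filter.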
