Nikolay, Val,

Since we agreed to release the feature without the strategy support, can the current integration make it into the 2.4 release? Please chime in on this conversation: http://apache-ignite-developers.2346864.n4.nabble.com/Time-and-scope-for-Apache-Ignite-2-4-td24987.html
—
Denis

> On Nov 28, 2017, at 5:42 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>
> Denis,
>
> Agree. I will do the final review in the next few days and merge the code.
>
> -Val
>
> On Tue, Nov 28, 2017 at 5:28 PM, Denis Magda <dma...@apache.org> wrote:
>
>> Guys,
>>
>> Looking into the parallel discussion about the strategy support, I would change my initial stance and support the idea of releasing the integration in its current state. Is the code ready to be merged into the master? Let's concentrate on this first and handle the strategy support as a separate JIRA task. Agree?
>>
>> —
>> Denis
>>
>>> On Nov 27, 2017, at 3:47 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>
>>> Nikolay,
>>>
>>> Let's estimate the strategy implementation work, and then decide whether to merge the code in its current state or not. If anything is unclear, please start a separate discussion.
>>>
>>> -Val
>>>
>>> On Fri, Nov 24, 2017 at 5:42 AM, Николай Ижиков <nizhikov....@gmail.com> wrote:
>>>
>>>> Hello, Val, Denis.
>>>>
>>>>> Personally, I think that we should release the integration only after the strategy is fully supported.
>>>>
>>>> I see two major reasons to propose merging the DataFrame API implementation without the custom strategy:
>>>>
>>>> 1. My PR is relatively huge already. From my experience of interacting with the Ignite community, the bigger a PR becomes, the more committer time is required to review it. So I propose to move in smaller but complete steps here.
>>>>
>>>> 2. It is not clear to me what exactly "custom strategy and optimization" includes. It seems additional discussion is required. I think I can put my thoughts on paper and start that discussion right after the basic implementation is done.
>>>>
>>>>> Custom strategy implementation is actually very important for this integration.
>>>>
>>>> Understood and fully agreed. I'm ready to continue work in that area.
>>>>
>>>> On 23.11.2017 02:15, Denis Magda wrote:
>>>>
>>>>> Val, Nikolay,
>>>>>
>>>>> Personally, I think that we should release the integration only after the strategy is fully supported. Without the strategy we don't really leverage Ignite's SQL engine, and we introduce redundant data movement between Ignite and Spark nodes.
>>>>>
>>>>> How big is the effort to support the strategy in terms of the amount of work left? 40%, 60%, 80%?
>>>>>
>>>>> —
>>>>> Denis
>>>>>
>>>>>> On Nov 22, 2017, at 2:57 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>>>
>>>>>> Nikolay,
>>>>>>
>>>>>> Custom strategy implementation is actually very important for this integration. Basically, it will allow us to create a SQL query for Ignite and execute it directly on the cluster. Your current implementation only adds a new DataSource, which means that Spark will fetch data into its own memory first, and then do most of the work (like joins, for example). Does it make sense to you? Can you please take a look at this and provide your thoughts on how much development is implied there?
>>>>>>
>>>>>> The current code looks good to me, though, and I'm OK if the strategy is implemented as a next step in the scope of a separate ticket. I will do a final review early next week and will merge it if everything is OK.
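>>>>>>
>>>>>> To make the idea concrete, a strategy would plug into Spark's experimental hooks roughly like this (just a sketch; the Ignite-specific helpers are hypothetical and are exactly the work to be estimated):
>>>>>>
>>>>>> ```
>>>>>> import org.apache.spark.sql.Strategy
>>>>>> import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
>>>>>> import org.apache.spark.sql.execution.SparkPlan
>>>>>>
>>>>>> object IgniteStrategy extends Strategy {
>>>>>>   override def apply(plan: LogicalPlan): Seq[SparkPlan] =
>>>>>>     if (isFullyIgniteBacked(plan))
>>>>>>       // Translate the whole subtree into a single Ignite SQL query and
>>>>>>       // execute it on the cluster instead of fetching raw data into Spark.
>>>>>>       Seq(igniteQueryExec(plan))
>>>>>>     else
>>>>>>       Nil // an empty result means: fall back to native Spark strategies
>>>>>>
>>>>>>   // Hypothetical helpers: detecting Ignite-backed plans and building the
>>>>>>   // physical operator that runs the generated SQL are the open questions.
>>>>>>   private def isFullyIgniteBacked(plan: LogicalPlan): Boolean = ???
>>>>>>   private def igniteQueryExec(plan: LogicalPlan): SparkPlan = ???
>>>>>> }
>>>>>>
>>>>>> // Registered per session:
>>>>>> // spark.experimental.extraStrategies ++= Seq(IgniteStrategy)
>>>>>> ```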
>>>>>>
>>>>>> -Val
>>>>>>
>>>>>> On Thu, Oct 19, 2017 at 7:29 AM, Николай Ижиков <nizhikov....@gmail.com> wrote:
>>>>>>
>>>>>>> Hello.
>>>>>>>
>>>>>>>> 3. IgniteCatalog vs. IgniteExternalCatalog. Why do we have two Catalog implementations and what is the difference?
>>>>>>>
>>>>>>> IgniteCatalog is removed.
>>>>>>>
>>>>>>>> 5. I don't like that IgniteStrategy and IgniteOptimization have to be set manually on SQLContext each time it's created. ... Is there any way to automate this and improve usability?
>>>>>>>
>>>>>>> IgniteStrategy and IgniteOptimization are removed, as they are empty now.
>>>>>>>
>>>>>>>> Actually, I think it makes sense to create a builder similar to SparkSession.builder()...
>>>>>>>
>>>>>>> IgniteBuilder is added. The syntax looks like:
>>>>>>>
>>>>>>> ```
>>>>>>> val igniteSession = IgniteSparkSession.builder()
>>>>>>>     .appName("Spark Ignite catalog example")
>>>>>>>     .master("local")
>>>>>>>     .config("spark.executor.instances", "2")
>>>>>>>     .igniteConfig(CONFIG)
>>>>>>>     .getOrCreate()
>>>>>>>
>>>>>>> igniteSession.catalog.listTables().show()
>>>>>>> ```
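>>>>>>>
>>>>>>> Since IgniteSparkSession extends the regular SparkSession, Ignite SQL tables then resolve transparently in plain SQL, e.g. (assuming a `person` table exists on the Ignite side):
>>>>>>>
>>>>>>> ```
>>>>>>> igniteSession.sql("SELECT id, name FROM person WHERE id > 10").show()
>>>>>>> ```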
>>>>>>>
>>>>>>> Please see the updated PR - https://github.com/apache/ignite/pull/2742
>>>>>>>
>>>>>>> 2017-10-18 20:02 GMT+03:00 Николай Ижиков <nizhikov....@gmail.com>:
>>>>>>>
>>>>>>>> Hello, Valentin.
>>>>>>>>
>>>>>>>> My answers are below. Dmitry, do we need to move the discussion to Jira?
>>>>>>>>
>>>>>>>>> 1. Why do we have the org.apache.spark.sql.ignite package in our codebase?
>>>>>>>>
>>>>>>>> As I mentioned earlier, to implement and override the Spark Catalog one has to use the internal (private) Spark API. So I have to use the `org.apache.spark.sql.***` package to get access to private classes and variables.
>>>>>>>>
>>>>>>>> For example, the SharedState class that stores the link to the ExternalCatalog is declared as `private[sql] class SharedState`, i.e. package-private.
>>>>>>>>
>>>>>>>>> Can these classes reside under org.apache.ignite.spark instead?
>>>>>>>>
>>>>>>>> No, not as long as we want to have our own implementation of ExternalCatalog.
>>>>>>>>
>>>>>>>>> 2. IgniteRelationProvider contains multiple constants which I guess are some kind of config options. Can you describe the purpose of each of them?
>>>>>>>>
>>>>>>>> I extended the comments for these options. Please see my commit [1] or the PR HEAD.
>>>>>>>>
>>>>>>>>> 3. IgniteCatalog vs. IgniteExternalCatalog. Why do we have two Catalog implementations and what is the difference?
>>>>>>>>
>>>>>>>> Good catch, thank you! After additional research I found that only IgniteExternalCatalog is required. I will update the PR with the IgniteCatalog removal in a few days.
>>>>>>>>
>>>>>>>>> 4. IgniteStrategy and IgniteOptimization are currently no-op. What are our plans on implementing them? Also, what exactly is planned in IgniteOptimization and what is its purpose?
>>>>>>>>
>>>>>>>> Actually, this is a very good question :) And I need advice from experienced community members here.
>>>>>>>>
>>>>>>>> The purpose of `IgniteOptimization` is to modify the query plan created by Spark. Currently, we have one optimization, described in IGNITE-3084 [2] by you, Valentin :) :
>>>>>>>>
>>>>>>>> "If there are non-Ignite relations in the plan, we should fall back to native Spark strategies"
>>>>>>>>
>>>>>>>> I think we can go a little further and reduce a join of two Ignite-backed Data Frames into a single Ignite SQL query. Currently, this feature is unimplemented.
>>>>>>>>
>>>>>>>> *Do we need it now? Or can we postpone it and concentrate on the basic Data Frame and Catalog implementation?*
>>>>>>>>
>>>>>>>> The purpose of `Strategy`, as you correctly mentioned in [2], is to transform a LogicalPlan into physical operators. I don't have ideas on how to use this opportunity, so I think we don't need IgniteStrategy.
>>>>>>>>
>>>>>>>> Can you or anyone else suggest some optimization strategy to speed up SQL query execution?
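>>>>>>>>
>>>>>>>> For illustration, such an optimization would take the shape of a Catalyst rule, roughly like this (a sketch only; the rewrite helpers are hypothetical and are the open design question):
>>>>>>>>
>>>>>>>> ```
>>>>>>>> import org.apache.spark.sql.catalyst.plans.logical.{Join, LogicalPlan}
>>>>>>>> import org.apache.spark.sql.catalyst.rules.Rule
>>>>>>>>
>>>>>>>> object IgniteJoinReduction extends Rule[LogicalPlan] {
>>>>>>>>   override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp {
>>>>>>>>     // Rewrite a join of two Ignite-backed relations into a single
>>>>>>>>     // relation that executes one SQL JOIN inside Ignite.
>>>>>>>>     case j: Join if bothSidesIgniteBacked(j) => rewriteAsIgniteJoin(j)
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   // Hypothetical helpers; exactly the part that needs design.
>>>>>>>>   private def bothSidesIgniteBacked(j: Join): Boolean = ???
>>>>>>>>   private def rewriteAsIgniteJoin(j: Join): LogicalPlan = ???
>>>>>>>> }
>>>>>>>> ```
>>>>>>>>
>>>>>>>> Plans the partial function does not match are left untouched, which gives the IGNITE-3084 fallback behavior for free.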
>>>>>>>>
>>>>>>>>> 5. I don't like that IgniteStrategy and IgniteOptimization have to be set manually on SQLContext each time it's created. ... Is there any way to automate this and improve usability?
>>>>>>>>
>>>>>>>> These classes are added to `extraOptimizations` when one uses IgniteSparkSession. As far as I know, there is no way to automatically add these classes to a regular SparkSession.
>>>>>>>>
>>>>>>>>> 6. What is the purpose of IgniteSparkSession? I see it's used in IgniteCatalogExample but not in IgniteDataFrameExample, which is confusing.
>>>>>>>>
>>>>>>>> The DataFrame API is a *public* Spark API, so anyone can provide an implementation and plug it into Spark. That's why IgniteDataFrameExample doesn't need any Ignite-specific session.
>>>>>>>>
>>>>>>>> The Catalog API is an *internal* Spark API. There is no way to plug a custom catalog implementation into Spark [3], so we have to use `IgniteSparkSession`, which extends the regular SparkSession and overrides the link to the `ExternalCatalog`.
>>>>>>>>
>>>>>>>>> 7. To create IgniteSparkSession we first create IgniteContext. Is it really needed? It looks like we can directly provide the configuration file; if IgniteSparkSession really requires IgniteContext, it can create it by itself under the hood.
>>>>>>>>
>>>>>>>> Actually, IgniteContext is the base class of the Ignite <-> Spark integration for now, so I tried to reuse it here. I like the idea of removing the explicit usage of IgniteContext. I will implement it in a few days.
>>>>>>>>
>>>>>>>>> Actually, I think it makes sense to create a builder similar to SparkSession.builder()...
>>>>>>>>
>>>>>>>> Great idea! I will implement such a builder in a few days.
>>>>>>>>
>>>>>>>>> 9. Do I understand correctly that IgniteCacheRelation is for the case when we don't have SQL configured on the Ignite side?
>>>>>>>>
>>>>>>>> Yes, IgniteCacheRelation is the Data Frame implementation for a key-value cache.
>>>>>>>>
>>>>>>>>> I thought we decided not to support this, no? Or is this something else?
>>>>>>>>
>>>>>>>> My understanding is the following:
>>>>>>>>
>>>>>>>> 1. We can't support automatic resolving of key-value caches in the *ExternalCatalog*, because there is no way to reliably detect the key and value classes.
>>>>>>>>
>>>>>>>> 2. We can support key-value caches in the regular Data Frame implementation, because there we can require the user to provide the key and value classes explicitly.
>>>>>>>>
>>>>>>>>> 8. Can you clarify the query syntax in IgniteDataFrameExample#nativeSparkSqlFromCacheExample2?
>>>>>>>>
>>>>>>>> It is a key-value cache:
>>>>>>>>
>>>>>>>> key - java.lang.Long
>>>>>>>> value - case class Person(name: String, birthDate: java.util.Date)
>>>>>>>>
>>>>>>>> The schema of the data frame for this cache is:
>>>>>>>>
>>>>>>>> key - long
>>>>>>>> value.name - string
>>>>>>>> value.birthDate - date
>>>>>>>>
>>>>>>>> So we can select data from the cache:
>>>>>>>>
>>>>>>>> SELECT
>>>>>>>>     key, `value.name`, `value.birthDate`
>>>>>>>> FROM
>>>>>>>>     testCache
>>>>>>>> WHERE key >= 2 AND `value.name` like '%0'
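>>>>>>>>
>>>>>>>> End to end, the usage would look roughly like this (a sketch; the format short name and option names are placeholders, the real constants live in IgniteRelationProvider):
>>>>>>>>
>>>>>>>> ```
>>>>>>>> val df = spark.read
>>>>>>>>     .format("ignite")                           // placeholder short name
>>>>>>>>     .option("config", CONFIG)                   // Ignite configuration file
>>>>>>>>     .option("cache", "testCache")               // key-value cache to expose
>>>>>>>>     .option("keyClass", "java.lang.Long")       // provided explicitly, see 2. above
>>>>>>>>     .option("valueClass", "org.example.Person") // hypothetical value class
>>>>>>>>     .load()
>>>>>>>>
>>>>>>>> df.createOrReplaceTempView("testCache")
>>>>>>>>
>>>>>>>> spark.sql(
>>>>>>>>     "SELECT key, `value.name`, `value.birthDate` " +
>>>>>>>>     "FROM testCache " +
>>>>>>>>     "WHERE key >= 2 AND `value.name` like '%0'").show()
>>>>>>>> ```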
>>>>>>>>
>>>>>>>> [1] https://github.com/apache/ignite/pull/2742/commits/faf3ed6febf417bc59b0519156fd4d09114c8da7
>>>>>>>> [2] https://issues.apache.org/jira/browse/IGNITE-3084?focusedCommentId=15794210&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15794210
>>>>>>>> [3] https://issues.apache.org/jira/browse/SPARK-17767?focusedCommentId=15543733&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15543733
>>>>>>>>
>>>>>>>> On 18.10.2017 04:39, Dmitriy Setrakyan wrote:
>>>>>>>>
>>>>>>>>> Val, thanks for the review. Can I ask you to add the same comments to the ticket?
>>>>>>>>>
>>>>>>>>> On Tue, Oct 17, 2017 at 3:20 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Nikolay, Anton,
>>>>>>>>>>
>>>>>>>>>> I did a high-level review of the code. First of all, impressive results! However, I have some questions/comments.
>>>>>>>>>>
>>>>>>>>>> 1. Why do we have the org.apache.spark.sql.ignite package in our codebase? Can these classes reside under org.apache.ignite.spark instead?
>>>>>>>>>> 2. IgniteRelationProvider contains multiple constants which I guess are some kind of config options. Can you describe the purpose of each of them?
>>>>>>>>>> 3. IgniteCatalog vs. IgniteExternalCatalog. Why do we have two Catalog implementations and what is the difference?
>>>>>>>>>> 4. IgniteStrategy and IgniteOptimization are currently no-op. What are our plans on implementing them? Also, what exactly is planned in IgniteOptimization and what is its purpose?
>>>>>>>>>> 5. I don't like that IgniteStrategy and IgniteOptimization have to be set manually on SQLContext each time it's created. This seems to be very error prone. Is there any way to automate this and improve usability?
>>>>>>>>>> 6. What is the purpose of IgniteSparkSession? I see it's used in IgniteCatalogExample but not in IgniteDataFrameExample, which is confusing.
>>>>>>>>>> 7. To create IgniteSparkSession we first create IgniteContext. Is it really needed? It looks like we can directly provide the configuration file; if IgniteSparkSession really requires IgniteContext, it can create it by itself under the hood. Actually, I think it makes sense to create a builder similar to SparkSession.builder(); it would be good if our APIs here are consistent with Spark APIs.
>>>>>>>>>> 8. Can you clarify the query syntax in IgniteDataFrameExample#nativeSparkSqlFromCacheExample2?
>>>>>>>>>> 9. Do I understand correctly that IgniteCacheRelation is for the case when we don't have SQL configured on the Ignite side? I thought we decided not to support this, no? Or is this something else?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> -Val
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 17, 2017 at 4:40 AM, Anton Vinogradov <avinogra...@gridgain.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sounds awesome.
>>>>>>>>>>>
>>>>>>>>>>> I'll try to review the API & tests this week.
>>>>>>>>>>>
>>>>>>>>>>> Val, your review is still required :)
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 17, 2017 at 2:36 PM, Николай Ижиков <nizhikov....@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yes
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 17, 2017, 2:34 PM, "Anton Vinogradov" <avinogra...@gridgain.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Nikolay,
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, it will be possible to start regular Spark and Ignite clusters and, using peer classloading via the Spark context, perform any DataFrame request, correct?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 17, 2017 at 2:25 PM, Николай Ижиков <nizhikov....@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello, Anton.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The example you cite passes a path to a master-*local* file. These libraries are added to the classpath of each remote node running the submitted job.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please see the documentation:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#addJar(java.lang.String)
>>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#addFile(java.lang.String)
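>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In short (a commented restatement of the snippet you quote below):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> // The path is resolved on the driver/master machine only; Spark
>>>>>>>>>>>>>> // then ships the jar to every worker and adds it to the classpath
>>>>>>>>>>>>>> // of this job's tasks. Workers need no pre-installed Ignite files.
>>>>>>>>>>>>>> spark.sparkContext.addJar(MAVEN_HOME +
>>>>>>>>>>>>>>     "/org/apache/ignite/ignite-core/2.3.0-SNAPSHOT/ignite-core-2.3.0-SNAPSHOT.jar")
>>>>>>>>>>>>>> ```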
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2017-10-17 13:10 GMT+03:00 Anton Vinogradov <avinogra...@gridgain.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nikolay,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> With the Data Frame API implementation there are no requirements to have any Ignite files on Spark worker nodes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What do you mean? I see code like:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> spark.sparkContext.addJar(MAVEN_HOME + "/org/apache/ignite/ignite-core/2.3.0-SNAPSHOT/ignite-core-2.3.0-SNAPSHOT.jar")
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 16, 2017 at 5:22 PM, Николай Ижиков <nizhikov....@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello, guys.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have created an example application that runs Ignite Data Frames on a standalone Spark cluster. With the Data Frame API implementation there are no requirements to have any Ignite files on Spark worker nodes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I ran this application on a free dataset: ATP tennis match statistics.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> data - https://github.com/nizhikov/atp_matches
>>>>>>>>>>>>>>>> app - https://github.com/nizhikov/ignite-spark-df-example
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Valentin, did you have a chance to look at my changes?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2017-10-12 6:03 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Nikolay,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sorry for the delay on this, I got a little swamped lately. I will do my best to review the code this week.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Val
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Oct 9, 2017 at 11:48 AM, Николай Ижиков <nizhikov....@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello, Valentin.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Did you have a chance to look at my changes?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Now I think I have implemented almost all required features. I want to run some performance tests to ensure my implementation works properly with a significant amount of data. And I definitely need some feedback on my changes.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2017-10-09 18:45 GMT+03:00 Николай Ижиков <nizhikov....@gmail.com>:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello, guys.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Which version of Spark do we want to use?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. Currently, Ignite depends on Spark 2.1.0.
>>>>>>>>>>>>>>>>>>>    * Can be run on JDK 7.
>>>>>>>>>>>>>>>>>>>    * Still supported: 2.1.2 will be released soon.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2. The latest Spark version is 2.2.0.
>>>>>>>>>>>>>>>>>>>    * Can be run only on JDK 8+.
>>>>>>>>>>>>>>>>>>>    * Released Jul 11, 2017.
>>>>>>>>>>>>>>>>>>>    * Already supported by huge vendors (Amazon, for example).
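>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In build terms the choice is just which artifact we pin (an sbt sketch for illustration; Ignite itself builds with Maven):
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>> libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" % "provided"
>>>>>>>>>>>>>>>>>>> // or, JDK 8+ only:
>>>>>>>>>>>>>>>>>>> // libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"
>>>>>>>>>>>>>>>>>>> ```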
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Note that in IGNITE-3084 I implement some internal Spark API, so it will take some effort to switch between Spark 2.1 and 2.2.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2017-09-27 2:20 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I will review in the next few days.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -Val
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Sep 26, 2017 at 2:23 PM, Denis Magda <dma...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hello Nikolay,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This is good news. Finally this capability is coming to Ignite.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Val, Vladimir, could you do a preliminary review?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Answering your questions:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1. Yardstick should be enough for performance measurements. As a Spark user, I would be curious to know what the point of this integration is. Probably we need to compare the Spark + Ignite case against Spark + Hive or Spark + RDBMS.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2. If the Spark community is reluctant, let's include the module in the ignite-spark integration.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> —
>>>>>>>>>>>>>>>>>>>>> Denis
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sep 25, 2017, at 11:14 AM, Николай Ижиков <nizhikov....@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hello, guys.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Currently, I'm working on the integration between Spark and Ignite [1]. For now, I have implemented the following:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> * Ignite DataSource implementation (IgniteRelationProvider)
>>>>>>>>>>>>>>>>>>>>>> * DataFrame support for Ignite SQL tables.
>>>>>>>>>>>>>>>>>>>>>> * IgniteCatalog implementation for transparent resolving of Ignite SQL tables.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The implementation can be found in PR [2]. It would be great if someone provided feedback on the prototype. I made some examples in the PR so you can see how the API is supposed to be used [3], [4].
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I need some advice. Can you help me?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 1. How should this PR be tested?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Of course, I need to provide some unit tests. But what about scalability tests, etc.? Maybe we need some Yardstick benchmark or similar? What are your thoughts? Which scenarios should I consider in the first place?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 2. Should we provide the Spark Catalog implementation inside the Ignite codebase?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The current implementation of the Spark Catalog is based on the *internal Spark API*. The Spark community seems not interested in making the Catalog API public or in including the Ignite Catalog in the Spark code base [5], [6].
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> *Should we include a Spark internal API implementation inside the Ignite code base?* Or should we consider including the Catalog implementation in some external module that would be created and released outside Ignite? (We could still support and develop it inside the Ignite community.)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-3084
>>>>>>>>>>>>>>>>>>>>>> [2] https://github.com/apache/ignite/pull/2742
>>>>>>>>>>>>>>>>>>>>>> [3] https://github.com/apache/ignite/pull/2742/files#diff-f4ff509cef3018e221394474775e0905
>>>>>>>>>>>>>>>>>>>>>> [4] https://github.com/apache/ignite/pull/2742/files#diff-f2b670497d81e780dfd5098c5dd8a89c
>>>>>>>>>>>>>>>>>>>>>> [5] http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Core-Custom-Catalog-Integration-between-Apache-Ignite-and-Apache-Spark-td22452.html
>>>>>>>>>>>>>>>>>>>>>> [6] https://issues.apache.org/jira/browse/SPARK-17767
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> Nikolay Izhikov
>>>>>>>>>>>>>>>>>>>>>> nizhikov....@gmail.com