Agree. I do not see any reason to have two different products. Instead, let's just add ignite-hadoop.jar to the distribution and add a separate script to start the Accelerator. We can go the same way as we did for "platforms": create a separate top-level folder "hadoop" in the Fabric distribution and put all related Hadoop Accelerator stuff there.
On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

> In general, I don't quite understand why we should move any component outside of Fabric. The concept of Fabric is to have everything, no? :) In other words, if a cluster was once set up for Hadoop Acceleration, why not allow creating a cache and/or running a task using native Ignite APIs sometime later? We follow this approach with all our components and modules, but not with ignite-hadoop for some reason.
>
> If we get rid of the Hadoop Accelerator build, the initial setup of the Hadoop integration can potentially become a bit more complicated, but with proper documentation I don't think this is going to be a problem, because it requires multiple steps now anyway. And frankly the same can be said about any optional module we have - enabling it requires some additional steps as it doesn't work out of the box.
>
> -Val
>
> On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dma...@apache.org> wrote:
>
>> Dmitriy,
>>
>> > - the "lib/" folder has far fewer libraries than in fabric, simply because many dependencies don't make sense for a hadoop environment
>>
>> This is exactly why the discussion moved in this direction.
>>
>> How do we decide what should be a part of the Hadoop Accelerator and what should be excluded? If you read through Val's and Cos's comments below you'll get more insights.
>>
>> In general, we need a clear understanding of what the Hadoop Accelerator distribution's use case is. This will help us come up with a final decision.
>>
>> If the accelerator is supposed to be plugged into an existing Hadoop environment by enabling MapReduce and/or IGFS at the configuration level, then we should simply remove the ignite-indexing and ignite-spark modules and add additional logging libs as well as the AWS and GCE integration packages.
>>
>> But wait, what if a user wants to leverage the Ignite Spark integration, Ignite SQL or geospatial queries, or Ignite streaming capabilities after he has already plugged in the accelerator? What if he is ready to modify his existing code? He can't simply switch to the fabric on the application side because the fabric doesn't include the accelerator's libs that are still needed. Nor can he rely solely on the accelerator distribution, which misses some libs. And, obviously, the user starts shuffling libs between the fabric and the accelerator to get what is required.
>>
>> Vladimir, can you share your thoughts on this?
>>
>> —
>> Denis
>>
>>> On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>>
>>> Guys,
>>>
>>> I just downloaded the hadoop accelerator and here are the differences from the fabric edition that jump out at me right away:
>>>
>>> - the "bin/" folder has "setup-hadoop" scripts
>>> - the "config/" folder has a "hadoop" subfolder with the necessary hadoop-related configuration
>>> - the "lib/" folder has far fewer libraries than in fabric, simply because many dependencies don't make sense for a hadoop environment
>>>
>>> I currently don't see how we can merge the hadoop accelerator with the standard fabric edition.
>>>
>>> D.
>>>
>>> On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dma...@apache.org> wrote:
>>>
>>>> Vovan,
>>>>
>>>> As one of the hadoop maintainers, please share your point of view on this.
>>>>
>>>> —
>>>> Denis
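To make the point above concrete: a node configured for Hadoop acceleration can serve native Ignite APIs from the same process, so a single build covers both cases. A minimal Java sketch, assuming the combined distribution has ignite-hadoop on the classpath; the IGFS meta/data cache wiring is omitted and the configuration setters should be verified against the target Ignite version:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.FileSystemConfiguration;
import org.apache.ignite.configuration.HadoopConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CombinedNodeSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Accelerator side: enable IGFS and the in-memory map-reduce engine.
        // A real IGFS setup also needs meta/data cache configuration, omitted here.
        FileSystemConfiguration igfsCfg = new FileSystemConfiguration();
        igfsCfg.setName("igfs");
        cfg.setFileSystemConfiguration(igfsCfg);
        cfg.setHadoopConfiguration(new HadoopConfiguration());

        // Native Ignite side: the very same node serves a regular cache,
        // which is the case a single fabric build should cover.
        try (Ignite ignite = Ignition.start(cfg)) {
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");
            cache.put(1, "value");
        }
    }
}

If the two use cases ship in different downloads, this mixed scenario is exactly where users end up shuffling libs.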
>> >> >> >> — >> >> Denis >> >> >> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <skoz...@gridgain.com> >> >> wrote: >> >>> >> >>> Denis >> >>> >> >>> I agree that at the moment there's no reason to split into fabric and >> >>> hadoop editions. >> >>> >> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dma...@apache.org> >> wrote: >> >>> >> >>>> Hadoop Accelerator doesn’t require any additional libraries in >> compare >> >> to >> >>>> those we have in the fabric build. It only lacks some of them as Val >> >>>> mentioned below. >> >>>> >> >>>> Wouldn’t it better to discontinue Hadoop Accelerator edition and >> simply >> >>>> deliver hadoop jar and its configs as a part of the fabric? >> >>>> >> >>>> — >> >>>> Denis >> >>>> >> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan < >> dsetrak...@apache.org> >> >>>> wrote: >> >>>>> >> >>>>> Separate edition for the Hadoop Accelerator was primarily driven by >> the >> >>>>> default libraries. Hadoop Accelerator requires many more libraries >> as >> >>>> well >> >>>>> as configuration settings compared to the standard fabric download. >> >>>>> >> >>>>> Now, as far as spark integration is concerned, I am not sure which >> >>>> edition >> >>>>> it belongs in, Hadoop Accelerator or standard fabric. >> >>>>> >> >>>>> D. >> >>>>> >> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dma...@apache.org> >> >> wrote: >> >>>>> >> >>>>>> *Dmitriy*, >> >>>>>> >> >>>>>> I do believe that you should know why the community decided to a >> >>>> separate >> >>>>>> edition for the Hadoop Accelerator. What was the reason for that? >> >>>>>> Presently, as I see, it brings more confusion and difficulties >> rather >> >>>> then >> >>>>>> benefit. >> >>>>>> >> >>>>>> — >> >>>>>> Denis >> >>>>>> >> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <c...@apache.org> >> >> wrote: >> >>>>>> >> >>>>>> In fact I am very much agree with you. Right now, running the >> >>>> "accelerator" >> >>>>>> component in Bigtop disto gives one a pretty much complete fabric >> >>>> anyway. >> >>>>>> But >> >>>>>> in order to make just an accelerator component we perform quite a >> bit >> >> of >> >>>>>> woodoo magic during the packaging stage of the Bigtop build, >> shuffling >> >>>> jars >> >>>>>> from here and there. And that's quite crazy, honestly ;) >> >>>>>> >> >>>>>> Cos >> >>>>>> >> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote: >> >>>>>> >> >>>>>> I tend to agree with Denis. I see only these differences between >> >> Hadoop >> >>>>>> Accelerator and Fabric builds (correct me if I miss something): >> >>>>>> >> >>>>>> - Limited set of available modules and no optional modules in >> Hadoop >> >>>>>> Accelerator. >> >>>>>> - No ignite-hadoop module in Fabric. >> >>>>>> - Additional scripts, configs and instructions included in Hadoop >> >>>>>> Accelerator. >> >>>>>> >> >>>>>> And the list of included modules frankly looks very weird. Here are >> >> only >> >>>>>> some of the issues I noticed: >> >>>>>> >> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even if we need >> them >> >>>>>> for Hadoop Acceleration (which I doubt), are they really required >> or >> >>>> can >> >>>>>> be >> >>>>>> optional? >> >>>>>> - We force to use ignite-log4j module without providing other >> logger >> >>>>>> options (e.g., SLF). >> >>>>>> - We don't include ignite-aws module. How to use Hadoop Accelerator >> >>>> with >> >>>>>> S3 discovery? >> >>>>>> - Etc. 
>>>>>>>>
>>>>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dma...@apache.org> wrote:
>>>>>>>>
>>>>>>>> On a separate note, in Bigtop we are starting to look into changing the way we deliver Ignite, and we'll likely start offering the whole 'data fabric' experience instead of the mere "hadoop-acceleration".
>>>>>>>>
>>>>>>>> And you will still be using the hadoop-accelerator libs of Ignite, right?
>>>>>>>>
>>>>>>>> I'm wondering whether there is a need to keep releasing the Hadoop Accelerator as a separate delivery. What if we start releasing the accelerator as a part of the standard fabric binary, putting the hadoop-accelerator libs under the 'optional' folder?
>>>>>>>>
>>>>>>>> —
>>>>>>>> Denis
>>>>>>>>
>>>>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <c...@apache.org> wrote:
>>>>>>>>
>>>>>>>> What Denis said: spark has been added to the Hadoop accelerator as a way to boost the performance of more than just the MR compute of the Hadoop stack, IIRC. For what it's worth, Spark is considered a part of Hadoop at large.
>>>>>>>>
>>>>>>>> On a separate note, in Bigtop we are starting to look into changing the way we deliver Ignite, and we'll likely start offering the whole 'data fabric' experience instead of the mere "hadoop-acceleration".
>>>>>>>>
>>>>>>>> Cos
>>>>>>>>
>>>>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
>>>>>>>>
>>>>>>>> Val,
>>>>>>>>
>>>>>>>> The Ignite Hadoop module includes not only the map-reduce accelerator but also the Ignite Hadoop File System component. The latter can be used in deployments like HDFS + IGFS + Ignite Spark + Spark.
>>>>>>>>
>>>>>>>> Considering this, I'm for the second solution proposed by you: put both the 2.10 and 2.11 ignite-spark modules under the 'optional' folder of the Ignite Hadoop Accelerator distribution. https://issues.apache.org/jira/browse/IGNITE-4254
>>>>>>>>
>>>>>>>> BTW, this task may be affected by or related to the following ones:
>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-3596
>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
>>>>>>>>
>>>>>>>> —
>>>>>>>> Denis
>>>>>>>>
>>>>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> The Hadoop Accelerator is a plugin to Ignite, and this plugin is used by Hadoop when running its jobs. The ignite-spark module only provides IgniteRDD, which Hadoop obviously will never use.
>>>>>>>>
>>>>>>>> Is there another use case for the Hadoop Accelerator which I'm missing?
>>>>>>>>
>>>>>>>> -Val
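A concrete example of the deployment Denis mentions (HDFS + IGFS + Ignite Spark + Spark): a Spark job reads data through IGFS, provided by ignite-hadoop, and shares state through an IgniteRDD, provided by ignite-spark. A rough sketch; the igfs:// URI, the "example-ignite.xml" config path, the cache name and the JavaIgniteContext constructor used here are assumptions to be verified:

import org.apache.ignite.spark.JavaIgniteContext;
import org.apache.ignite.spark.JavaIgniteRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class IgfsSparkSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
            new SparkConf().setAppName("igfs-spark-sketch"));

        // Read a file through IGFS; the "igfs://" scheme is served by the
        // IgniteHadoopFileSystem from ignite-hadoop. The authority part
        // ("igfs@host:port") is an assumption and depends on the setup.
        JavaRDD<String> lines = sc.textFile("igfs://igfs@127.0.0.1:10500/warehouse/data.txt");

        // Share state with other Spark jobs through an IgniteRDD from
        // ignite-spark; "example-ignite.xml" is a placeholder config path.
        JavaIgniteContext<String, Long> ic = new JavaIgniteContext<>(sc, "example-ignite.xml");
        JavaIgniteRDD<String, Long> sharedRdd = ic.fromCache("sharedCache");

        System.out.println("Lines read via IGFS: " + lines.count());
        System.out.println("Entries already shared via Ignite: " + sharedRdd.count());

        sc.stop();
    }
}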
>>>>>>>>
>>>>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>>>>>>>
>>>>>>>> Why do you think that the spark module is not needed in our hadoop build?
>>>>>>>>
>>>>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Folks,
>>>>>>>>
>>>>>>>> Is there anyone who understands the purpose of including the ignite-spark module in the Hadoop Accelerator build? I can't figure out a use case for which it's needed.
>>>>>>>>
>>>>>>>> If we actually do need it there, then there is an issue. We actually have two ignite-spark modules, for Scala 2.10 and 2.11. In the Fabric build everything is fine: we put both in the 'optional' folder and the user can enable either one. But the Hadoop Accelerator contains only 2.11, which means that the build doesn't work with 2.10 out of the box.
>>>>>>>>
>>>>>>>> We should either remove the module from the build, or fix the issue.
>>>>>>>>
>>>>>>>> -Val
>>>>>
>>>>> --
>>>>> Sergey Kozlov
>>>>> GridGain Systems
>>>>> www.gridgain.com

--
Vladimir Ozerov
Senior Software Architect
GridGain Systems
www.gridgain.com
*+7 (960) 283 98 40*