In general, I don't quite understand why we should move any component outside of Fabric. The concept of Fabric is to have everything, no? :) In other words, if a cluster was once set up for Hadoop Acceleration, why not allow creating a cache and/or running a task using native Ignite APIs sometime later? We follow this approach with all our other components and modules, but not with ignite-hadoop for some reason.
If we get rid of the Hadoop Accelerator build, the initial setup of the Hadoop integration can potentially become a bit more complicated, but with proper documentation I don't think this is going to be a problem, because it requires multiple steps now anyway. And frankly, the same can be said about any optional module we have: enabling it requires some additional steps, as it doesn't work out of the box.

-Val

On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dma...@apache.org> wrote:

> Dmitriy,
>
> > - the "lib/" folder has much fewer libraries than in fabric, simply
> > because many dependencies don't make sense for a hadoop environment
>
> The reason why the discussion moved in this direction is exactly that.
>
> How do we decide what should be a part of Hadoop Accelerator and what
> should be excluded? If you read through Val's and Cos's comments below,
> you'll get more insights.
>
> In general, we need to have a clear understanding of what the Hadoop
> Accelerator distribution's use case is. This will help us come up with a
> final decision.
>
> If the accelerator is supposed to be plugged into an existing Hadoop
> environment by enabling MapReduce and/or IGFS at the configuration level,
> then we should simply remove the ignite-indexing and ignite-spark modules
> and add additional logging libs as well as the AWS and GCE integrations'
> packages.
>
> But wait, what if a user wants to leverage the Ignite Spark integration,
> Ignite SQL or geospatial queries, or Ignite streaming capabilities after
> he has already plugged in the accelerator? What if he is ready to modify
> his existing code? He can't simply switch to the fabric on the application
> side because the fabric doesn't include the accelerator's libs that are
> still needed. He can't solely rely on the accelerator distribution either,
> which misses some libs. And, obviously, the user ends up shuffling libs
> between the fabric and the accelerator to get what is required.
>
> Vladimir, can you share your thoughts on this?
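For context, the "additional steps" needed to enable an optional module in the fabric build amount to a single copy out of the `libs/optional` folder before starting a node. A minimal sketch, assuming the standard binary-distribution layout; the install directory and jar are mocked here so the snippet is self-contained:

```shell
# Mock the distribution layout; in a real install IGNITE_HOME is the
# unpacked binary release and the module jars already exist there.
IGNITE_HOME=$(mktemp -d)
mkdir -p "$IGNITE_HOME/libs/optional/ignite-hadoop"
touch "$IGNITE_HOME/libs/optional/ignite-hadoop/ignite-hadoop.jar"

# The actual "additional step": copy the module folder out of libs/optional
# so the node picks up its jars on the classpath at startup.
cp -r "$IGNITE_HOME/libs/optional/ignite-hadoop" "$IGNITE_HOME/libs/"
```

The same copy step works for any optional module, which is why shipping ignite-hadoop this way would make it no harder to enable than, say, ignite-aws.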
>
> —
> Denis
>
> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> >
> > Guys,
> >
> > I just downloaded the hadoop accelerator and here are the differences from
> > the fabric edition that jump at me right away:
> >
> > - the "bin/" folder has "setup-hadoop" scripts
> > - the "config/" folder has a "hadoop" subfolder with the necessary
> > hadoop-related configuration
> > - the "lib/" folder has much fewer libraries than in fabric, simply
> > because many dependencies don't make sense for a hadoop environment
> >
> > I currently don't see how we can merge the hadoop accelerator with the
> > standard fabric edition.
> >
> > D.
> >
> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dma...@apache.org> wrote:
> >
> >> Vovan,
> >>
> >> As one of the hadoop maintainers, please share your point of view on this.
> >>
> >> —
> >> Denis
> >>
> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <skoz...@gridgain.com> wrote:
> >>>
> >>> Denis
> >>>
> >>> I agree that at the moment there's no reason to split into fabric and
> >>> hadoop editions.
> >>>
> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dma...@apache.org> wrote:
> >>>
> >>>> Hadoop Accelerator doesn't require any additional libraries compared to
> >>>> those we have in the fabric build. It only lacks some of them, as Val
> >>>> mentioned below.
> >>>>
> >>>> Wouldn't it be better to discontinue the Hadoop Accelerator edition and
> >>>> simply deliver the hadoop jar and its configs as a part of the fabric?
> >>>>
> >>>> —
> >>>> Denis
> >>>>
> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> >>>>>
> >>>>> The separate edition for the Hadoop Accelerator was primarily driven by
> >>>>> the default libraries. Hadoop Accelerator requires many more libraries
> >>>>> as well as configuration settings compared to the standard fabric download.
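The "hadoop-related configuration" Dmitriy mentions boils down to pointing Hadoop's client configs at Ignite. A sketch of the idea follows, with property names as I recall them from the accelerator documentation of that era; the host/port values are placeholders, so treat this as an illustrative assumption rather than a reference:

```xml
<!-- core-site.xml: route file system calls to IGFS (values assumed) -->
<property>
    <name>fs.default.name</name>
    <value>igfs://igfs@localhost</value>
</property>
<property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>

<!-- mapred-site.xml: run map-reduce jobs on Ignite instead of YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>ignite</value>
</property>
<property>
    <name>mapreduce.jobtracker.address</name>
    <value>localhost:11211</value>
</property>
```

Since this is purely a client-side configuration change, it could in principle be documented for the fabric build just as easily as shipped as a separate edition.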
> >>>>>
> >>>>> Now, as far as the spark integration is concerned, I am not sure which
> >>>>> edition it belongs in, Hadoop Accelerator or standard fabric.
> >>>>>
> >>>>> D.
> >>>>>
> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dma...@apache.org> wrote:
> >>>>>
> >>>>>> *Dmitriy*,
> >>>>>>
> >>>>>> I do believe that you should know why the community decided to create a
> >>>>>> separate edition for the Hadoop Accelerator. What was the reason for
> >>>>>> that? Presently, as I see it, it brings more confusion and difficulties
> >>>>>> rather than benefit.
> >>>>>>
> >>>>>> —
> >>>>>> Denis
> >>>>>>
> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <c...@apache.org> wrote:
> >>>>>>
> >>>>>> In fact, I very much agree with you. Right now, running the "accelerator"
> >>>>>> component in the Bigtop distro gives one a pretty much complete fabric
> >>>>>> anyway. But in order to make just an accelerator component we perform
> >>>>>> quite a bit of voodoo magic during the packaging stage of the Bigtop
> >>>>>> build, shuffling jars from here and there. And that's quite crazy,
> >>>>>> honestly ;)
> >>>>>>
> >>>>>> Cos
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> >>>>>>
> >>>>>> I tend to agree with Denis. I see only these differences between the
> >>>>>> Hadoop Accelerator and Fabric builds (correct me if I missed something):
> >>>>>>
> >>>>>> - Limited set of available modules and no optional modules in Hadoop
> >>>>>> Accelerator.
> >>>>>> - No ignite-hadoop module in Fabric.
> >>>>>> - Additional scripts, configs and instructions included in Hadoop
> >>>>>> Accelerator.
> >>>>>>
> >>>>>> And the list of included modules frankly looks very weird. Here are only
> >>>>>> some of the issues I noticed:
> >>>>>>
> >>>>>> - ignite-indexing and ignite-spark are mandatory.
> >>>>>> Even if we need them
> >>>>>> for Hadoop Acceleration (which I doubt), are they really required or
> >>>>>> can they be optional?
> >>>>>> - We force users to use the ignite-log4j module without providing other
> >>>>>> logger options (e.g., SLF4J).
> >>>>>> - We don't include the ignite-aws module. How does one use Hadoop
> >>>>>> Accelerator with S3 discovery?
> >>>>>> - Etc.
> >>>>>>
> >>>>>> It seems to me that if we try to fix all these issues, there will be
> >>>>>> virtually no difference between the Fabric and Hadoop Accelerator builds
> >>>>>> except a couple of scripts and config files. If so, there is no reason
> >>>>>> to have two builds.
> >>>>>>
> >>>>>> -Val
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dma...@apache.org> wrote:
> >>>>>>
> >>>>>> On the separate note, in Bigtop, we started looking into changing the
> >>>>>> way we deliver Ignite, and we'll likely start offering the whole 'data
> >>>>>> fabric' experience instead of the mere "hadoop-acceleration".
> >>>>>>
> >>>>>> And you will still be using the hadoop-accelerator libs of Ignite, right?
> >>>>>>
> >>>>>> I'm wondering whether there is a need to keep releasing Hadoop
> >>>>>> Accelerator as a separate delivery.
> >>>>>> What if we start releasing the accelerator as a part of the standard
> >>>>>> fabric binary, putting the hadoop-accelerator libs under the 'optional'
> >>>>>> folder?
> >>>>>>
> >>>>>> —
> >>>>>> Denis
> >>>>>>
> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <c...@apache.org> wrote:
> >>>>>>
> >>>>>> What Denis said: spark has been added to the Hadoop accelerator as a way
> >>>>>> to boost the performance of more than just the MR compute of the Hadoop
> >>>>>> stack, IIRC.
> >>>>>> For what it's worth, Spark is considered a part of Hadoop at large.
> >>>>>>
> >>>>>> On the separate note, in Bigtop, we started looking into changing the
> >>>>>> way we deliver Ignite, and we'll likely start offering the whole 'data
> >>>>>> fabric' experience instead of the mere "hadoop-acceleration".
> >>>>>>
> >>>>>> Cos
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> >>>>>>
> >>>>>> Val,
> >>>>>>
> >>>>>> The Ignite Hadoop module includes not only the map-reduce accelerator
> >>>>>> but the Ignite Hadoop File System component as well. The latter can be
> >>>>>> used in deployments like HDFS + IGFS + Ignite Spark + Spark.
> >>>>>>
> >>>>>> Considering this, I'm for the second solution proposed by you: put both
> >>>>>> the 2.10 and 2.11 ignite-spark modules under the 'optional' folder of
> >>>>>> the Ignite Hadoop Accelerator distribution.
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254
> >>>>>>
> >>>>>> BTW, this task may be affected by or related to the following ones:
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
> >>>>>>
> >>>>>> —
> >>>>>> Denis
> >>>>>>
> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko
> >>>>>> <valentin.kuliche...@gmail.com> wrote:
> >>>>>>
> >>>>>> Hadoop Accelerator is a plugin to Ignite, and this plugin is used by
> >>>>>> Hadoop when running its jobs. The ignite-spark module only provides
> >>>>>> IgniteRDD, which Hadoop obviously will never use.
> >>>>>>
> >>>>>> Is there another use case for Hadoop Accelerator which I'm missing?
> >>>>>>
> >>>>>> -Val
> >>>>>>
> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan
> >>>>>> <dsetrak...@apache.org> wrote:
> >>>>>>
> >>>>>> Why do you think that the spark module is not needed in our hadoop build?
> >>>>>>
> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko
> >>>>>> <valentin.kuliche...@gmail.com> wrote:
> >>>>>>
> >>>>>> Folks,
> >>>>>>
> >>>>>> Is there anyone who understands the purpose of including the
> >>>>>> ignite-spark module in the Hadoop Accelerator build? I can't figure out
> >>>>>> a use case for which it's needed.
> >>>>>>
> >>>>>> In case we actually do need it there, then there is an issue. We
> >>>>>> actually have two ignite-spark modules, for Scala 2.10 and 2.11. In the
> >>>>>> Fabric build everything is good: we put both in the 'optional' folder
> >>>>>> and the user can enable either one. But in Hadoop Accelerator there is
> >>>>>> only the 2.11 one, which means that the build doesn't work with 2.10 out
> >>>>>> of the box.
> >>>>>>
> >>>>>> We should either remove the module from the build, or fix the issue.
> >>>>>>
> >>>>>> -Val
> >>>
> >>> --
> >>> Sergey Kozlov
> >>> GridGain Systems
> >>> www.gridgain.com