Hi,

In general I agree with Vladimir, but would suggest more technical details:
Due to the need to collect particular CLASS_PATHs for the fabric and hadoop editions, we can change the logic of processing the libs directory:

1. Introduce libs/hadoop and libs/fabric directories. These directories are the root directories for modules specific to the hadoop and fabric editions respectively.
2. Change how ignite.sh collects directories for CLASS_PATH:
   - collect everything from libs except libs/hadoop
   - collect everything from libs/fabric
3. Add an ignite-hadoop-accelerator.{sh|bat} script (it may also perform the initial setup instead of setup-hadoop.sh) that constructs CLASS_PATH in the following way:
   - collect everything from libs except libs/fabric
   - collect everything from libs/hadoop

This approach allows us the following:
- share common modules across both editions (just put them in libs)
- do not share edition-specific modules (put them either in libs/hadoop or in libs/fabric)

On Mon, Dec 5, 2016 at 11:56 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:

> Agree. I do not see any reason to have two different products. Instead,
> just add ignite-hadoop.jar to the distribution, and add a separate script
> to start the Accelerator. We can go the same way as we did for
> "platforms": create a separate top-level folder "hadoop" in the Fabric
> distribution and put all related Hadoop Accelerator stuff there.
>
> On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
> > In general, I don't quite understand why we should move any component
> > outside of Fabric. The concept of Fabric is to have everything, no? :)
> > In other words, if a cluster was once set up for Hadoop Acceleration,
> > why not allow creating a cache and/or running a task using the native
> > Ignite APIs sometime later. We follow this approach with all our
> > components and modules, but not with ignite-hadoop for some reason.
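To illustrate step 2 of the proposal above, the fabric-side classpath collection could be sketched roughly as follows. This is only a minimal sketch: the function name, variable names, and demo directory layout are illustrative assumptions, not taken from the actual ignite.sh script.

```shell
#!/bin/sh
# Sketch of the proposed ignite.sh classpath logic (fabric edition):
# include every jar under libs/, excluding only the libs/hadoop subtree.
# Note that libs/fabric is picked up automatically, since it lives under
# libs/ and only libs/hadoop is excluded.
build_fabric_classpath() {
    libs_dir="$1"
    cp=""
    # Word-splitting on find output is acceptable here for paths
    # without spaces.
    for jar in $(find "$libs_dir" -name '*.jar' ! -path "$libs_dir/hadoop/*"); do
        cp="$cp:$jar"
    done
    printf '%s\n' "${cp#:}"   # strip the leading ':'
}

# Demo against a throwaway layout mirroring the proposal.
tmp=$(mktemp -d)
mkdir -p "$tmp/libs/hadoop" "$tmp/libs/fabric"
touch "$tmp/libs/ignite-core.jar" \
      "$tmp/libs/hadoop/ignite-hadoop.jar" \
      "$tmp/libs/fabric/ignite-indexing.jar"
build_fabric_classpath "$tmp/libs"
rm -rf "$tmp"
```

The ignite-hadoop-accelerator script from step 3 would be the mirror image: exclude "$libs_dir/fabric/*" instead of "$libs_dir/hadoop/*".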
> >
> > If we get rid of the Hadoop Accelerator build, initial setup of the
> > Hadoop integration can potentially become a bit more complicated, but
> > with proper documentation I don't think this is going to be a problem,
> > because it requires multiple steps now anyway. And frankly the same can
> > be said about any optional module we have - enabling it requires some
> > additional steps as it doesn't work out of the box.
> >
> > -Val
> >
> > On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dma...@apache.org> wrote:
> >
> >> Dmitriy,
> >>
> >> > - the "lib/" folder has much fewer libraries than in fabric, simply
> >> > because many dependencies don't make sense for a hadoop environment
> >>
> >> This is exactly the reason why the discussion moved in this direction.
> >>
> >> How do we decide what should be a part of the Hadoop Accelerator and
> >> what should be excluded? If you read through Val's and Cos's comments
> >> below you'll get more insights.
> >>
> >> In general, we need to have a clear understanding of the Hadoop
> >> Accelerator distribution's use case. This will help us come up with a
> >> final decision.
> >>
> >> If the accelerator is supposed to be plugged into an existing Hadoop
> >> environment by enabling MapReduce and/or IGFS at the configuration
> >> level, then we should simply remove the ignite-indexing and
> >> ignite-spark modules and add additional logging libs as well as the
> >> AWS and GCE integrations' packages.
> >>
> >> But, wait, what if a user wants to leverage Ignite Spark integration,
> >> Ignite SQL or geospatial queries, or Ignite streaming capabilities
> >> after he has already plugged in the accelerator? What if he is ready
> >> to modify his existing code? He can't simply switch to the fabric on
> >> the application side, because the fabric doesn't include the
> >> accelerator's libs that are still needed. He can't solely rely on the
> >> accelerator distribution either, as it misses some libs.
> >> And, obviously, the user starts shuffling libs between the fabric and
> >> the accelerator to get what is required.
> >>
> >> Vladimir, can you share your thoughts on this?
> >>
> >> —
> >> Denis
> >>
> >>
> >>
> >> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <dsetrak...@apache.org>
> >> wrote:
> >> >
> >> > Guys,
> >> >
> >> > I just downloaded the hadoop accelerator and here are the differences
> >> from
> >> > the fabric edition that jump at me right away:
> >> >
> >> >    - the "bin/" folder has "setup-hadoop" scripts
> >> >    - the "config/" folder has a "hadoop" subfolder with the necessary
> >> >    hadoop-related configuration
> >> >    - the "lib/" folder has much fewer libraries than in fabric, simply
> >> >    because many dependencies don't make sense for a hadoop environment
> >> >
> >> > I currently don't see how we can merge the hadoop accelerator with the
> >> standard
> >> > fabric edition.
> >> >
> >> > D.
> >> >
> >> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dma...@apache.org> wrote:
> >> >
> >> >> Vovan,
> >> >>
> >> >> As one of the hadoop maintainers, please share your point of view on
> >> >> this.
> >> >>
> >> >> —
> >> >> Denis
> >> >>
> >> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <skoz...@gridgain.com>
> >> >> wrote:
> >> >>>
> >> >>> Denis
> >> >>>
> >> >>> I agree that at the moment there's no reason to split into fabric and
> >> >>> hadoop editions.
> >> >>>
> >> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dma...@apache.org>
> >> wrote:
> >> >>>
> >> >>>> The Hadoop Accelerator doesn't require any additional libraries
> >> >>>> compared to those we have in the fabric build. It only lacks some
> >> >>>> of them, as Val mentioned below.
> >> >>>>
> >> >>>> Wouldn't it be better to discontinue the Hadoop Accelerator edition
> >> >>>> and simply deliver the hadoop jar and its configs as a part of the
> >> >>>> fabric?
> >> >>>>
> >> >>>> —
> >> >>>> Denis
> >> >>>>
> >> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <
> >> dsetrak...@apache.org>
> >> >>>> wrote:
> >> >>>>>
> >> >>>>> The separate edition for the Hadoop Accelerator was primarily
> >> >>>>> driven by the default libraries. The Hadoop Accelerator requires
> >> >>>>> many more libraries as well as configuration settings compared to
> >> >>>>> the standard fabric download.
> >> >>>>>
> >> >>>>> Now, as far as the spark integration is concerned, I am not sure
> >> >>>>> which edition it belongs in, Hadoop Accelerator or standard fabric.
> >> >>>>>
> >> >>>>> D.
> >> >>>>>
> >> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dma...@apache.org>
> >> >> wrote:
> >> >>>>>
> >> >>>>>> *Dmitriy*,
> >> >>>>>>
> >> >>>>>> I do believe that you should know why the community decided on a
> >> >>>>>> separate edition for the Hadoop Accelerator. What was the reason
> >> >>>>>> for that? Presently, as I see it, it brings more confusion and
> >> >>>>>> difficulties than benefit.
> >> >>>>>>
> >> >>>>>> —
> >> >>>>>> Denis
> >> >>>>>>
> >> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <c...@apache.org>
> >> >> wrote:
> >> >>>>>>
> >> >>>>>> In fact, I very much agree with you. Right now, running the
> >> >>>>>> "accelerator" component in the Bigtop distro gives one a pretty
> >> >>>>>> much complete fabric anyway. But in order to make just an
> >> >>>>>> accelerator component we perform quite a bit of voodoo magic
> >> >>>>>> during the packaging stage of the Bigtop build, shuffling jars
> >> >>>>>> from here and there. And that's quite crazy, honestly ;)
> >> >>>>>>
> >> >>>>>> Cos
> >> >>>>>>
> >> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> >> >>>>>>
> >> >>>>>> I tend to agree with Denis.
> >> >>>>>> I see only these differences between the Hadoop Accelerator and
> >> >>>>>> Fabric builds (correct me if I missed something):
> >> >>>>>>
> >> >>>>>>    - A limited set of available modules and no optional modules
> >> >>>>>>    in the Hadoop Accelerator.
> >> >>>>>>    - No ignite-hadoop module in Fabric.
> >> >>>>>>    - Additional scripts, configs and instructions included in the
> >> >>>>>>    Hadoop Accelerator.
> >> >>>>>>
> >> >>>>>> And the list of included modules frankly looks very weird. Here
> >> >>>>>> are only some of the issues I noticed:
> >> >>>>>>
> >> >>>>>>    - ignite-indexing and ignite-spark are mandatory. Even if we
> >> >>>>>>    need them for Hadoop Acceleration (which I doubt), are they
> >> >>>>>>    really required or can they be optional?
> >> >>>>>>    - We force the use of the ignite-log4j module without
> >> >>>>>>    providing other logger options (e.g., SLF4J).
> >> >>>>>>    - We don't include the ignite-aws module. How does one use the
> >> >>>>>>    Hadoop Accelerator with S3 discovery?
> >> >>>>>>    - Etc.
> >> >>>>>>
> >> >>>>>> It seems to me that if we try to fix all these issues, there will
> >> >>>>>> be virtually no difference between the Fabric and Hadoop
> >> >>>>>> Accelerator builds except a couple of scripts and config files.
> >> >>>>>> If so, there is no reason to have two builds.
> >> >>>>>>
> >> >>>>>> -Val
> >> >>>>>>
> >> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dma...@apache.org>
> >> >> wrote:
> >> >>>>>>
> >> >>>>>> On a separate note, in Bigtop, we are starting to look into
> >> >>>>>> changing the way we deliver Ignite, and we'll likely start
> >> >>>>>> offering the whole 'data fabric' experience instead of mere
> >> >>>>>> "hadoop-acceleration".
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> And you still will be using the hadoop-accelerator libs of
> >> >>>>>> Ignite, right?
> >> >>>>>>
> >> >>>>>> I'm wondering if there is a need to keep releasing the Hadoop
> >> >>>>>> Accelerator as a separate delivery.
> >> >>>>>> What if we start releasing the accelerator as a part of the
> >> >>>>>> standard fabric binary, putting the hadoop-accelerator libs under
> >> >>>>>> the 'optional' folder?
> >> >>>>>>
> >> >>>>>> —
> >> >>>>>> Denis
> >> >>>>>>
> >> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <c...@apache.org>
> >> >>>> wrote:
> >> >>>>>>
> >> >>>>>> What Denis said: spark has been added to the Hadoop accelerator
> >> >>>>>> as a way to boost the performance of more than just the MR
> >> >>>>>> compute of the Hadoop stack, IIRC.
> >> >>>>>>
> >> >>>>>> For what it's worth, Spark is considered a part of Hadoop at
> >> >>>>>> large.
> >> >>>>>>
> >> >>>>>> On a separate note, in Bigtop, we are starting to look into
> >> >>>>>> changing the way we deliver Ignite, and we'll likely start
> >> >>>>>> offering the whole 'data fabric' experience instead of mere
> >> >>>>>> "hadoop-acceleration".
> >> >>>>>>
> >> >>>>>> Cos
> >> >>>>>>
> >> >>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> >> >>>>>>
> >> >>>>>> Val,
> >> >>>>>>
> >> >>>>>> The Ignite Hadoop module includes not only the map-reduce
> >> >>>>>> accelerator but the Ignite Hadoop File System component as well.
> >> >>>>>> The latter can be used in deployments like
> >> >>>>>> HDFS + IGFS + Ignite Spark + Spark.
> >> >>>>>>
> >> >>>>>> Considering this, I'm for the second solution proposed by you:
> >> >>>>>> put both the 2.10 and 2.11 ignite-spark modules under the
> >> >>>>>> 'optional' folder of the Ignite Hadoop Accelerator distribution.
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254
> >> >>>>>>
> >> >>>>>> BTW, this task may be affected by or related to the following
> >> >>>>>> ones:
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
> >> >>>>>>
> >> >>>>>> —
> >> >>>>>> Denis
> >> >>>>>>
> >> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> >> >>>>>> valentin.kuliche...@gmail.com> wrote:
> >> >>>>>>
> >> >>>>>> The Hadoop Accelerator is a plugin to Ignite, and this plugin is
> >> >>>>>> used by Hadoop when running its jobs. The ignite-spark module
> >> >>>>>> only provides IgniteRDD, which Hadoop obviously will never use.
> >> >>>>>>
> >> >>>>>> Is there another use case for the Hadoop Accelerator which I'm
> >> >>>>>> missing?
> >> >>>>>>
> >> >>>>>> -Val
> >> >>>>>>
> >> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> >> >>>>>> dsetrak...@apache.org>
> >> >>>>>> wrote:
> >> >>>>>>
> >> >>>>>> Why do you think that the spark module is not needed in our
> >> >>>>>> hadoop build?
> >> >>>>>>
> >> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> >> >>>>>> valentin.kuliche...@gmail.com> wrote:
> >> >>>>>>
> >> >>>>>> Folks,
> >> >>>>>>
> >> >>>>>> Is there anyone who understands the purpose of including the
> >> >>>>>> ignite-spark module in the Hadoop Accelerator build? I can't
> >> >>>>>> figure out a use case for which it's needed.
> >> >>>>>>
> >> >>>>>> In case we actually need it there, there is an issue. We actually
> >> >>>>>> have two ignite-spark modules, for 2.10 and 2.11.
> >> >>>>>> In the Fabric build everything is good: we put both in the
> >> >>>>>> 'optional' folder and the user can enable either one. But in the
> >> >>>>>> Hadoop Accelerator there is only 2.11, which means that the build
> >> >>>>>> doesn't work with 2.10 out of the box.
> >> >>>>>>
> >> >>>>>> We should either remove the module from the build, or fix the
> >> >>>>>> issue.
> >> >>>>>>
> >> >>>>>> -Val
> >> >>>>>>
> >> >>>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Sergey Kozlov
> >> >>> GridGain Systems
> >> >>> www.gridgain.com
> >> >>
> >> >>
> >>
> >>
> >
> >
>
> --
> Vladimir Ozerov
> Senior Software Architect
> GridGain Systems
> www.gridgain.com
> *+7 (960) 283 98 40*
>

--
Sergey Kozlov
GridGain Systems
www.gridgain.com