Another point is that the hadoop edition ships no optional modules. This forces the user to download the fabric edition and copy the needed module from there.
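For example, to enable something like ignite-spark on top of the hadoop edition today, the user ends up doing roughly the following (a sketch only: paths and the module folder name are illustrative and depend on the actual release layout):

    # Illustrative sketch: exact archive names and folder layout depend
    # on the release. The point is that a second download is needed just
    # to enable one optional module.
    FABRIC_HOME=/opt/apache-ignite-fabric-bin     # unpacked fabric edition
    ACCEL_HOME=/opt/apache-ignite-hadoop-bin      # unpacked hadoop edition

    # Optional modules sit under libs/optional in the fabric build;
    # copying one into libs/ of the accelerator puts it on the classpath
    # built by ignite.sh.
    cp -r "$FABRIC_HOME/libs/optional/ignite-spark" "$ACCEL_HOME/libs/"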
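Also, to make the classpath proposal from my earlier mail (quoted below) more concrete, the scripts could build the classpath along these lines. This is a rough sketch, not the actual ignite.sh code; the function name and the way IGNITE_LIBS is assembled here are made up for illustration:

    # Sketch: shared modules sit directly in libs/, edition-specific ones
    # in libs/fabric or libs/hadoop.
    build_classpath() {
        # $1 = edition dir to include, $2 = edition dir to skip
        local include_dir="$1" exclude_dir="$2"
        local cp="${IGNITE_HOME}/libs/*"            # jars shared by both editions
        for dir in "${IGNITE_HOME}/libs/"*/; do     # module subfolders
            dir="${dir%/}"
            case "$dir" in
                "$include_dir"|"$exclude_dir") continue ;;  # edition dirs handled below
            esac
            cp="${cp}:${dir}/*"
        done
        echo "${cp}:${include_dir}/*"
    }

    # ignite.sh would use:
    IGNITE_LIBS=$(build_classpath "${IGNITE_HOME}/libs/fabric" "${IGNITE_HOME}/libs/hadoop")

    # ignite-hadoop-accelerator.sh would use:
    IGNITE_LIBS=$(build_classpath "${IGNITE_HOME}/libs/hadoop" "${IGNITE_HOME}/libs/fabric")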
On Thu, Dec 8, 2016 at 12:19 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> The work we create for ourselves is maintaining two separate editions, while everything can easily be merged into a single distribution.
>
> On Wed, Dec 7, 2016 at 3:29 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> Why are we creating work for ourselves? What is wrong with having 2 downloads?
>
> Hadoop accelerator edition exists for the following 2 purposes only:
>
> - accelerate HDFS with Ignite In-Memory File System (IGFS)
> - accelerate Hadoop MapReduce with Ignite In-Memory MapReduce
>
> I agree with the original email from Valentin that Spark libs should not be included into the hadoop-accelerator download. Spark integration is not part of Ignite Hadoop Accelerator and should be included only into the Ignite fabric download.
>
> D.
>
> On Tue, Dec 6, 2016 at 12:30 AM, Sergey Kozlov <skoz...@gridgain.com> wrote:
> Hi
>
> In general I agree with Vladimir but would suggest more technical details.
>
> Due to the need to collect a particular CLASS_PATH for the fabric and hadoop editions, we can change the logic of processing the libs directory:
>
> 1. Introduce libs/hadoop and libs/fabric directories. These directories are the root directories for modules specific to the hadoop and fabric editions respectively.
> 2. Change how ignite.sh collects directories for CLASS_PATH:
>    - collect everything from libs except libs/hadoop
>    - collect everything from libs/fabric
> 3. Add an ignite-hadoop-accelerator.{sh|bat} script (it may also perform the initial setup instead of setup-hadoop.sh) that constructs CLASS_PATH the following way:
>    - collect everything from libs except libs/fabric
>    - collect everything from libs/hadoop
>
> This approach allows us the following:
> - share common modules across both editions (just put them in libs)
> - keep edition-specific modules separate (put them in either libs/hadoop or libs/fabric)
>
> On Mon, Dec 5, 2016 at 11:56 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> Agree. I do not see any reason to have two different products. Instead, just add ignite-hadoop.jar to the distribution, and add a separate script to start the Accelerator. We can go the same way as we did for "platforms": create a separate top-level folder "hadoop" in the Fabric distribution and put all related Hadoop Accelerator stuff there.
>
> On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
> In general, I don't quite understand why we should move any component outside of Fabric. The concept of Fabric is to have everything, no? :) In other words, if a cluster was once set up for Hadoop Acceleration, why not allow creating a cache and/or running a task using native Ignite APIs sometime later. We follow this approach with all our components and modules, but not with ignite-hadoop for some reason.
>
> If we get rid of the Hadoop Accelerator build, the initial setup of the Hadoop integration can potentially become a bit more complicated, but with proper documentation I don't think this is going to be a problem, because it requires multiple steps now anyway.
> And frankly the same can be said about any optional module we have - enabling it requires some additional steps as it doesn't work out of the box.
>
> -Val
>
> On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dma...@apache.org> wrote:
> Dmitriy,
>
> > - the "lib/" folder has much fewer libraries than in fabric, simply because many dependencies don't make sense for a hadoop environment
>
> This is exactly why the discussion moved in this direction.
>
> How do we decide what should be a part of Hadoop Accelerator and what should be excluded? If you read through Val's and Cos's comments below you'll get more insights.
>
> In general, we need a clear understanding of the Hadoop Accelerator distribution's use case. This will help us come up with a final decision.
>
> If the accelerator is supposed to be plugged into an existing Hadoop environment by enabling MapReduce and/or IGFS at the configuration level, then we should simply remove the ignite-indexing and ignite-spark modules and add additional logging libs as well as the AWS and GCE integration packages.
>
> But, wait, what if a user wants to leverage the Ignite Spark integration, Ignite SQL or geospatial queries, or Ignite streaming capabilities after he has already plugged in the accelerator? What if he is ready to modify his existing code? He can't simply switch to the fabric on the application side because the fabric doesn't include the accelerator's libs that are still needed. He can't solely rely on the accelerator distribution either, as it misses some libs. And, obviously, the user starts shuffling libs between the fabric and the accelerator to get what is required.
>
> Vladimir, can you share your thoughts on this?
>
> — Denis
>
> On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> Guys,
>
> I just downloaded the hadoop accelerator and here are the differences from the fabric edition that jump at me right away:
>
> - the "bin/" folder has "setup-hadoop" scripts
> - the "config/" folder has a "hadoop" subfolder with the necessary hadoop-related configuration
> - the "lib/" folder has much fewer libraries than in fabric, simply because many dependencies don't make sense for a hadoop environment
>
> I currently don't see how we can merge the hadoop accelerator with the standard fabric edition.
>
> D.
>
> On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dma...@apache.org> wrote:
> Vovan,
>
> As one of the hadoop maintainers, please share your point of view on this.
> — Denis
>
> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <skoz...@gridgain.com> wrote:
> Denis
>
> I agree that at the moment there's no reason to split into fabric and hadoop editions.
>
> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dma...@apache.org> wrote:
> Hadoop Accelerator doesn't require any additional libraries compared to those we have in the fabric build. It only lacks some of them, as Val mentioned below.
>
> Wouldn't it be better to discontinue the Hadoop Accelerator edition and simply deliver the hadoop jar and its configs as a part of the fabric?
>
> — Denis
>
> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> The separate edition for the Hadoop Accelerator was primarily driven by the default libraries. Hadoop Accelerator requires many more libraries as well as configuration settings compared to the standard fabric download.
>
> Now, as far as spark integration is concerned, I am not sure which edition it belongs in, Hadoop Accelerator or standard fabric.
>
> D.
>
> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dma...@apache.org> wrote:
> *Dmitriy*,
>
> I do believe that you should know why the community decided to create a separate edition for the Hadoop Accelerator. What was the reason for that? Presently, as I see it, it brings more confusion and difficulties rather than benefit.
>
> — Denis
>
> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <c...@apache.org> wrote:
> In fact I very much agree with you. Right now, running the "accelerator" component in the Bigtop distro gives one a pretty much complete fabric anyway. But in order to make just an accelerator component we perform quite a bit of voodoo magic during the packaging stage of the Bigtop build, shuffling jars from here and there. And that's quite crazy, honestly ;)
>
> Cos
>
> On Mon, Nov 21, 2016 at 03:33 PM, Valentin Kulichenko wrote:
> I tend to agree with Denis. I see only these differences between the Hadoop Accelerator and Fabric builds (correct me if I miss something):
>
> - Limited set of available modules and no optional modules in Hadoop Accelerator.
> - No ignite-hadoop module in Fabric.
> - Additional scripts, configs and instructions included in Hadoop Accelerator.
>
> And the list of included modules frankly looks very weird. Here are only some of the issues I noticed:
>
> - ignite-indexing and ignite-spark are mandatory. Even if we need them for Hadoop Acceleration (which I doubt), are they really required or can they be optional?
> - We force users to use the ignite-log4j module without providing other logger options (e.g., SLF).
> - We don't include the ignite-aws module. How does one use Hadoop Accelerator with S3 discovery?
> - Etc.
>
> It seems to me that if we try to fix all these issues, there will be virtually no difference between the Fabric and Hadoop Accelerator builds except a couple of scripts and config files. If so, there is no reason to have two builds.
>
> -Val
>
> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dma...@apache.org> wrote:
> > On a separate note, in Bigtop, we are starting to look into changing the way we deliver Ignite and we'll likely start offering the whole 'data fabric' experience instead of the mere "hadoop-acceleration".
>
> And you still will be using the hadoop-accelerator libs of Ignite, right?
>
> I'm wondering whether there is a need to keep releasing Hadoop Accelerator as a separate delivery. What if we start releasing the accelerator as a part of the standard fabric binary, putting the hadoop-accelerator libs under the 'optional' folder?
>
> — Denis
>
> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <c...@apache.org> wrote:
> What Denis said: spark has been added to the Hadoop accelerator as a way to boost the performance of more than just the MR compute of the Hadoop stack, IIRC. For what it's worth, Spark is considered a part of Hadoop at large.
>
> On a separate note, in Bigtop, we are starting to look into changing the way we deliver Ignite and we'll likely start offering the whole 'data fabric' experience instead of the mere "hadoop-acceleration".
> Cos
>
> On Mon, Nov 21, 2016 at 09:54 AM, Denis Magda wrote:
> Val,
>
> The Ignite Hadoop module includes not only the map-reduce accelerator but the Ignite Hadoop File System component as well. The latter can be used in deployments like HDFS + IGFS + Ignite Spark + Spark.
>
> Considering this I'm for the second solution proposed by you: put both 2.10 and 2.11 ignite-spark modules under the 'optional' folder of the Ignite Hadoop Accelerator distribution.
> https://issues.apache.org/jira/browse/IGNITE-4254
>
> BTW, this task may be affected or related to the following ones:
> https://issues.apache.org/jira/browse/IGNITE-3596
> https://issues.apache.org/jira/browse/IGNITE-3822
>
> — Denis
>
> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
> Hadoop Accelerator is a plugin to Ignite and this plugin is used by Hadoop when running its jobs. ignite-spark module only provides IgniteRDD which Hadoop obviously will never use.
>
> Is there another use case for Hadoop Accelerator which I'm missing?
>
> -Val
>
> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> Why do you think that spark module is not needed in our hadoop build?
>
> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
> Folks,
>
> Is there anyone who understands the purpose of including ignite-spark module in the Hadoop Accelerator build? I can't figure out a use case for which it's needed.
>
> In case we actually need it there, there is an issue then. We actually have two ignite-spark modules, for 2.10 and 2.11. In Fabric build everything is good, we put both in 'optional' folder and user can enable either one.
> But in Hadoop Accelerator there is only 2.11, which means that the build doesn't work with 2.10 out of the box.
>
> We should either remove the module from the build, or fix the issue.
>
> -Val

--
Sergey Kozlov
GridGain Systems
www.gridgain.com