Hi,

In general I agree with Vladimir, but would suggest more technical details:
Due to the need to collect particular CLASS_PATHs for the fabric and hadoop editions, we can change the logic of processing the libs directory:

1. Introduce libs/hadoop and libs/fabric directories. These directories are the root directories for modules specific to the hadoop and fabric editions respectively.
2. Change how ignite.sh collects directories for CLASS_PATH:
   - collect everything from libs except libs/hadoop
   - collect everything from libs/fabric
3. Add an ignite-hadoop-accelerator.{sh|bat} script (it may also perform the initial setup instead of setup-hadoop.sh) that constructs CLASS_PATH in the following way:
   - collect everything from libs except libs/fabric
   - collect everything from libs/hadoop

This approach allows us the following:
- share common modules across both editions (just put them in libs)
- do not share edition-specific modules (put them either in libs/hadoop or in libs/fabric)

On Mon, Dec 5, 2016 at 11:56 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:

> Agree. I do not see any reason to have two different products. Instead,
> just add ignite-hadoop.jar to the distribution, and add a separate script
> to start the Accelerator. We can go the same way as we did for
> "platforms": create a separate top-level folder "hadoop" in the Fabric
> distribution and put all related Hadoop Accelerator stuff there.
>
> On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
> > In general, I don't quite understand why we should move any component
> > outside of Fabric. The concept of Fabric is to have everything, no? :)
> > In other words, if a cluster was once set up for Hadoop Acceleration,
> > why not allow creating a cache and/or running a task using the native
> > Ignite APIs sometime later. We follow this approach with all our
> > components and modules, but not with ignite-hadoop for some reason.
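To illustrate step 2 of the proposal above, the fabric-side classpath collection could be sketched roughly as follows. This is only a minimal sketch: the function name, variable names, and demo directory layout are illustrative assumptions, not taken from the actual ignite.sh script.

```shell
#!/bin/sh
# Sketch of the proposed ignite.sh classpath logic (fabric edition):
# include every jar under libs/, excluding only the libs/hadoop subtree.
# Note that libs/fabric is picked up automatically, since it lives under
# libs/ and only libs/hadoop is excluded.
build_fabric_classpath() {
    libs_dir="$1"
    cp=""
    # Word-splitting on find output is acceptable here for paths
    # without spaces.
    for jar in $(find "$libs_dir" -name '*.jar' ! -path "$libs_dir/hadoop/*"); do
        cp="$cp:$jar"
    done
    printf '%s\n' "${cp#:}"   # strip the leading ':'
}

# Demo against a throwaway layout mirroring the proposal.
tmp=$(mktemp -d)
mkdir -p "$tmp/libs/hadoop" "$tmp/libs/fabric"
touch "$tmp/libs/ignite-core.jar" \
      "$tmp/libs/hadoop/ignite-hadoop.jar" \
      "$tmp/libs/fabric/ignite-indexing.jar"
build_fabric_classpath "$tmp/libs"
rm -rf "$tmp"
```

The ignite-hadoop-accelerator script from step 3 would be the mirror image: exclude "$libs_dir/fabric/*" instead of "$libs_dir/hadoop/*".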
> >
> > If we get rid of the Hadoop Accelerator build, initial setup of the
> > Hadoop integration can potentially become a bit more complicated, but
> > with proper documentation I don't think this is going to be a problem,
> > because it requires multiple steps now anyway. And frankly the same can
> > be said about any optional module we have - enabling it requires some
> > additional steps as it doesn't work out of the box.
> >
> > -Val
> >
> > On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dma...@apache.org> wrote:
> >
> >> Dmitriy,
> >>
> >> > - the "lib/" folder has much fewer libraries than in fabric, simply
> >> > because many dependencies don't make sense for a hadoop environment
> >>
> >> This is exactly the reason why the discussion moved in this direction.
> >>
> >> How do we decide what should be a part of the Hadoop Accelerator and
> >> what should be excluded? If you read through Val's and Cos's comments
> >> below you'll get more insights.
> >>
> >> In general, we need to have a clear understanding of the Hadoop
> >> Accelerator distribution's use case. This will help us come up with a
> >> final decision.
> >>
> >> If the accelerator is supposed to be plugged into an existing Hadoop
> >> environment by enabling MapReduce and/or IGFS at the configuration
> >> level, then we should simply remove the ignite-indexing and
> >> ignite-spark modules and add additional logging libs as well as the
> >> AWS and GCE integrations' packages.
> >>
> >> But, wait, what if a user wants to leverage Ignite Spark integration,
> >> Ignite SQL or geospatial queries, or Ignite streaming capabilities
> >> after he has already plugged in the accelerator? What if he is ready
> >> to modify his existing code? He can't simply switch to the fabric on
> >> the application side, because the fabric doesn't include the
> >> accelerator's libs that are still needed. He can't solely rely on the
> >> accelerator distribution either, as it misses some libs.
> >> And, obviously, the user starts shuffling libs between the fabric and
> >> the accelerator to get what is required.
> >>
> >> Vladimir, can you share your thoughts on this?
> >>
> >> —
> >> Denis
> >>
> >>
> >>
> >> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <dsetrak...@apache.org>
> >> wrote:
> >> >
> >> > Guys,
> >> >
> >> > I just downloaded the hadoop accelerator and here are the differences
> >> from
> >> > the fabric edition that jump at me right away:
> >> >
> >> >    - the "bin/" folder has "setup-hadoop" scripts
> >> >    - the "config/" folder has a "hadoop" subfolder with the necessary
> >> >    hadoop-related configuration
> >> >    - the "lib/" folder has much fewer libraries than in fabric, simply
> >> >    because many dependencies don't make sense for a hadoop environment
> >> >
> >> > I currently don't see how we can merge the hadoop accelerator with the
> >> standard
> >> > fabric edition.
> >> >
> >> > D.
> >> >
> >> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dma...@apache.org> wrote:
> >> >
> >> >> Vovan,
> >> >>
> >> >> As one of the hadoop maintainers, please share your point of view on
> >> >> this.
> >> >>
> >> >> —
> >> >> Denis
> >> >>
> >> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <skoz...@gridgain.com>
> >> >> wrote:
> >> >>>
> >> >>> Denis
> >> >>>
> >> >>> I agree that at the moment there's no reason to split into fabric and
> >> >>> hadoop editions.
> >> >>>
> >> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dma...@apache.org>
> >> wrote:
> >> >>>
> >> >>>> The Hadoop Accelerator doesn't require any additional libraries
> >> >>>> compared to those we have in the fabric build. It only lacks some
> >> >>>> of them, as Val mentioned below.
> >> >>>>
> >> >>>> Wouldn't it be better to discontinue the Hadoop Accelerator edition
> >> >>>> and simply deliver the hadoop jar and its configs as a part of the
> >> >>>> fabric?
> >> >>>>
> >> >>>> —
> >> >>>> Denis
> >> >>>>
> >> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <
> >> dsetrak...@apache.org>
> >> >>>> wrote:
> >> >>>>>
> >> >>>>> The separate edition for the Hadoop Accelerator was primarily
> >> >>>>> driven by the default libraries. The Hadoop Accelerator requires
> >> >>>>> many more libraries as well as configuration settings compared to
> >> >>>>> the standard fabric download.
> >> >>>>>
> >> >>>>> Now, as far as the spark integration is concerned, I am not sure
> >> >>>>> which edition it belongs in, Hadoop Accelerator or standard fabric.
> >> >>>>>
> >> >>>>> D.
> >> >>>>>
> >> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dma...@apache.org>
> >> >> wrote:
> >> >>>>>
> >> >>>>>> *Dmitriy*,
> >> >>>>>>
> >> >>>>>> I do believe that you should know why the community decided on a
> >> >>>>>> separate edition for the Hadoop Accelerator. What was the reason
> >> >>>>>> for that? Presently, as I see it, it brings more confusion and
> >> >>>>>> difficulties than benefit.
> >> >>>>>>
> >> >>>>>> —
> >> >>>>>> Denis
> >> >>>>>>
> >> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <c...@apache.org>
> >> >> wrote:
> >> >>>>>>
> >> >>>>>> In fact, I very much agree with you. Right now, running the
> >> >>>>>> "accelerator" component in the Bigtop distro gives one a pretty
> >> >>>>>> much complete fabric anyway. But in order to make just an
> >> >>>>>> accelerator component we perform quite a bit of voodoo magic
> >> >>>>>> during the packaging stage of the Bigtop build, shuffling jars
> >> >>>>>> from here and there. And that's quite crazy, honestly ;)
> >> >>>>>>
> >> >>>>>> Cos
> >> >>>>>>
> >> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> >> >>>>>>
> >> >>>>>> I tend to agree with Denis.
> >> >>>>>> I see only these differences between the Hadoop Accelerator and
> >> >>>>>> Fabric builds (correct me if I missed something):
> >> >>>>>>
> >> >>>>>>    - A limited set of available modules and no optional modules
> >> >>>>>>    in the Hadoop Accelerator.
> >> >>>>>>    - No ignite-hadoop module in Fabric.
> >> >>>>>>    - Additional scripts, configs and instructions included in the
> >> >>>>>>    Hadoop Accelerator.
> >> >>>>>>
> >> >>>>>> And the list of included modules frankly looks very weird. Here
> >> >>>>>> are only some of the issues I noticed:
> >> >>>>>>
> >> >>>>>>    - ignite-indexing and ignite-spark are mandatory. Even if we
> >> >>>>>>    need them for Hadoop Acceleration (which I doubt), are they
> >> >>>>>>    really required or can they be optional?
> >> >>>>>>    - We force the use of the ignite-log4j module without
> >> >>>>>>    providing other logger options (e.g., SLF4J).
> >> >>>>>>    - We don't include the ignite-aws module. How does one use the
> >> >>>>>>    Hadoop Accelerator with S3 discovery?
> >> >>>>>>    - Etc.
> >> >>>>>>
> >> >>>>>> It seems to me that if we try to fix all these issues, there will
> >> >>>>>> be virtually no difference between the Fabric and Hadoop
> >> >>>>>> Accelerator builds except a couple of scripts and config files.
> >> >>>>>> If so, there is no reason to have two builds.
> >> >>>>>>
> >> >>>>>> -Val
> >> >>>>>>
> >> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dma...@apache.org>
> >> >> wrote:
> >> >>>>>>
> >> >>>>>> On a separate note, in Bigtop, we are starting to look into
> >> >>>>>> changing the way we deliver Ignite, and we'll likely start
> >> >>>>>> offering the whole 'data fabric' experience instead of mere
> >> >>>>>> "hadoop-acceleration".
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> And you still will be using the hadoop-accelerator libs of
> >> >>>>>> Ignite, right?
> >> >>>>>>
> >> >>>>>> I'm wondering if there is a need to keep releasing the Hadoop
> >> >>>>>> Accelerator as a separate delivery.
> >> >>>>>> What if we start releasing the accelerator as a part of the
> >> >>>>>> standard fabric binary, putting the hadoop-accelerator libs under
> >> >>>>>> the 'optional' folder?
> >> >>>>>>
> >> >>>>>> —
> >> >>>>>> Denis
> >> >>>>>>
> >> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <c...@apache.org>
> >> >>>> wrote:
> >> >>>>>>
> >> >>>>>> What Denis said: spark has been added to the Hadoop accelerator
> >> >>>>>> as a way to boost the performance of more than just the MR
> >> >>>>>> compute of the Hadoop stack, IIRC.
> >> >>>>>>
> >> >>>>>> For what it's worth, Spark is considered a part of Hadoop at
> >> >>>>>> large.
> >> >>>>>>
> >> >>>>>> On a separate note, in Bigtop, we are starting to look into
> >> >>>>>> changing the way we deliver Ignite, and we'll likely start
> >> >>>>>> offering the whole 'data fabric' experience instead of mere
> >> >>>>>> "hadoop-acceleration".
> >> >>>>>>
> >> >>>>>> Cos
> >> >>>>>>
> >> >>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> >> >>>>>>
> >> >>>>>> Val,
> >> >>>>>>
> >> >>>>>> The Ignite Hadoop module includes not only the map-reduce
> >> >>>>>> accelerator but the Ignite Hadoop File System component as well.
> >> >>>>>> The latter can be used in deployments like
> >> >>>>>> HDFS + IGFS + Ignite Spark + Spark.
> >> >>>>>>
> >> >>>>>> Considering this, I'm for the second solution proposed by you:
> >> >>>>>> put both the 2.10 and 2.11 ignite-spark modules under the
> >> >>>>>> 'optional' folder of the Ignite Hadoop Accelerator distribution.
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254
> >> >>>>>>
> >> >>>>>> BTW, this task may be affected by or related to the following
> >> >>>>>> ones:
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
> >> >>>>>>
> >> >>>>>> —
> >> >>>>>> Denis
> >> >>>>>>
> >> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> >> >>>>>> valentin.kuliche...@gmail.com> wrote:
> >> >>>>>>
> >> >>>>>> The Hadoop Accelerator is a plugin to Ignite, and this plugin is
> >> >>>>>> used by Hadoop when running its jobs. The ignite-spark module
> >> >>>>>> only provides IgniteRDD, which Hadoop obviously will never use.
> >> >>>>>>
> >> >>>>>> Is there another use case for the Hadoop Accelerator which I'm
> >> >>>>>> missing?
> >> >>>>>>
> >> >>>>>> -Val
> >> >>>>>>
> >> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> >> >>>>>> dsetrak...@apache.org>
> >> >>>>>> wrote:
> >> >>>>>>
> >> >>>>>> Why do you think that the spark module is not needed in our
> >> >>>>>> hadoop build?
> >> >>>>>>
> >> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> >> >>>>>> valentin.kuliche...@gmail.com> wrote:
> >> >>>>>>
> >> >>>>>> Folks,
> >> >>>>>>
> >> >>>>>> Is there anyone who understands the purpose of including the
> >> >>>>>> ignite-spark module in the Hadoop Accelerator build? I can't
> >> >>>>>> figure out a use case for which it's needed.
> >> >>>>>>
> >> >>>>>> In case we actually need it there, there is an issue. We actually
> >> >>>>>> have two ignite-spark modules, for 2.10 and 2.11.
> >> >>>>>> In the Fabric build everything is good: we put both in the
> >> >>>>>> 'optional' folder and the user can enable either one. But in the
> >> >>>>>> Hadoop Accelerator there is only 2.11, which means that the build
> >> >>>>>> doesn't work with 2.10 out of the box.
> >> >>>>>>
> >> >>>>>> We should either remove the module from the build, or fix the
> >> >>>>>> issue.
> >> >>>>>>
> >> >>>>>> -Val
> >> >>>>>>
> >> >>>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Sergey Kozlov
> >> >>> GridGain Systems
> >> >>> www.gridgain.com
> >> >>
> >> >>
> >>
> >>
> >
> >
>
> --
> Vladimir Ozerov
> Senior Software Architect
> GridGain Systems
> www.gridgain.com
> *+7 (960) 283 98 40*
>

--
Sergey Kozlov
GridGain Systems
www.gridgain.com