Re: ignite-spark module in Hadoop Accelerator

Dmitriy Setrakyan Wed, 30 Nov 2016 23:19:48 -0800

Guys,

I just downloaded the hadoop accelerator and here are the differences from
the fabric edition that jump out at me right away:


   - the "bin/" folder has "setup-hadoop" scripts
   - the "config/" folder has a "hadoop" subfolder with the necessary
   hadoop-related configuration
   - the "lib/" folder has far fewer libraries than in fabric, simply
   because many dependencies don't make sense for a hadoop environment

I currently don't see how we can merge the hadoop accelerator with standard
fabric edition.

D.

On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <[email protected]> wrote:

> Vovan,
>
> As one of the hadoop maintainers, please share your point of view on this.
>
> —
> Denis
>
> > On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <[email protected]> wrote:
> >
> > Denis
> >
> > I agree that at the moment there's no reason to split into fabric and
> > hadoop editions.
> >
> > On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <[email protected]> wrote:
> >
> >> Hadoop Accelerator doesn’t require any additional libraries compared to
> >> those we have in the fabric build. It only lacks some of them, as Val
> >> mentioned below.
> >>
> >> Wouldn’t it be better to discontinue the Hadoop Accelerator edition and
> >> simply deliver the hadoop jar and its configs as a part of the fabric?
> >>
> >> —
> >> Denis
> >>
> >>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <[email protected]> wrote:
> >>>
> >>> The separate edition for the Hadoop Accelerator was primarily driven by the
> >>> default libraries. Hadoop Accelerator requires many more libraries as well
> >>> as configuration settings compared to the standard fabric download.
> >>>
> >>> Now, as far as spark integration is concerned, I am not sure which
> >>> edition it belongs in, Hadoop Accelerator or standard fabric.
> >>>
> >>> D.
> >>>
> >>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <[email protected]> wrote:
> >>>
> >>>> *Dmitriy*,
> >>>>
> >>>> I do believe that you should know why the community decided to have a
> >>>> separate edition for the Hadoop Accelerator. What was the reason for that?
> >>>> Presently, as I see it, it brings more confusion and difficulties than
> >>>> benefit.
> >>>>
> >>>> —
> >>>> Denis
> >>>>
> >>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <[email protected]> wrote:
> >>>>
> >>>> In fact, I very much agree with you. Right now, running the "accelerator"
> >>>> component in the Bigtop distro gives one a pretty much complete fabric
> >>>> anyway. But in order to make just an accelerator component, we perform
> >>>> quite a bit of voodoo magic during the packaging stage of the Bigtop
> >>>> build, shuffling jars from here and there. And that's quite crazy,
> >>>> honestly ;)
> >>>>
> >>>> Cos
> >>>>
> >>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> >>>>
> >>>> I tend to agree with Denis. I see only these differences between Hadoop
> >>>> Accelerator and Fabric builds (correct me if I missed something):
> >>>>
> >>>> - Limited set of available modules and no optional modules in Hadoop
> >>>> Accelerator.
> >>>> - No ignite-hadoop module in Fabric.
> >>>> - Additional scripts, configs and instructions included in Hadoop
> >>>> Accelerator.
> >>>>
> >>>> And the list of included modules frankly looks very weird. Here are only
> >>>> some of the issues I noticed:
> >>>>
> >>>> - ignite-indexing and ignite-spark are mandatory. Even if we need them
> >>>> for Hadoop Acceleration (which I doubt), are they really required, or
> >>>> can they be optional?
> >>>> - We force users to use the ignite-log4j module without providing other
> >>>> logger options (e.g., SLF).
> >>>> - We don't include the ignite-aws module. How can one use Hadoop
> >>>> Accelerator with S3 discovery?
> >>>> - Etc.
> >>>>
> >>>> It seems to me that if we try to fix all these issues, there will be
> >>>> virtually no difference between Fabric and Hadoop Accelerator builds
> >>>> except a couple of scripts and config files. If so, there is no reason
> >>>> to have two builds.
> >>>>
> >>>> -Val
> >>>>
> >>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <[email protected]> wrote:
> >>>>
> >>>> On a separate note, in Bigtop, we're starting to look into changing the
> >>>> way we deliver Ignite, and we'll likely start offering the whole 'data
> >>>> fabric' experience instead of the mere "hadoop-acceleration".
> >>>>
> >>>>
> >>>> And you still will be using hadoop-accelerator libs of Ignite, right?
> >>>>
> >>>> I’m wondering whether there is a need to keep releasing Hadoop
> >>>> Accelerator as a separate delivery.
> >>>> What if we start releasing the accelerator as a part of the standard
> >>>> fabric binary, putting the hadoop-accelerator libs under the ‘optional’ folder?
> >>>>
> >>>> —
> >>>> Denis
> >>>>
> >>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <[email protected]> wrote:
> >>>>
> >>>> What Denis said: spark has been added to the Hadoop accelerator as a way
> >>>> to boost the performance of more than just MR compute of the Hadoop
> >>>> stack, IIRC.
> >>>>
> >>>> For what it's worth, Spark is considered a part of Hadoop at large.
> >>>>
> >>>> On a separate note, in Bigtop, we're starting to look into changing the
> >>>> way we deliver Ignite, and we'll likely start offering the whole 'data
> >>>> fabric' experience instead of the mere "hadoop-acceleration".
> >>>>
> >>>> Cos
> >>>>
> >>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> >>>>
> >>>> Val,
> >>>>
> >>>> The Ignite Hadoop module includes not only the map-reduce accelerator
> >>>> but the Ignite Hadoop File System component as well. The latter can be
> >>>> used in deployments like HDFS + IGFS + Ignite Spark + Spark.
> >>>>
> >>>> Considering this, I’m for the second solution you proposed: put both the
> >>>> 2.10 and 2.11 ignite-spark modules under the ‘optional’ folder of the
> >>>> Ignite Hadoop Accelerator distribution.
> >>>> https://issues.apache.org/jira/browse/IGNITE-4254
> >>>>
> >>>>
> >>>> BTW, this task may be affected by or related to the following ones:
> >>>> https://issues.apache.org/jira/browse/IGNITE-3596
> >>>> https://issues.apache.org/jira/browse/IGNITE-3822
> >>>>
> >>>> —
> >>>> Denis
> >>>>
> >>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <[email protected]> wrote:
> >>>>
> >>>>
> >>>> Hadoop Accelerator is a plugin to Ignite, and this plugin is used by
> >>>> Hadoop when running its jobs. The ignite-spark module only provides
> >>>> IgniteRDD, which Hadoop obviously will never use.
> >>>>
> >>>> Is there another use case for Hadoop Accelerator which I'm missing?
> >>>>
> >>>> -Val
> >>>>
> >>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <[email protected]> wrote:
> >>>>
> >>>> Why do you think that spark module is not needed in our hadoop build?
> >>>>
> >>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <[email protected]> wrote:
> >>>>
> >>>> Folks,
> >>>>
> >>>> Is there anyone who understands the purpose of including the ignite-spark
> >>>> module in the Hadoop Accelerator build? I can't figure out a use case
> >>>> for which it's needed.
> >>>>
> >>>> If we actually need it there, then there is an issue. We actually have
> >>>> two ignite-spark modules, for 2.10 and 2.11. In the Fabric build
> >>>> everything is good: we put both in the 'optional' folder and the user
> >>>> can enable either one. But in Hadoop Accelerator there is only 2.11,
> >>>> which means that the build doesn't work with 2.10 out of the box.
> >>>>
> >>>> We should either remove the module from the build, or fix the issue.
> >>>>
> >>>> -Val
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>
> >>
> >
> >
> > --
> > Sergey Kozlov
> > GridGain Systems
> > www.gridgain.com
>
>
