Another point is that the Hadoop edition has no optional modules. This forces
users to download the fabric edition and copy the modules from there.

On Thu, Dec 8, 2016 at 12:19 PM, Vladimir Ozerov <voze...@gridgain.com>
wrote:

> The work we create for ourselves is maintaining two separate editions, while
> everything can easily be merged into a single distribution.
>
> On Wed, Dec 7, 2016 at 3:29 AM, Dmitriy Setrakyan <dsetrak...@apache.org>
> wrote:
>
> > Why are we creating work for ourselves? What is wrong with having 2
> > downloads?
> >
> > Hadoop accelerator edition exists for the following 2 purposes only:
> >
> >    - accelerate HDFS with Ignite In-Memory File System (IGFS)
> >    - accelerate Hadoop MapReduce with Ignite In-Memory MapReduce
> >
> > I agree with the original email from Valentin that Spark libs should not
> > be included into hadoop-accelerator download. Spark integration is not
> > part of Ignite Hadoop Accelerator and should be included only into the
> > Ignite fabric download.
> >
> > D.
> >
> >
> >
> > On Tue, Dec 6, 2016 at 12:30 AM, Sergey Kozlov <skoz...@gridgain.com>
> > wrote:
> >
> > > Hi
> > >
> > > In general I agree with Vladimir, but I would suggest more technical
> > > details.
> > >
> > > Because we need to collect a particular CLASS_PATH for each of the fabric
> > > and hadoop editions, we can change how the libs directory is processed:
> > >
> > > 1. Introduce libs/hadoop and libs/fabric directories. These are the root
> > > directories for modules specific to the hadoop and fabric editions
> > > respectively.
> > > 2. Change how ignite.sh collects directories for CLASS_PATH:
> > >  - collect everything from libs except libs/hadoop
> > >  - collect everything from libs/fabric
> > > 3. Add an ignite-hadoop-accelerator.{sh|bat} script (it may also perform
> > > the initial setup instead of setup-hadoop.sh) that constructs CLASS_PATH
> > > in the following way:
> > >  - collect everything from libs except libs/fabric
> > >  - collect everything from libs/hadoop
> > >
> > > This approach allows us to:
> > >  - share common modules across both editions (just put them in libs)
> > >  - keep edition-specific modules separate (put them in either libs/hadoop
> > > or libs/fabric)
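
As an illustration only, the CLASS_PATH rules in the proposal above could be modelled roughly as follows. The real launchers are shell/batch scripts, so this Python version only sketches the selection logic. The function name `collect_classpath` and the assumption that every `*.jar` found under `libs` is a candidate are hypothetical and not taken from the Ignite scripts.

```python
import os
from pathlib import Path

# Edition-specific subdirectories of libs/, as named in the proposal.
EDITIONS = ("fabric", "hadoop")


def collect_classpath(libs_root, edition):
    """Return a CLASS_PATH string for one edition (hypothetical helper).

    Jars anywhere under libs/ are treated as shared, except those under
    the directory that belongs to the *other* edition.
    """
    if edition not in EDITIONS:
        raise ValueError(f"unknown edition: {edition!r}")

    root = Path(libs_root)
    excluded = [root / name for name in EDITIONS if name != edition]

    jars = []
    for jar in sorted(root.rglob("*.jar")):
        # Skip jars that live under the other edition's directory.
        if any(ex in jar.parents for ex in excluded):
            continue
        jars.append(str(jar))

    # Join with the platform separator (":" on Unix, ";" on Windows).
    return os.pathsep.join(jars)
```

Under these assumptions, `collect_classpath("libs", "fabric")` models what ignite.sh would build, and `collect_classpath("libs", "hadoop")` models the proposed ignite-hadoop-accelerator script.
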
> > >
> > >
> > >
> > >
> > > > On Mon, Dec 5, 2016 at 11:56 PM, Vladimir Ozerov <voze...@gridgain.com>
> > > > wrote:
> > >
> > > > Agree. I do not see any reason to have two different products. Instead,
> > > > just add ignite-hadoop.jar to the distribution, and add a separate
> > > > script to start the Accelerator. We can go the same way as we did for
> > > > "platforms": create a separate top-level folder "hadoop" in the Fabric
> > > > distribution and put all related Hadoop Accelerator stuff there.
> > > >
> > > > On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <
> > > > valentin.kuliche...@gmail.com> wrote:
> > > >
> > > > > > In general, I don't quite understand why we should move any
> > > > > > component outside of Fabric. The concept of Fabric is to have
> > > > > > everything, no? :) In other words, if a cluster was once set up for
> > > > > > Hadoop Acceleration, why not allow creating a cache and/or running
> > > > > > a task using native Ignite APIs sometime later? We follow this
> > > > > > approach with all our components and modules, but not with
> > > > > > ignite-hadoop for some reason.
> > > > > >
> > > > > > If we get rid of the Hadoop Accelerator build, the initial setup of
> > > > > > Hadoop integration can potentially become a bit more complicated,
> > > > > > but with proper documentation I don't think this is going to be a
> > > > > > problem, because it requires multiple steps now anyway. And frankly
> > > > > > the same can be said about any optional module we have: enabling it
> > > > > > requires some additional steps, as it doesn't work out of the box.
> > > > >
> > > > > -Val
> > > > >
> > > > > > On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dma...@apache.org>
> > > > > > wrote:
> > > > >
> > > > >> Dmitriy,
> > > > >>
> > > > > >> >   - the "lib/" folder has much fewer libraries than in fabric,
> > > > > >> >   simply because many dependencies don't make sense for hadoop
> > > > > >> >   environment
> > > > > >>
> > > > > >> This is exactly the reason why the discussion moved in this
> > > > > >> direction.
> > > > >>
> > > > > >> How do we decide what should be a part of Hadoop Accelerator and
> > > > > >> what should be excluded? If you read through Val’s and Cos’s
> > > > > >> comments below you’ll get more insights.
> > > > > >>
> > > > > >> In general, we need to have a clear understanding of what the
> > > > > >> Hadoop Accelerator distribution use case is. This will help us
> > > > > >> come up with a final decision.
> > > > > >>
> > > > > >> If the accelerator is supposed to be plugged into an existing
> > > > > >> Hadoop environment by enabling MapReduce and/or IGFS at the
> > > > > >> configuration level, then we should simply remove the
> > > > > >> ignite-indexing and ignite-spark modules and add additional
> > > > > >> logging libs as well as the AWS and GCE integrations’ packages.
> > > > > >>
> > > > > >> But, wait, what if a user wants to leverage Ignite Spark
> > > > > >> Integration, Ignite SQL or Geospatial queries, or Ignite streaming
> > > > > >> capabilities after he has already plugged in the accelerator? What
> > > > > >> if he is ready to modify his existing code? He can’t simply switch
> > > > > >> to the fabric on the application side, because the fabric doesn’t
> > > > > >> include the accelerator’s libs that are still needed. He can’t
> > > > > >> rely solely on the accelerator distribution either, since it
> > > > > >> misses some libs. And, obviously, the user starts shuffling libs
> > > > > >> between the fabric and the accelerator to get what is required.
> > > > >>
> > > > >> Vladimir, can you share your thoughts on this?
> > > > >>
> > > > >> —
> > > > >> Denis
> > > > >>
> > > > >>
> > > > >>
> > > > > >> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <
> > > > > >> > dsetrak...@apache.org> wrote:
> > > > >> >
> > > > >> > Guys,
> > > > >> >
> > > > > >> > I just downloaded the hadoop accelerator and here are the
> > > > > >> > differences from the fabric edition that jump at me right away:
> > > > > >> >
> > > > > >> >   - the "bin/" folder has "setup-hadoop" scripts
> > > > > >> >   - the "config/" folder has "hadoop" subfolder with necessary
> > > > > >> >   hadoop-related configuration
> > > > > >> >   - the "lib/" folder has much fewer libraries than in fabric,
> > > > > >> >   simply because many dependencies don't make sense for hadoop
> > > > > >> >   environment
> > > > > >> >
> > > > > >> > I currently don't see how we can merge the hadoop accelerator
> > > > > >> > with the standard fabric edition.
> > > > >> >
> > > > >> > D.
> > > > >> >
> > > > > >> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dma...@apache.org>
> > > > > >> > wrote:
> > > > >> >
> > > > >> >> Vovan,
> > > > >> >>
> > > > > >> >> As one of the hadoop maintainers, please share your point of
> > > > > >> >> view on this.
> > > > >> >>
> > > > >> >> —
> > > > >> >> Denis
> > > > >> >>
> > > > > >> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <
> > > > > >> >>> skoz...@gridgain.com> wrote:
> > > > >> >>>
> > > > >> >>> Denis
> > > > >> >>>
> > > > > >> >>> I agree that at the moment there's no reason to split into
> > > > > >> >>> fabric and hadoop editions.
> > > > >> >>>
> > > > > >> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dma...@apache.org>
> > > > > >> >>> wrote:
> > > > >> >>>
> > > > > >> >>>> Hadoop Accelerator doesn’t require any additional libraries
> > > > > >> >>>> compared to those we have in the fabric build. It only lacks
> > > > > >> >>>> some of them, as Val mentioned below.
> > > > > >> >>>>
> > > > > >> >>>> Wouldn’t it be better to discontinue the Hadoop Accelerator
> > > > > >> >>>> edition and simply deliver the hadoop jar and its configs as a
> > > > > >> >>>> part of the fabric?
> > > > >> >>>>
> > > > >> >>>> —
> > > > >> >>>> Denis
> > > > >> >>>>
> > > > > >> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <
> > > > > >> >>>>> dsetrak...@apache.org> wrote:
> > > > >> >>>>>
> > > > > >> >>>>> Separate edition for the Hadoop Accelerator was primarily
> > > > > >> >>>>> driven by the default libraries. Hadoop Accelerator requires
> > > > > >> >>>>> many more libraries as well as configuration settings
> > > > > >> >>>>> compared to the standard fabric download.
> > > > > >> >>>>>
> > > > > >> >>>>> Now, as far as spark integration is concerned, I am not sure
> > > > > >> >>>>> which edition it belongs in, Hadoop Accelerator or standard
> > > > > >> >>>>> fabric.
> > > > >> >>>>>
> > > > >> >>>>> D.
> > > > >> >>>>>
> > > > > >> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <
> > > > > >> >>>>> dma...@apache.org> wrote:
> > > > >> >>>>>
> > > > >> >>>>>> *Dmitriy*,
> > > > >> >>>>>>
> > > > > >> >>>>>> I do believe that you should know why the community decided
> > > > > >> >>>>>> to have a separate edition for the Hadoop Accelerator. What
> > > > > >> >>>>>> was the reason for that? Presently, as I see it, it brings
> > > > > >> >>>>>> more confusion and difficulties than benefit.
> > > > >> >>>>>>
> > > > >> >>>>>> —
> > > > >> >>>>>> Denis
> > > > >> >>>>>>
> > > > > >> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <
> > > > > >> >>>>>> c...@apache.org> wrote:
> > > > >> >>>>>>
> > > > > >> >>>>>> In fact I very much agree with you. Right now, running the
> > > > > >> >>>>>> "accelerator" component in the Bigtop distro gives one a
> > > > > >> >>>>>> pretty much complete fabric anyway. But in order to make
> > > > > >> >>>>>> just an accelerator component we perform quite a bit of
> > > > > >> >>>>>> voodoo magic during the packaging stage of the Bigtop build,
> > > > > >> >>>>>> shuffling jars from here and there. And that's quite crazy,
> > > > > >> >>>>>> honestly ;)
> > > > >> >>>>>>
> > > > >> >>>>>> Cos
> > > > >> >>>>>>
> > > > >> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> > > > >> >>>>>>
> > > > > >> >>>>>> I tend to agree with Denis. I see only these differences
> > > > > >> >>>>>> between Hadoop Accelerator and Fabric builds (correct me if
> > > > > >> >>>>>> I miss something):
> > > > > >> >>>>>>
> > > > > >> >>>>>> - Limited set of available modules and no optional modules
> > > > > >> >>>>>> in Hadoop Accelerator.
> > > > > >> >>>>>> - No ignite-hadoop module in Fabric.
> > > > > >> >>>>>> - Additional scripts, configs and instructions included in
> > > > > >> >>>>>> Hadoop Accelerator.
> > > > > >> >>>>>>
> > > > > >> >>>>>> And the list of included modules frankly looks very weird.
> > > > > >> >>>>>> Here are only some of the issues I noticed:
> > > > > >> >>>>>>
> > > > > >> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even if we
> > > > > >> >>>>>> need them for Hadoop Acceleration (which I doubt), are they
> > > > > >> >>>>>> really required or can they be optional?
> > > > > >> >>>>>> - We force the use of the ignite-log4j module without
> > > > > >> >>>>>> providing other logger options (e.g., SLF4J).
> > > > > >> >>>>>> - We don't include the ignite-aws module. How to use Hadoop
> > > > > >> >>>>>> Accelerator with S3 discovery?
> > > > > >> >>>>>> - Etc.
> > > > > >> >>>>>>
> > > > > >> >>>>>> It seems to me that if we try to fix all these issues, there
> > > > > >> >>>>>> will be virtually no difference between the Fabric and
> > > > > >> >>>>>> Hadoop Accelerator builds except a couple of scripts and
> > > > > >> >>>>>> config files. If so, there is no reason to have two builds.
> > > > >> >>>>>>
> > > > >> >>>>>> -Val
> > > > >> >>>>>>
> > > > > >> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <
> > > > > >> >>>>>> dma...@apache.org> wrote:
> > > > >> >>>>>>
> > > > > >> >>>>>> On the separate note, in the Bigtop, we start looking into
> > > > > >> >>>>>> changing the way we deliver Ignite and we'll likely start
> > > > > >> >>>>>> offering the whole 'data fabric' experience instead of the
> > > > > >> >>>>>> mere "hadoop-acceleration".
> > > > > >> >>>>>>
> > > > > >> >>>>>>
> > > > > >> >>>>>> And you still will be using hadoop-accelerator libs of
> > > > > >> >>>>>> Ignite, right?
> > > > > >> >>>>>>
> > > > > >> >>>>>> I’m wondering whether there is a need to keep releasing
> > > > > >> >>>>>> Hadoop Accelerator as a separate delivery.
> > > > > >> >>>>>> What if we start releasing the accelerator as a part of the
> > > > > >> >>>>>> standard fabric binary, putting hadoop-accelerator libs
> > > > > >> >>>>>> under the ‘optional’ folder?
> > > > >> >>>>>>
> > > > >> >>>>>> —
> > > > >> >>>>>> Denis
> > > > >> >>>>>>
> > > > > >> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <
> > > > > >> >>>>>> c...@apache.org> wrote:
> > > > >> >>>>>>
> > > > > >> >>>>>> What Denis said: spark has been added to the Hadoop
> > > > > >> >>>>>> accelerator as a way to boost the performance of more than
> > > > > >> >>>>>> just MR compute of the Hadoop stack, IIRC.
> > > > > >> >>>>>>
> > > > > >> >>>>>> For what it's worth, Spark is considered a part of Hadoop at
> > > > > >> >>>>>> large.
> > > > > >> >>>>>>
> > > > > >> >>>>>> On the separate note, in the Bigtop, we start looking into
> > > > > >> >>>>>> changing the way we deliver Ignite and we'll likely start
> > > > > >> >>>>>> offering the whole 'data fabric' experience instead of the
> > > > > >> >>>>>> mere "hadoop-acceleration".
> > > > >> >>>>>>
> > > > >> >>>>>> Cos
> > > > >> >>>>>>
> > > > >> >>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> > > > >> >>>>>>
> > > > >> >>>>>> Val,
> > > > >> >>>>>>
> > > > > >> >>>>>> Ignite Hadoop module includes not only the map-reduce
> > > > > >> >>>>>> accelerator but the Ignite Hadoop File System component as
> > > > > >> >>>>>> well. The latter can be used in deployments like
> > > > > >> >>>>>> HDFS+IGFS+Ignite Spark + Spark.
> > > > > >> >>>>>>
> > > > > >> >>>>>> Considering this, I’m for the second solution proposed by
> > > > > >> >>>>>> you: put both 2.10 and 2.11 ignite-spark modules under the
> > > > > >> >>>>>> ‘optional’ folder of the Ignite Hadoop Accelerator
> > > > > >> >>>>>> distribution.
> > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254
> > > > > >> >>>>>>
> > > > > >> >>>>>> BTW, this task may be affected by or related to the
> > > > > >> >>>>>> following ones:
> > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596
> > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
> > > > >> >>>>>>
> > > > >> >>>>>> —
> > > > >> >>>>>> Denis
> > > > >> >>>>>>
> > > > > >> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> > > > > >> >>>>>> valentin.kuliche...@gmail.com> wrote:
> > > > > >> >>>>>>
> > > > > >> >>>>>> Hadoop Accelerator is a plugin to Ignite and this plugin is
> > > > > >> >>>>>> used by Hadoop when running its jobs. ignite-spark module
> > > > > >> >>>>>> only provides IgniteRDD which Hadoop obviously will never
> > > > > >> >>>>>> use.
> > > > > >> >>>>>>
> > > > > >> >>>>>> Is there another use case for Hadoop Accelerator which I'm
> > > > > >> >>>>>> missing?
> > > > >> >>>>>>
> > > > >> >>>>>> -Val
> > > > >> >>>>>>
> > > > > >> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> > > > > >> >>>>>> dsetrak...@apache.org> wrote:
> > > > > >> >>>>>>
> > > > > >> >>>>>> Why do you think that spark module is not needed in our
> > > > > >> >>>>>> hadoop build?
> > > > >> >>>>>>
> > > > >> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> > > > >> >>>>>> valentin.kuliche...@gmail.com> wrote:
> > > > >> >>>>>>
> > > > >> >>>>>> Folks,
> > > > >> >>>>>>
> > > > > >> >>>>>> Is there anyone who understands the purpose of including the
> > > > > >> >>>>>> ignite-spark module in the Hadoop Accelerator build? I can't
> > > > > >> >>>>>> figure out a use case for which it's needed.
> > > > > >> >>>>>>
> > > > > >> >>>>>> In case we actually need it there, there is an issue then.
> > > > > >> >>>>>> We actually have two ignite-spark modules, for Scala 2.10
> > > > > >> >>>>>> and 2.11. In the Fabric build everything is good: we put
> > > > > >> >>>>>> both in the 'optional' folder and the user can enable either
> > > > > >> >>>>>> one. But in Hadoop Accelerator there is only 2.11, which
> > > > > >> >>>>>> means that the build doesn't work with 2.10 out of the box.
> > > > > >> >>>>>>
> > > > > >> >>>>>> We should either remove the module from the build, or fix
> > > > > >> >>>>>> the issue.
> > > > >> >>>>>>
> > > > >> >>>>>> -Val
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>
> > > > >> >>>
> > > > >> >>> --
> > > > >> >>> Sergey Kozlov
> > > > >> >>> GridGain Systems
> > > > >> >>> www.gridgain.com
> > > > >> >>
> > > > >> >>
> > > > >>
> > > > >>
> > > > >
> > > >
> > > >
> > > > --
> > > > Vladimir Ozerov
> > > > Senior Software Architect
> > > > GridGain Systems
> > > > www.gridgain.com
> > > > *+7 (960) 283 98 40*
> > > >
> > >
> > >
> > >
> > > --
> > > Sergey Kozlov
> > > GridGain Systems
> > > www.gridgain.com
> > >
> >
>
>
>
>



-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com
