Agree. I do not see any reason to have two different products. Instead, let's just add ignite-hadoop.jar to the distribution and add a separate script to start the Accelerator. We can go the same way as we did for "platforms": create a separate top-level folder "hadoop" in the Fabric distribution and put all related Hadoop Accelerator stuff there.
On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

> In general, I don't quite understand why we should move any component outside of Fabric. The concept of Fabric is to have everything, no? :) In other words, if a cluster was once set up for Hadoop Acceleration, why not allow creating a cache and/or running a task using native Ignite APIs sometime later? We follow this approach with all our components and modules, but not with ignite-hadoop for some reason.
>
> If we get rid of the Hadoop Accelerator build, the initial setup of the Hadoop integration can potentially become a bit more complicated, but with proper documentation I don't think this is going to be a problem, because it requires multiple steps now anyway. And frankly the same can be said about any optional module we have - enabling it requires some additional steps as it doesn't work out of the box.
>
> -Val
>
> On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dma...@apache.org> wrote:
>
>> Dmitriy,
>>
>> > - the "lib/" folder has far fewer libraries than in fabric, simply because many dependencies don't make sense for a hadoop environment
>>
>> This is exactly why the discussion moved in this direction.
>>
>> How do we decide what should be a part of the Hadoop Accelerator and what should be excluded? If you read through Val's and Cos's comments below you'll get more insights.
>>
>> In general, we need a clear understanding of what the Hadoop Accelerator distribution's use case is. This will help us come up with a final decision.
>>
>> If the accelerator is supposed to be plugged into an existing Hadoop environment by enabling MapReduce and/or IGFS at the configuration level, then we should simply remove the ignite-indexing and ignite-spark modules and add additional logging libs as well as the AWS and GCE integration packages.
>>
>> But wait, what if a user wants to leverage the Ignite Spark integration, Ignite SQL or geospatial queries, or Ignite streaming capabilities after he has already plugged in the accelerator? What if he is ready to modify his existing code? He can't simply switch to the fabric on the application side because the fabric doesn't include the accelerator's libs that are still needed. Nor can he rely solely on the accelerator distribution, which misses some libs. And, obviously, the user starts shuffling libs between the fabric and the accelerator to get what is required.
>>
>> Vladimir, can you share your thoughts on this?
>>
>> —
>> Denis
>>
>>> On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>>
>>> Guys,
>>>
>>> I just downloaded the hadoop accelerator and here are the differences from the fabric edition that jump out at me right away:
>>>
>>> - the "bin/" folder has "setup-hadoop" scripts
>>> - the "config/" folder has a "hadoop" subfolder with the necessary hadoop-related configuration
>>> - the "lib/" folder has far fewer libraries than in fabric, simply because many dependencies don't make sense for a hadoop environment
>>>
>>> I currently don't see how we can merge the hadoop accelerator with the standard fabric edition.
>>>
>>> D.
>>>
>>> On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dma...@apache.org> wrote:
>>>
>>>> Vovan,
>>>>
>>>> As one of the hadoop maintainers, please share your point of view on this.
>>>>
>>>> —
>>>> Denis
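To make the point above concrete: a node configured for Hadoop acceleration can serve native Ignite APIs from the same process, so a single build covers both cases. A minimal Java sketch, assuming the combined distribution has ignite-hadoop on the classpath; the IGFS meta/data cache wiring is omitted and the configuration setters should be verified against the target Ignite version:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.FileSystemConfiguration;
import org.apache.ignite.configuration.HadoopConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CombinedNodeSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Accelerator side: enable IGFS and the in-memory map-reduce engine.
        // A real IGFS setup also needs meta/data cache configuration, omitted here.
        FileSystemConfiguration igfsCfg = new FileSystemConfiguration();
        igfsCfg.setName("igfs");
        cfg.setFileSystemConfiguration(igfsCfg);
        cfg.setHadoopConfiguration(new HadoopConfiguration());

        // Native Ignite side: the very same node serves a regular cache,
        // which is the case a single fabric build should cover.
        try (Ignite ignite = Ignition.start(cfg)) {
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");
            cache.put(1, "value");
        }
    }
}

If the two use cases ship in different downloads, this mixed scenario is exactly where users end up shuffling libs.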
>> >> >> >> — >> >> Denis >> >> >> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <skoz...@gridgain.com> >> >> wrote: >> >>> >> >>> Denis >> >>> >> >>> I agree that at the moment there's no reason to split into fabric and >> >>> hadoop editions. >> >>> >> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dma...@apache.org> >> wrote: >> >>> >> >>>> Hadoop Accelerator doesn’t require any additional libraries in >> compare >> >> to >> >>>> those we have in the fabric build. It only lacks some of them as Val >> >>>> mentioned below. >> >>>> >> >>>> Wouldn’t it better to discontinue Hadoop Accelerator edition and >> simply >> >>>> deliver hadoop jar and its configs as a part of the fabric? >> >>>> >> >>>> — >> >>>> Denis >> >>>> >> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan < >> dsetrak...@apache.org> >> >>>> wrote: >> >>>>> >> >>>>> Separate edition for the Hadoop Accelerator was primarily driven by >> the >> >>>>> default libraries. Hadoop Accelerator requires many more libraries >> as >> >>>> well >> >>>>> as configuration settings compared to the standard fabric download. >> >>>>> >> >>>>> Now, as far as spark integration is concerned, I am not sure which >> >>>> edition >> >>>>> it belongs in, Hadoop Accelerator or standard fabric. >> >>>>> >> >>>>> D. >> >>>>> >> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dma...@apache.org> >> >> wrote: >> >>>>> >> >>>>>> *Dmitriy*, >> >>>>>> >> >>>>>> I do believe that you should know why the community decided to a >> >>>> separate >> >>>>>> edition for the Hadoop Accelerator. What was the reason for that? >> >>>>>> Presently, as I see, it brings more confusion and difficulties >> rather >> >>>> then >> >>>>>> benefit. >> >>>>>> >> >>>>>> — >> >>>>>> Denis >> >>>>>> >> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <c...@apache.org> >> >> wrote: >> >>>>>> >> >>>>>> In fact I am very much agree with you. Right now, running the >> >>>> "accelerator" >> >>>>>> component in Bigtop disto gives one a pretty much complete fabric >> >>>> anyway. >> >>>>>> But >> >>>>>> in order to make just an accelerator component we perform quite a >> bit >> >> of >> >>>>>> woodoo magic during the packaging stage of the Bigtop build, >> shuffling >> >>>> jars >> >>>>>> from here and there. And that's quite crazy, honestly ;) >> >>>>>> >> >>>>>> Cos >> >>>>>> >> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote: >> >>>>>> >> >>>>>> I tend to agree with Denis. I see only these differences between >> >> Hadoop >> >>>>>> Accelerator and Fabric builds (correct me if I miss something): >> >>>>>> >> >>>>>> - Limited set of available modules and no optional modules in >> Hadoop >> >>>>>> Accelerator. >> >>>>>> - No ignite-hadoop module in Fabric. >> >>>>>> - Additional scripts, configs and instructions included in Hadoop >> >>>>>> Accelerator. >> >>>>>> >> >>>>>> And the list of included modules frankly looks very weird. Here are >> >> only >> >>>>>> some of the issues I noticed: >> >>>>>> >> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even if we need >> them >> >>>>>> for Hadoop Acceleration (which I doubt), are they really required >> or >> >>>> can >> >>>>>> be >> >>>>>> optional? >> >>>>>> - We force to use ignite-log4j module without providing other >> logger >> >>>>>> options (e.g., SLF). >> >>>>>> - We don't include ignite-aws module. How to use Hadoop Accelerator >> >>>> with >> >>>>>> S3 discovery? >> >>>>>> - Etc. 
>>>>>>>>
>>>>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dma...@apache.org> wrote:
>>>>>>>>
>>>>>>>> On a separate note, in Bigtop we are starting to look into changing the way we deliver Ignite, and we'll likely start offering the whole 'data fabric' experience instead of the mere "hadoop-acceleration".
>>>>>>>>
>>>>>>>> And you will still be using the hadoop-accelerator libs of Ignite, right?
>>>>>>>>
>>>>>>>> I'm wondering whether there is a need to keep releasing the Hadoop Accelerator as a separate delivery. What if we start releasing the accelerator as a part of the standard fabric binary, putting the hadoop-accelerator libs under the 'optional' folder?
>>>>>>>>
>>>>>>>> —
>>>>>>>> Denis
>>>>>>>>
>>>>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <c...@apache.org> wrote:
>>>>>>>>
>>>>>>>> What Denis said: spark has been added to the Hadoop accelerator as a way to boost the performance of more than just the MR compute of the Hadoop stack, IIRC. For what it's worth, Spark is considered a part of Hadoop at large.
>>>>>>>>
>>>>>>>> On a separate note, in Bigtop we are starting to look into changing the way we deliver Ignite, and we'll likely start offering the whole 'data fabric' experience instead of the mere "hadoop-acceleration".
>>>>>>>>
>>>>>>>> Cos
>>>>>>>>
>>>>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
>>>>>>>>
>>>>>>>> Val,
>>>>>>>>
>>>>>>>> The Ignite Hadoop module includes not only the map-reduce accelerator but also the Ignite Hadoop File System component. The latter can be used in deployments like HDFS + IGFS + Ignite Spark + Spark.
>>>>>>>>
>>>>>>>> Considering this, I'm for the second solution proposed by you: put both the 2.10 and 2.11 ignite-spark modules under the 'optional' folder of the Ignite Hadoop Accelerator distribution. https://issues.apache.org/jira/browse/IGNITE-4254
>>>>>>>>
>>>>>>>> BTW, this task may be affected by or related to the following ones:
>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-3596
>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
>>>>>>>>
>>>>>>>> —
>>>>>>>> Denis
>>>>>>>>
>>>>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> The Hadoop Accelerator is a plugin to Ignite, and this plugin is used by Hadoop when running its jobs. The ignite-spark module only provides IgniteRDD, which Hadoop obviously will never use.
>>>>>>>>
>>>>>>>> Is there another use case for the Hadoop Accelerator which I'm missing?
>>>>>>>>
>>>>>>>> -Val
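A concrete example of the deployment Denis mentions (HDFS + IGFS + Ignite Spark + Spark): a Spark job reads data through IGFS, provided by ignite-hadoop, and shares state through an IgniteRDD, provided by ignite-spark. A rough sketch; the igfs:// URI, the "example-ignite.xml" config path, the cache name and the JavaIgniteContext constructor used here are assumptions to be verified:

import org.apache.ignite.spark.JavaIgniteContext;
import org.apache.ignite.spark.JavaIgniteRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class IgfsSparkSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
            new SparkConf().setAppName("igfs-spark-sketch"));

        // Read a file through IGFS; the "igfs://" scheme is served by the
        // IgniteHadoopFileSystem from ignite-hadoop. The authority part
        // ("igfs@host:port") is an assumption and depends on the setup.
        JavaRDD<String> lines = sc.textFile("igfs://igfs@127.0.0.1:10500/warehouse/data.txt");

        // Share state with other Spark jobs through an IgniteRDD from
        // ignite-spark; "example-ignite.xml" is a placeholder config path.
        JavaIgniteContext<String, Long> ic = new JavaIgniteContext<>(sc, "example-ignite.xml");
        JavaIgniteRDD<String, Long> sharedRdd = ic.fromCache("sharedCache");

        System.out.println("Lines read via IGFS: " + lines.count());
        System.out.println("Entries already shared via Ignite: " + sharedRdd.count());

        sc.stop();
    }
}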
>>>>>>>>
>>>>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>>>>>>>
>>>>>>>> Why do you think that the spark module is not needed in our hadoop build?
>>>>>>>>
>>>>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Folks,
>>>>>>>>
>>>>>>>> Is there anyone who understands the purpose of including the ignite-spark module in the Hadoop Accelerator build? I can't figure out a use case for which it's needed.
>>>>>>>>
>>>>>>>> If we actually do need it there, then there is an issue. We actually have two ignite-spark modules, for Scala 2.10 and 2.11. In the Fabric build everything is fine: we put both in the 'optional' folder and the user can enable either one. But the Hadoop Accelerator contains only 2.11, which means that the build doesn't work with 2.10 out of the box.
>>>>>>>>
>>>>>>>> We should either remove the module from the build, or fix the issue.
>>>>>>>>
>>>>>>>> -Val
>>>>>
>>>>> --
>>>>> Sergey Kozlov
>>>>> GridGain Systems
>>>>> www.gridgain.com

--
Vladimir Ozerov
Senior Software Architect
GridGain Systems
www.gridgain.com
*+7 (960) 283 98 40*