Vovan, as one of the Hadoop maintainers, please share your point of view on this.
— Denis

> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <skoz...@gridgain.com> wrote:
>
> Denis
>
> I agree that at the moment there's no reason to split into fabric and
> hadoop editions.
>
> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dma...@apache.org> wrote:
>
>> Hadoop Accelerator doesn't require any additional libraries compared to
>> those we have in the fabric build. It only lacks some of them, as Val
>> mentioned below.
>>
>> Wouldn't it be better to discontinue the Hadoop Accelerator edition and
>> simply deliver the hadoop jar and its configs as part of the fabric?
>>
>> —
>> Denis
>>
>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>>
>>> The separate edition for the Hadoop Accelerator was primarily driven by
>>> the default libraries. Hadoop Accelerator requires many more libraries,
>>> as well as configuration settings, compared to the standard fabric
>>> download.
>>>
>>> Now, as far as the Spark integration is concerned, I am not sure which
>>> edition it belongs in, Hadoop Accelerator or standard fabric.
>>>
>>> D.
>>>
>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dma...@apache.org> wrote:
>>>
>>>> *Dmitriy*,
>>>>
>>>> I do believe that you should know why the community decided on a
>>>> separate edition for the Hadoop Accelerator. What was the reason for
>>>> that? Presently, as I see it, it brings more confusion and difficulties
>>>> than benefit.
>>>>
>>>> —
>>>> Denis
>>>>
>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <c...@apache.org> wrote:
>>>>
>>>> In fact, I very much agree with you. Right now, running the
>>>> "accelerator" component in the Bigtop distro gives one a pretty much
>>>> complete fabric anyway. But in order to make just an accelerator
>>>> component, we perform quite a bit of voodoo magic during the packaging
>>>> stage of the Bigtop build, shuffling jars from here and there.
>>>> And that's quite crazy, honestly ;)
>>>>
>>>> Cos
>>>>
>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
>>>>
>>>> I tend to agree with Denis. I see only these differences between the
>>>> Hadoop Accelerator and Fabric builds (correct me if I missed something):
>>>>
>>>> - A limited set of available modules and no optional modules in Hadoop
>>>>   Accelerator.
>>>> - No ignite-hadoop module in Fabric.
>>>> - Additional scripts, configs and instructions included in Hadoop
>>>>   Accelerator.
>>>>
>>>> And the list of included modules frankly looks very weird. Here are
>>>> only some of the issues I noticed:
>>>>
>>>> - ignite-indexing and ignite-spark are mandatory. Even if we need them
>>>>   for Hadoop acceleration (which I doubt), are they really required, or
>>>>   can they be optional?
>>>> - We force users to use the ignite-log4j module without providing other
>>>>   logger options (e.g., SLF4J).
>>>> - We don't include the ignite-aws module. How does one use the Hadoop
>>>>   Accelerator with S3 discovery?
>>>> - Etc.
>>>>
>>>> It seems to me that if we fix all these issues, there will be virtually
>>>> no difference between the Fabric and Hadoop Accelerator builds except a
>>>> couple of scripts and config files. If so, there is no reason to have
>>>> two builds.
>>>>
>>>> -Val
>>>>
>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dma...@apache.org> wrote:
>>>>
>>>> On the separate note, in Bigtop we have started looking into changing
>>>> the way we deliver Ignite, and we'll likely start offering the whole
>>>> 'data fabric' experience instead of the mere "hadoop-acceleration".
>>>>
>>>> And you will still be using the hadoop-accelerator libs of Ignite,
>>>> right?
>>>>
>>>> I'm wondering whether there is a need to keep releasing the Hadoop
>>>> Accelerator as a separate delivery. What if we start releasing the
>>>> accelerator as part of the standard fabric binary, putting the
>>>> hadoop-accelerator libs under the 'optional' folder?
>>>> —
>>>> Denis
>>>>
>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <c...@apache.org> wrote:
>>>>
>>>> What Denis said: Spark has been added to the Hadoop accelerator as a
>>>> way to boost the performance of more than just the MR compute of the
>>>> Hadoop stack, IIRC. For what it's worth, Spark is considered a part of
>>>> Hadoop at large.
>>>>
>>>> On the separate note, in Bigtop we have started looking into changing
>>>> the way we deliver Ignite, and we'll likely start offering the whole
>>>> 'data fabric' experience instead of the mere "hadoop-acceleration".
>>>>
>>>> Cos
>>>>
>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
>>>>
>>>> Val,
>>>>
>>>> The Ignite Hadoop module includes not only the map-reduce accelerator
>>>> but the Ignite Hadoop File System component as well. The latter can be
>>>> used in deployments like HDFS + IGFS + Ignite Spark + Spark.
>>>>
>>>> Considering this, I'm for the second solution you proposed: put both
>>>> the 2.10 and 2.11 ignite-spark modules under the 'optional' folder of
>>>> the Ignite Hadoop Accelerator distribution.
>>>> https://issues.apache.org/jira/browse/IGNITE-4254
>>>>
>>>> BTW, this task may be affected by or related to the following ones:
>>>> https://issues.apache.org/jira/browse/IGNITE-3596
>>>> https://issues.apache.org/jira/browse/IGNITE-3822
>>>>
>>>> —
>>>> Denis
>>>>
>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko
>>>> <valentin.kuliche...@gmail.com> wrote:
>>>>
>>>> Hadoop Accelerator is a plugin to Ignite, and this plugin is used by
>>>> Hadoop when running its jobs. The ignite-spark module only provides
>>>> IgniteRDD, which Hadoop obviously will never use.
>>>>
>>>> Is there another use case for Hadoop Accelerator which I'm missing?
>>>> -Val
>>>>
>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan
>>>> <dsetrak...@apache.org> wrote:
>>>>
>>>> Why do you think that the spark module is not needed in our hadoop
>>>> build?
>>>>
>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko
>>>> <valentin.kuliche...@gmail.com> wrote:
>>>>
>>>> Folks,
>>>>
>>>> Is there anyone who understands the purpose of including the
>>>> ignite-spark module in the Hadoop Accelerator build? I can't figure out
>>>> a use case for which it's needed.
>>>>
>>>> In case we actually need it there, there is an issue. We actually have
>>>> two ignite-spark modules, for Scala 2.10 and 2.11. In the Fabric build
>>>> everything is good: we put both in the 'optional' folder and the user
>>>> can enable either one. But in the Hadoop Accelerator there is only
>>>> 2.11, which means that the build doesn't work with 2.10 out of the box.
>>>>
>>>> We should either remove the module from the build, or fix the issue.
>>>>
>>>> -Val
>
> --
> Sergey Kozlov
> GridGain Systems
> www.gridgain.com
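
For readers following the thread: the 'optional' folder mechanism that Val and Denis keep referring to can be sketched as below. This is a minimal mock stand-in for an unpacked Ignite binary distribution, not the real layout of any particular release (folder and module names here are illustrative); it only demonstrates the convention under discussion, namely that a module shipped under libs/optional/ stays off the node classpath until the user copies its folder one level up into libs/.

```shell
# Mock stand-in for an unpacked Ignite fabric distribution
# (in a real install, IGNITE_HOME points at the unpacked directory).
IGNITE_HOME=$(mktemp -d)
mkdir -p "$IGNITE_HOME/libs/optional/ignite-hadoop"
touch "$IGNITE_HOME/libs/optional/ignite-hadoop/ignite-hadoop.jar"

# Modules under libs/optional/ are not on the node classpath.
# Enabling one is a single copy of its folder up into libs/,
# after which the startup scripts pick up its jars:
cp -r "$IGNITE_HOME/libs/optional/ignite-hadoop" "$IGNITE_HOME/libs/"
```

Under Denis's proposal, enabling Hadoop acceleration from the fabric binary would become exactly this kind of one-step copy, the same way optional modules such as the two ignite-spark builds are enabled today.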