Re: ignite-spark module in Hadoop Accelerator

Denis Magda Sat, 26 Nov 2016 19:41:20 -0800

Dmitriy,

I do believe that you should know why the community decided to have a separate
edition for the Hadoop Accelerator. What was the reason for that? Presently, as
I see it, it brings more confusion and difficulty than benefit.


—
Denis

> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <[email protected]> wrote:
> 
> In fact, I very much agree with you. Right now, running the "accelerator"
> component in the Bigtop distro gives one a pretty much complete fabric anyway.
> But in order to make just an accelerator component we perform quite a bit of
> voodoo magic during the packaging stage of the Bigtop build, shuffling jars
> from here and there. And that's quite crazy, honestly ;)
> 
> Cos
> 
> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
>> I tend to agree with Denis. I see only these differences between Hadoop
>> Accelerator and Fabric builds (correct me if I'm missing something):
>> 
>>   - Limited set of available modules and no optional modules in Hadoop
>>   Accelerator.
>>   - No ignite-hadoop module in Fabric.
>>   - Additional scripts, configs and instructions included in Hadoop
>>   Accelerator.
>> 
>> And the list of included modules frankly looks very weird. Here are only
>> some of the issues I noticed:
>> 
>>   - ignite-indexing and ignite-spark are mandatory. Even if we need them
>>   for Hadoop Acceleration (which I doubt), are they really required, or can
>>   they be optional?
>>   - We force the use of the ignite-log4j module without providing other
>>   logger options (e.g., SLF4J).
>>   - We don't include the ignite-aws module. How does one use Hadoop
>>   Accelerator with S3 discovery?
>>   - Etc.
>> 
>> It seems to me that if we try to fix all these issues, there will be
>> virtually no difference between Fabric and Hadoop Accelerator builds except
>> a couple of scripts and config files. If so, there is no reason to have two
>> builds.
>> 
>> -Val
>> 
>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <[email protected]> wrote:
>> 
>>>> On a separate note, in Bigtop we are starting to look into changing the
>>>> way we deliver Ignite, and we'll likely start offering the whole 'data
>>>> fabric' experience instead of the mere "hadoop-acceleration".
>>> 
>>> And you still will be using hadoop-accelerator libs of Ignite, right?
>>> 
>>> I’m wondering whether there is a need to keep releasing Hadoop Accelerator
>>> as a separate delivery.
>>> What if we start releasing the accelerator as a part of the standard
>>> fabric binary, putting hadoop-accelerator libs under the ‘optional’ folder?
>>> 
>>> —
>>> Denis
>>> 
>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <[email protected]> wrote:
>>>> 
>>>> What Denis said: Spark has been added to the Hadoop accelerator as a way
>>>> to boost the performance of more than just the MR compute of the Hadoop
>>>> stack, IIRC. For what it's worth, Spark is considered a part of Hadoop at
>>>> large.
>>>> 
>>>> On a separate note, in Bigtop we are starting to look into changing the
>>>> way we deliver Ignite, and we'll likely start offering the whole 'data
>>>> fabric' experience instead of the mere "hadoop-acceleration".
>>>> 
>>>> Cos
>>>> 
>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
>>>>> Val,
>>>>> 
>>>>> The Ignite Hadoop module includes not only the map-reduce accelerator but
>>>>> the Ignite Hadoop File System component as well. The latter can be used
>>>>> in deployments like HDFS + IGFS + Ignite Spark + Spark.
>>>>> 
>>>>> Considering this, I’m for the second solution you proposed: put both the
>>>>> 2.10 and 2.11 ignite-spark modules under the ‘optional’ folder of the
>>>>> Ignite Hadoop Accelerator distribution.
>>>>> https://issues.apache.org/jira/browse/IGNITE-4254
>>>>> 
>>>>> BTW, this task may be affected by or related to the following ones:
>>>>> https://issues.apache.org/jira/browse/IGNITE-3596
>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
>>>>> 
>>>>> —
>>>>> Denis
>>>>> 
>>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <[email protected]> wrote:
>>>>>> 
>>>>>> Hadoop Accelerator is a plugin to Ignite, and this plugin is used by
>>>>>> Hadoop when running its jobs. The ignite-spark module only provides
>>>>>> IgniteRDD, which Hadoop obviously will never use.
>>>>>> 
>>>>>> Is there another use case for Hadoop Accelerator which I'm missing?
>>>>>> 
>>>>>> -Val
>>>>>> 
>>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <[email protected]> wrote:
>>>>>> 
>>>>>>> Why do you think that spark module is not needed in our hadoop build?
>>>>>>> 
>>>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
>>>>>>> [email protected]> wrote:
>>>>>>> 
>>>>>>>> Folks,
>>>>>>>> 
>>>>>>>> Is there anyone who understands the purpose of including the
>>>>>>>> ignite-spark module in the Hadoop Accelerator build? I can't figure
>>>>>>>> out a use case for which it's needed.
>>>>>>>> 
>>>>>>>> In case we actually need it there, there is an issue. We actually have
>>>>>>>> two ignite-spark modules, for 2.10 and 2.11. In the Fabric build
>>>>>>>> everything is good: we put both in the 'optional' folder and the user
>>>>>>>> can enable either one. But in Hadoop Accelerator there is only 2.11,
>>>>>>>> which means that the build doesn't work with 2.10 out of the box.
>>>>>>>> 
>>>>>>>> We should either remove the module from the build, or fix the issue.
>>>>>>>> 
>>>>>>>> -Val
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>>>
