Re: [Architecture] Annotation scheme for Hive scripts

Maninda Edirisooriya Tue, 13 Aug 2013 23:27:37 -0700

Hi Malith,

Nice work. Using this new feature we can implement the requirements I
mentioned in the Architecture mail titled "Making BAM more useful platform
with well defined Class Analyzers / UDFs". WDYT?


*Maninda Edirisooriya*
Software Engineer
*WSO2, Inc.*
lean.enterprise.middleware.

*Blog* : http://maninda.blogspot.com/
*Phone* : +94 777603226


On Tue, Aug 13, 2013 at 8:24 PM, Malith Dhanushka <[email protected]> wrote:

> Hi all,
>
> I have modified the implementation according to the description above.
> The modified version is as follows.
>
> - analyzer-config.xml contains the mapping details.
>
> Example:
>
> analyzer-config.xml
>
> <analyzerConfig xmlns="http://wso2.org/carbon/analytics">
>     <analyzers>
>         <analyzer>
>             <name>foo</name>
>             <class>org.wso2.carbon.analytics.hive.extension.builtin.FooAnalyzer</class>
>             <parameters>bar,bar1,*</parameters>
>         </analyzer>
>     </analyzers>
> </analyzerConfig>
>
> Parameter description:
>
> *name* - alias that maps to the class analyzer
>
> *class* - fully qualified class name of the class analyzer
>
> *parameters* - parameters accepted by the class analyzer
>
> - This can be used in a Hive script as follows.
>
> Syntax:
>
> analyzer foo(bar="value",bar1="value1",*);
>
> Currently there is one built-in analyzer, resolvePath. More will be added
> to cover other common use cases.
>
> Thanks,
> Malith
>
>
> On Mon, Aug 5, 2013 at 11:36 AM, Anjana Fernando <[email protected]> wrote:
>
>> On Mon, Aug 5, 2013 at 11:30 AM, Maninda Edirisooriya <[email protected]> wrote:
>>
>>> On Mon, Aug 5, 2013 at 7:58 AM, Malith Dhanushka <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Implementation of the approach suggested above is in its final stage.
>>>> However, while clarifying a minor point of the implementation with
>>>> Anjana, we came across the following drawbacks of the implemented approach:
>>>>
>>>> - An annotation should not be coupled to a class analyzer; rather, it
>>>> should be a run-time property injector for Hive scripts.
>>>>
>>> Then won't the annotation feature be bound to the Hive language? If so,
>>> we will not be able to integrate more languages with annotations in the future.
>>>
>>
>> It won't be. The current implementation happens to support only Hive at
>> the moment, since that's what we have now. Not coupling it with the class
>> analyzer is itself a sign of that, because a class analyzer is in any case
>> a Hive functionality we have.
>>
>> Cheers,
>> Anjana.
>>
>>
>>>
>>>> - The abstract Annotation class adds unnecessary complication for a user
>>>> writing a custom annotation.
>>>>
>>>> Considering the above, I am going to modify the current implementation
>>>> to address these points. If there are any other concerns or suggestions,
>>>> please feel free to add them.
>>>>
>>>> Thanks,
>>>> Malith
>>>>
>>>>
>>>> On Wed, Jul 10, 2013 at 1:26 PM, Malith Dhanushka <[email protected]>wrote:
>>>>
>>>>> Hi Maninda,
>>>>>
>>>>> On Wed, Jul 10, 2013 at 11:37 AM, Maninda Edirisooriya <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> This is nice. Can we use these annotations to unify the languages,
>>>>>> Hive and Siddhi? If we can use this annotation framework as a platform to
>>>>>> support different languages, it will be very useful when it comes to
>>>>>> integrating other Hadoop-related languages like Mahout and Pig. That
>>>>>> means we can separate the scripts of each language using annotations.
>>>>>> This will solve the problem of unifying all the languages into a single
>>>>>> language.
>>>>>>
>>>>>
>>>>> Interesting suggestion, and yes, this is achievable via annotations. The
>>>>> only limitation is that the current script implementation is available
>>>>> only for Hive. So, to achieve language unification via annotations, we
>>>>> first need a unified script implementation for each underlying engine
>>>>> (i.e. Siddhi, Mahout, Pig).
>>>>>
>>>>>>
>>>>>> Also, using this annotation framework we can create a generic
>>>>>> process flow model on the data. For example, we can execute several Hive
>>>>>> scripts in parallel using an annotation block. A barrier can be
>>>>>> introduced if all the parallel scripts should finish before we move
>>>>>> on to the next script, and so on.
>>>>>>
>>>>>
>>>>> Yes, this can be added as a built in annotation.
>>>>>
>>>>>
>>>>>>
>>>>>> Other than that, we can provide a default set of class analysers as we
>>>>>> discussed in a previous mail. The value of annotations is that we can
>>>>>> provide the available set of class analysers out of the box. Any idea
>>>>>> about the syntax?
>>>>>>
>>>>>
>>>>> Each annotation is associated with a particular class analyzer, which
>>>>> processes the parameters given via the annotation. So we can wrap that
>>>>> default set of class analyzers, expose them as a set of built-in
>>>>> annotations, and stick to the same syntax as before:
>>>>>
>>>>> @script.foo(bar="value", bar1="value1",*)
>>>>>
>>>>>
>>>>>> *Maninda Edirisooriya*
>>>>>> Software Engineer
>>>>>> *WSO2, Inc.*
>>>>>> lean.enterprise.middleware.
>>>>>>
>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>> *Phone* : +94 777603226
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 9, 2013 at 11:23 AM, Malith Dhanushka <[email protected]>wrote:
>>>>>>
>>>>>>>  Hi all,
>>>>>>>
>>>>>>> I have started implementing the $Subject. The idea of having an
>>>>>>> annotation facility is to carry out some pre-processing of Hive queries
>>>>>>> before they are passed to the Hive engine. We already have a "class
>>>>>>> analyzer", which can be used to execute custom logic as part of a Hive
>>>>>>> script. The main use case of annotations, however, is to inject run-time
>>>>>>> properties into the Hive execution context before the actual queries are
>>>>>>> carried out by Hive. The annotation facility would build upon the class
>>>>>>> analyzer by providing a set of common analyzers that can manipulate the
>>>>>>> Hive queries or the Hive execution context before they are passed to the
>>>>>>> Hive query engine.
>>>>>>>
>>>>>>> Annotation syntax:
>>>>>>>
>>>>>>> @script.foo(bar="value", bar1="value1",*)
>>>>>>>
>>>>>>>
>>>>>>> The annotation scheme will be externalized by providing an *abstract
>>>>>>> implementation of annotation* and an *annotation-config.xml* file that
>>>>>>> holds the annotation configuration, which allows third-party annotations
>>>>>>> to be included in the system.
>>>>>>>
>>>>>>> *annotation-config.xml*
>>>>>>>
>>>>>>> <annotation>
>>>>>>> <name>foo</name>
>>>>>>>
>>>>>>> <class>org.wso2.carbon.analytics.hive.extension.annotation.foo</class>
>>>>>>> <analyzer>org.wso2.carbon.analytics.hive.extension.foo</analyzer>
>>>>>>> </annotation>
>>>>>>>
>>>>>>> <annotation>
>>>>>>> ................................
>>>>>>> </annotation>
>>>>>>>
>>>>>>>
>>>>>>> A potential use case for this is incremental data processing, where any
>>>>>>> query associated with @script.incremental(foo="value1", bar="value2",*)
>>>>>>> would flag and set up the properties that are required to be present
>>>>>>> for that particular query to be executed in an incremental manner.
>>>>>>> There can be many other useful additions as well.
>>>>>>>
>>>>>>> Any suggestions or thoughts are welcome.
>>>>>>>
>>>>>>> --
>>>>>>> Malith Dhanushka
>>>>>>>
>>>>>>> Engineer - Data Technologies
>>>>>>> *WSO2, Inc. : wso2.com*
>>>>>>>
>>>>>>> *Mobile*          : +94 716 506 693
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Architecture mailing list
>>>>>>> [email protected]
>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Malith Dhanushka
>>>>>
>>>>> Engineer - Data Technologies
>>>>> *WSO2, Inc. : wso2.com*
>>>>>
>>>>> *Mobile*          : +94 716 506 693
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Malith Dhanushka
>>>>
>>>> Engineer - Data Technologies
>>>> *WSO2, Inc. : wso2.com*
>>>>
>>>> *Mobile*          : +94 716 506 693
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>> *Anjana Fernando*
>> Technical Lead
>> WSO2 Inc. | http://wso2.com
>> lean . enterprise . middleware
>>
>>
>
>
> --
> Malith Dhanushka
>
> Engineer - Data Technologies
> *WSO2, Inc. : wso2.com*
>
> *Mobile*          : +94 716 506 693
>
>
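
For anyone following the thread, here is a rough sketch of how the alias
lookup described in Malith's mail of Aug 13 could work: read the mapping from
analyzer-config.xml, parse an "analyzer foo(...)" statement, and check its
parameters against the configured list. This is only an illustration in
Python, not the actual BAM implementation. The function names
(load_analyzers, resolve), the regular expressions, and the exact statement
grammar are assumptions; the trailing "*" from the quoted examples is left
out of the sample because its meaning is not spelled out in the thread.

```python
import re
import xml.etree.ElementTree as ET

# Sample config mirroring the analyzer-config.xml from the thread.
CONFIG_XML = """
<analyzerConfig xmlns="http://wso2.org/carbon/analytics">
    <analyzers>
        <analyzer>
            <name>foo</name>
            <class>org.wso2.carbon.analytics.hive.extension.builtin.FooAnalyzer</class>
            <parameters>bar,bar1</parameters>
        </analyzer>
    </analyzers>
</analyzerConfig>
"""

NS = {"a": "http://wso2.org/carbon/analytics"}

# Assumed statement grammar: analyzer <alias>(<key>="<value>", ...);
STATEMENT = re.compile(r'^\s*analyzer\s+(\w+)\s*\((.*)\)\s*;\s*$', re.DOTALL)
PARAM = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def load_analyzers(xml_text):
    """Return {alias: (class_name, set_of_allowed_parameter_names)}."""
    root = ET.fromstring(xml_text.strip())
    analyzers = {}
    for node in root.findall("a:analyzers/a:analyzer", NS):
        name = node.findtext("a:name", default="", namespaces=NS).strip()
        class_name = node.findtext("a:class", default="", namespaces=NS).strip()
        raw = node.findtext("a:parameters", default="", namespaces=NS)
        allowed = {p.strip() for p in raw.split(",") if p.strip()}
        analyzers[name] = (class_name, allowed)
    return analyzers


def resolve(statement, analyzers):
    """Map one analyzer statement to (class_name, {parameter: value})."""
    match = STATEMENT.match(statement)
    if match is None:
        raise ValueError("not an analyzer statement: %r" % statement)
    alias, arg_text = match.groups()
    if alias not in analyzers:
        raise KeyError("no analyzer registered under alias %r" % alias)
    class_name, allowed = analyzers[alias]
    params = dict(PARAM.findall(arg_text))
    unknown = set(params) - allowed
    if unknown:
        raise ValueError("unsupported parameters: %s" % sorted(unknown))
    return class_name, params


if __name__ == "__main__":
    registry = load_analyzers(CONFIG_XML)
    print(resolve('analyzer foo(bar="value",bar1="value1");', registry))
```

Keeping the lookup separate from the parsing means the same registry could
serve other script languages later, which is in line with the point raised
earlier in the thread about not tying the feature to Hive.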

_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
