Re: [Architecture] Annotation scheme for Hive scripts

Malith Dhanushka Wed, 14 Aug 2013 00:46:37 -0700

Hi Maninda,

Of course, we can add them as a set of built-in analyzers, which will broaden
the scope of Hive scripts.


Thanks,
Malith


On Wed, Aug 14, 2013 at 11:55 AM, Maninda Edirisooriya <[email protected]> wrote:

> Hi Malith,
>
> Nice work. Using this new feature we can implement the requirements I
> mentioned in the Architecture mail titled "Making BAM more useful platform
> with well defined Class Analyzers / UDFs". WDYT?
>
> *
> Maninda Edirisooriya*
> Software Engineer
> *WSO2, Inc.
> *lean.enterprise.middleware.
>
> *Blog* : http://maninda.blogspot.com/
> *Phone* : +94 777603226
>
>
> On Tue, Aug 13, 2013 at 8:24 PM, Malith Dhanushka <[email protected]> wrote:
>
>> Hi all,
>>
>> I have modified the implementation according to the description above.
>> The modified version is as follows:
>>
>> - analyzer-config.xml contains the mapping details.
>>
>> Example:
>>
>> analyzer-config.xml
>>
>> <analyzerConfig xmlns="http://wso2.org/carbon/analytics">
>>     <analyzers>
>>         <analyzer>
>>             <name>foo</name>
>>             <class>org.wso2.carbon.analytics.hive.extension.builtin.FooAnalyzer</class>
>>             <parameters>bar,bar1,*</parameters>
>>         </analyzer>
>>     </analyzers>
>> </analyzerConfig>
>>
>> Element descriptions:
>>
>> *name* - alias that maps to the class analyzer
>>
>> *class* - the class analyzer implementation
>>
>> *parameters* - parameters accepted by the class analyzer
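As a rough illustration only (not the actual BAM code), the alias-to-class mapping described above could be loaded with the standard JAXP DOM parser along these lines. The class name AnalyzerConfigSketch, the Entry holder, and the load method are hypothetical; only the element names come from the example configuration. How the trailing "*" in the parameter list should be interpreted is not defined in this thread, so the sketch simply keeps it as a plain entry.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical loader for illustration; not the actual BAM implementation.
final class AnalyzerConfigSketch {

    // One <analyzer> entry: alias, implementing class, accepted parameter names.
    static final class Entry {
        final String name;
        final String className;
        final List<String> parameters;

        Entry(String name, String className, List<String> parameters) {
            this.name = name;
            this.className = className;
            this.parameters = parameters;
        }
    }

    // Parses the XML text and returns the entries keyed by alias, in file order.
    static Map<String, Entry> load(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Entry> byAlias = new LinkedHashMap<>();
            NodeList analyzers = doc.getElementsByTagNameNS("*", "analyzer");
            for (int i = 0; i < analyzers.getLength(); i++) {
                Element analyzer = (Element) analyzers.item(i);
                List<String> parameters = new ArrayList<>();
                // Comma-separated list; entries are trimmed, empty ones skipped.
                for (String p : childText(analyzer, "parameters").split(",")) {
                    if (!p.trim().isEmpty()) {
                        parameters.add(p.trim());
                    }
                }
                Entry entry = new Entry(childText(analyzer, "name"),
                        childText(analyzer, "class"), parameters);
                byAlias.put(entry.name, entry);
            }
            return byAlias;
        } catch (Exception e) {
            // Covers malformed XML as well as missing child elements.
            throw new IllegalArgumentException("Invalid analyzer configuration", e);
        }
    }

    // Text of the first descendant element with the given local name.
    private static String childText(Element parent, String localName) {
        return parent.getElementsByTagNameNS("*", localName).item(0).getTextContent().trim();
    }
}
```

Calling AnalyzerConfigSketch.load on the example file content would give a single entry under the alias "foo". Instantiating the listed class (for instance via reflection) is deliberately left out, since the thread does not describe how analyzers are constructed.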
>>
>> - An analyzer can then be used in a Hive script as follows.
>>
>> Syntax:
>>
>> analyzer foo(bar="value",bar1="value1",*);
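To make the statement syntax concrete, here is a minimal sketch of how such a line might be split into its alias and its key="value" arguments. It assumes the statement sits on a single line and that values contain no escaped quotes; the class and method names are hypothetical, and the real parser in BAM may well work differently. The trailing "*" placeholder from the syntax line is simply ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for illustration; not the actual BAM implementation.
final class AnalyzerCallSketch {

    // analyzer <alias>( <arguments> ) with an optional trailing semicolon
    private static final Pattern CALL =
            Pattern.compile("^\\s*analyzer\\s+(\\w+)\\s*\\((.*)\\)\\s*;?\\s*$");

    // key="value" pairs; anything else between the parentheses is skipped
    private static final Pattern ARGUMENT =
            Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    // Returns the analyzer alias, e.g. "foo".
    static String alias(String statement) {
        return match(statement).group(1);
    }

    // Returns the arguments in the order they were written.
    static Map<String, String> arguments(String statement) {
        Map<String, String> arguments = new LinkedHashMap<>();
        Matcher argument = ARGUMENT.matcher(match(statement).group(2));
        while (argument.find()) {
            arguments.put(argument.group(1), argument.group(2));
        }
        return arguments;
    }

    private static Matcher match(String statement) {
        Matcher call = CALL.matcher(statement);
        if (!call.matches()) {
            throw new IllegalArgumentException("Not an analyzer statement: " + statement);
        }
        return call;
    }
}
```

The alias returned here is what would be looked up in analyzer-config.xml to find the class to run.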
>>
>> Currently there is one built-in analyzer, the resolvePath analyzer, and
>> more will be added as other common use cases are considered.
>>
>> Thanks,
>> Malith
>>
>>
>> On Mon, Aug 5, 2013 at 11:36 AM, Anjana Fernando <[email protected]> wrote:
>>
>>> On Mon, Aug 5, 2013 at 11:30 AM, Maninda Edirisooriya <[email protected]> wrote:
>>>
>>>> On Mon, Aug 5, 2013 at 7:58 AM, Malith Dhanushka <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> The implementation of the approach suggested above is in its final
>>>>> stage. However, while clarifying a minor point of the implementation
>>>>> with Anjana, we came across the following drawbacks of the implemented
>>>>> approach:
>>>>>
>>>>> - An annotation should not be coupled with a class analyzer; rather, it
>>>>> should be a run-time property injector for Hive scripts.
>>>>>
>>>> Then won't the annotation feature be bound to the Hive language? If
>>>> so, we will not be able to integrate more languages with annotations in
>>>> the future.
>>>>
>>>
>>> It won't be. The current implementation happens to support only Hive
>>> at the moment, since that's what we have now. Not coupling annotations
>>> with the class analyzer is itself a sign of that, because the class
>>> analyzer is a Hive-specific functionality anyway.
>>>
>>> Cheers,
>>> Anjana.
>>>
>>>
>>>>
>>>>> - The abstract Annotation class adds unnecessary complication for a
>>>>> user writing a custom annotation.
>>>>>
>>>>> Considering the above, I am going to modify the current implementation
>>>>> to address these points. If there are any other concerns or suggestions,
>>>>> please feel free to add them.
>>>>>
>>>>> Thanks,
>>>>> Malith
>>>>>
>>>>>
>>>>> On Wed, Jul 10, 2013 at 1:26 PM, Malith Dhanushka <[email protected]> wrote:
>>>>>
>>>>>> Hi Maninda,
>>>>>>
>>>>>> On Wed, Jul 10, 2013 at 11:37 AM, Maninda Edirisooriya <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> This is nice. Can we use these annotations to unify the languages,
>>>>>>> Hive and Siddhi? If we can use this annotation framework as a platform
>>>>>>> to support different languages, it will be very useful when it comes to
>>>>>>> integrating other Hadoop-related languages like Mahout and Pig. That
>>>>>>> means we can separate the scripts of each language using annotations.
>>>>>>> This will solve the problem of unifying all the languages into a single
>>>>>>> language.
>>>>>>>
>>>>>>
>>>>>> Interesting suggestion, and yes, this is achievable via annotations.
>>>>>> The only limitation is that the current script implementation is
>>>>>> available only for Hive. So, to achieve language unification via
>>>>>> annotations, we first need a unified script implementation for each
>>>>>> underlying engine (i.e. Siddhi, Mahout, Pig).
>>>>>>
>>>>>>>
>>>>>>> Also, using this annotation framework we can create a generic
>>>>>>> process flow model on the data. For example, we can execute several
>>>>>>> Hive scripts in parallel using an annotation block, and a barrier can
>>>>>>> be introduced if all the parallel scripts must finish before we move
>>>>>>> on to the next script, and so on.
>>>>>>>
>>>>>>
>>>>>> Yes, this can be added as a built-in annotation.
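The parallel block with a barrier could behave roughly as in the sketch below, which uses the standard java.util.concurrent API. Each script is modelled as a Callable returning a result string; that modelling, and the class and method names, are assumptions made for the example, since the thread does not say how a script execution is represented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical runner for illustration; not the actual BAM implementation.
final class ParallelBlockSketch {

    // Runs every script of the block concurrently and returns only after all
    // of them have finished; results are in the same order as the input list.
    static List<String> runBlock(List<Callable<String>> scripts) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, scripts.size()));
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks are complete: this is the barrier.
            for (Future<String> done : pool.invokeAll(scripts)) {
                results.add(done.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the block", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A script in the block failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The next script would only be started after runBlock returns, which gives the "all parallel scripts finished" guarantee mentioned above. A failure in any one script surfaces as an exception rather than letting the flow continue.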
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Other than that, we can provide a default set of class analysers, as
>>>>>>> we discussed in a previous mail. The value of annotations is that we
>>>>>>> can provide the available set of class analysers out of the box. Any
>>>>>>> ideas about the syntax?
>>>>>>>
>>>>>>
>>>>>> Each annotation is associated with a particular class analyzer, which
>>>>>> processes the parameters given via the annotation. So we can wrap that
>>>>>> default set of class analyzers, expose them as a set of built-in
>>>>>> annotations, and stick to the same syntax as before:
>>>>>>
>>>>>> @script.foo(bar="value", bar1="value1",*)
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 9, 2013 at 11:23 AM, Malith Dhanushka <[email protected]> wrote:
>>>>>>>
>>>>>>>>  Hi all,
>>>>>>>>
>>>>>>>> I have started implementing the $Subject. The idea of an annotation
>>>>>>>> facility is to carry out some pre-processing of Hive queries before
>>>>>>>> they are passed to the Hive engine. We already have a "class analyzer"
>>>>>>>> which can be used to execute custom logic as part of a Hive script.
>>>>>>>> The main use case of annotations, however, is to inject run-time
>>>>>>>> properties into the Hive execution context before the actual queries
>>>>>>>> are carried out by Hive. The annotation facility would build upon the
>>>>>>>> class analyzer by providing a set of common analyzers that can
>>>>>>>> manipulate the Hive queries or the Hive execution context that is
>>>>>>>> passed to the Hive query engine.
>>>>>>>>
>>>>>>>> Annotation Syntax,
>>>>>>>>
>>>>>>>> *@script.foo(bar="value", bar1="value1",*)*
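The pre-processing step described above can be pictured as a pass over the script text that collects the annotations and hands only the remaining lines to Hive. The sketch below assumes each annotation occupies a line of its own, which the thread does not actually specify; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processor for illustration; not the actual BAM implementation.
final class AnnotationScanSketch {

    // A line of the form @script.<name>( ... )
    private static final Pattern ANNOTATION =
            Pattern.compile("^\\s*@script\\.(\\w+)\\s*\\(.*\\)\\s*$");

    // Names of all annotations found in the script, in order of appearance.
    static List<String> annotationNames(String script) {
        List<String> names = new ArrayList<>();
        for (String line : script.split("\\R")) {
            Matcher annotation = ANNOTATION.matcher(line);
            if (annotation.matches()) {
                names.add(annotation.group(1));
            }
        }
        return names;
    }

    // The script with annotation lines removed, i.e. what would go to Hive.
    static String withoutAnnotations(String script) {
        List<String> kept = new ArrayList<>();
        for (String line : script.split("\\R")) {
            if (!ANNOTATION.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return String.join("\n", kept);
    }
}
```

Each collected name would then be resolved to its handler through the annotation configuration before the stripped script is submitted.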
>>>>>>>>
>>>>>>>>
>>>>>>>> The annotation scheme will be externalized by providing an *abstract
>>>>>>>> implementation of annotation* and an *annotation-config.xml* file for
>>>>>>>> the annotation configuration, which allows third-party annotations
>>>>>>>> to be included in the system.
>>>>>>>>
>>>>>>>> *annotation-config.xml*
>>>>>>>>
>>>>>>>> <annotation>
>>>>>>>>     <name>foo</name>
>>>>>>>>     <class>org.wso2.carbon.analytics.hive.extension.annotation.foo</class>
>>>>>>>>     <analyzer>org.wso2.carbon.analytics.hive.extension.foo</analyzer>
>>>>>>>> </annotation>
>>>>>>>>
>>>>>>>> <annotation>
>>>>>>>> ................................
>>>>>>>> </annotation>
>>>>>>>>
>>>>>>>>
>>>>>>>> A potential use case for this is incremental data processing, where
>>>>>>>> any query annotated with "@script.incremental(foo="value1",
>>>>>>>> bar="value2",*)" would be flagged, and the properties that must be
>>>>>>>> present for that particular query to be executed in an incremental
>>>>>>>> manner would be set up. There can be many other useful additions as well.
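A very small sketch of the property injection idea: the annotation's parameters are copied into the execution context under a common prefix, together with a flag marking the query as incremental. The key prefix "script.incremental." and the "enabled" flag are invented for this example; the thread does not name the real property keys, and the execution context is modelled here as a plain map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical injector for illustration; not the actual BAM implementation.
final class IncrementalInjectorSketch {

    // Assumed key prefix; the real property names are not given in the thread.
    static final String PREFIX = "script.incremental.";

    // Returns a copy of the context with the incremental flag and the
    // annotation parameters added; the original context is left untouched.
    static Map<String, String> inject(Map<String, String> context,
                                      Map<String, String> annotationArgs) {
        Map<String, String> updated = new LinkedHashMap<>(context);
        updated.put(PREFIX + "enabled", "true");
        for (Map.Entry<String, String> arg : annotationArgs.entrySet()) {
            updated.put(PREFIX + arg.getKey(), arg.getValue());
        }
        return updated;
    }
}
```

Returning a copy keeps one query's annotation from leaking into the context of the next query in the same script, which seems the safer default for a per-query annotation.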
>>>>>>>>
>>>>>>>> Any suggestions or thoughts are welcome.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Malith Dhanushka
>>>>>>>>
>>>>>>>> Engineer - Data Technologies
>>>>>>>> *WSO2, Inc. : wso2.com*
>>>>>>>>
>>>>>>>> *Mobile*          : +94 716 506 693
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Architecture mailing list
>>>>>>>> [email protected]
>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>
>>>>>>>>
>>> --
>>> *Anjana Fernando*
>>> Technical Lead
>>> WSO2 Inc. | http://wso2.com
>>> lean . enterprise . middleware