Re: [Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

Buddhika Chamith Mon, 23 Jul 2012 06:58:26 -0700

Hi All,

It is not impossible to inject runtime variables (a bit like query
parameters in DSS) into Hive query execution, but doing it programmatically
might take some modifications on the Hive side.
Currently I am doing some work in Hive to make it tenant aware. What I
mentioned was that I can look into this as part of that effort, though it
might take a couple of days since I have to figure out a clean way to
expose tenant-specific Hive configuration to the Carbon environment. Anyway,
I was not aware of the thread on the Hive user list, and now, going through
it, I see that they have suggested an alternative, provided that we are OK
with modifying the original script.
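
For what it is worth, here is a minimal sketch of what "modifying the
original script" could look like if the caller resolves the directory name
itself and substitutes it into the query text before the script is
submitted. It is in Python purely for illustration (the real task would
presumably be Java). get_file_path, render_script, LOG_BASE_DIR and the
"logs" table are hypothetical names, and the path layout is copied from
Manisha's example. This is an assumption about the approach, not what the
Hive user list actually proposed.

```python
from datetime import date

# Hypothetical base directory, copied from the example path in this thread.
LOG_BASE_DIR = "home/user/Desktop/logDir/logs"

# Hypothetical script; only the INSERT OVERWRITE line comes from the thread,
# the SELECT and the table name "logs" are placeholders.
SCRIPT = (
    "INSERT OVERWRITE LOCAL DIRECTORY 'file:///${hiveconf:file_name}' "
    "SELECT * FROM logs"
)


def get_file_path(tenant_id, server, day):
    """Build <base>/log_<tenant>_<server>_<yyyy_MM_dd>, done outside Hive."""
    return "{}/log_{}_{}_{}".format(
        LOG_BASE_DIR, tenant_id, server, day.strftime("%Y_%m_%d")
    )


def render_script(script, variables):
    """Textually replace every ${hiveconf:<name>} placeholder with its value.

    The value is resolved here, before submission, so Hive only ever sees a
    literal directory name and never has to evaluate a UDF in a 'set'.
    """
    for name, value in variables.items():
        script = script.replace("${hiveconf:" + name + "}", value)
    return script


if __name__ == "__main__":
    path = get_file_path("0", "testServer", date(2012, 7, 22))
    print(render_script(SCRIPT, {"file_name": path}))
```

The point of the sketch is only that the substitution is plain text
replacement: whatever is assigned to the variable is pasted into the query
as-is, which is why the path has to be computed before the script runs.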


Regards
Buddhika

On Mon, Jul 23, 2012 at 6:41 PM, Tharindu Mathew <[email protected]> wrote:

> If you are planning for a few MB per tenant, the total size of logs will
> be (size of logs per tenant * no. of tenants), so for roughly 200 active
> tenants and 2 MB of logs each, it would come to around 400 MB. This is
> still manageable in a custom task if your data processing is low.
>
> On Mon, Jul 23, 2012 at 6:24 PM, Afkham Azeez <[email protected]> wrote:
>
>> Like you said, the task may not be the best way to do this. Like we
>> discussed the other day, we can publish logs to unique column families
>> which contain the <Service>_<Tenant>_<Date> as the unique identifier. We
>> need to generate logs in a file format & allow tenant users to download
>> those. What is the best approach to generate these log files from the data
>> collected? Typically, such a log file can run into a few MB.
>
> I'm a bit confused as we did not need to use Hive as per our earlier
> conversation. This is because as the data is published it is already
> grouped by server/ tenant and date.
>
>>
>> Azeez
>>
>>
>> On Mon, Jul 23, 2012 at 6:18 PM, Tharindu Mathew <[email protected]> wrote:
>>
>>> I'm no expert, but I immediately question the scale of this approach.
>>>
>>> Do you have an idea of how much log data you plan to process per task?
>>>
>>>
>>> On Mon, Jul 23, 2012 at 6:13 PM, Afkham Azeez <[email protected]> wrote:
>>>
>>>> The requirement is simple. We need to generate log files on a per
>>>> tenant, per date, per Service basis. Now as a big data & analytics expert,
>>>> please advise us on what is the best solution for this.
>>>>
>>>> Azeez
>>>>
>>>>
>>>> On Mon, Jul 23, 2012 at 6:05 PM, Tharindu Mathew <[email protected]> wrote:
>>>>
>>>>> So through this custom java task, what is the scale of log processing
>>>>> you will support? 100MB, 1 GB, 100 GB, 1 TB?
>>>>>
>>>>> On Mon, Jul 23, 2012 at 5:14 PM, Manisha Gayathri <[email protected]> wrote:
>>>>>
>>>>>> Contacted Hive User Group as well on this matter.
>>>>>> They also mentioned that this approach is not possible.
>>>>>> Also, as per the chat I had with Buddhika, right now this kind of
>>>>>> dynamic variable creation is not possible in the Hive that comes with BAM2.
>>>>>>
>>>>>> Therefore IMO, without going ahead with this cumbersome process, the
>>>>>> best way will be to run a scheduled java task to pick data from relevant
>>>>>> Cassandra Column families and dynamically generate the relevant log files
>>>>>> (according to the tenantID and current date) which will be stored in
>>>>>> Apache Directory.
>>>>>>
>>>>> You are going to store the results in an LDAP directory?
>>>>>
>>>>>>
>>>>>> As per the offline chat I had with Azeez, I will start to work on a
>>>>>> custom Java task that can handle the above scenario.
>>>>>>
>>>>>> On Mon, Jul 23, 2012 at 2:27 PM, Manisha Gayathri <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> For a log file storing scenario using BAM2, I have a requirement to
>>>>>>> generate separate log files for each date. For that I have created a
>>>>>>> Hive analytic query along with a Hive UDF.
>>>>>>>
>>>>>>> I have the getFilePath function which should return a URL like this.
>>>>>>>
>>>>>>> home/user/Desktop/logDir/logs/log_0_testServer_2012_07_22
>>>>>>>
>>>>>>> The defined function works perfectly if I put
>>>>>>> *getFilePath( "0","testServer" )* into the *select* statement.
>>>>>>>
>>>>>>> But I want to get that particular URL as the *local directory name*.
>>>>>>> (The requirement is such that this should not be hard-coded in the hive
>>>>>>> query. Rather should be generated in the custom UDF. )
>>>>>>>
>>>>>>> So can I do something like I have shown below?
>>>>>>>
>>>>>>> set file_name= getFilePath( "0","testServer" );    //Define a parameter.
>>>>>>> .................
>>>>>>> ..............
>>>>>>> INSERT OVERWRITE LOCAL DIRECTORY 'file:///${hiveconf:file_name}'
>>>>>>>                  //Assign the above parameter as the file URL
>>>>>>>
>>>>>>> I tried this way. But the directory name is returned as
>>>>>>>
>>>>>>> file:/getFilePath( "0" , "testServer" )
>>>>>>>
>>>>>>> Does that mean I cannot use UDF to define the local directory name?
>>>>>>> Or am I doing anything wrong in here?
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ~Regards
>>>>>>> *Manisha Eleperuma*
>>>>>>> Software Engineer
>>>>>>> WSO2, Inc.: http://wso2.com
>>>>>>> lean.enterprise.middleware
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ~Regards
>>>>>> *Manisha Eleperuma*
>>>>>> Software Engineer
>>>>>> WSO2, Inc.: http://wso2.com
>>>>>> lean.enterprise.middleware
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dev mailing list
>>>>>> [email protected]
>>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>>
>>>>> Tharindu
>>>>>
>>>>> blog: http://mackiemathew.com/
>>>>> M: +94777759908
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Afkham Azeez
>>>> Director of Architecture; WSO2, Inc.; http://wso2.com
>>>> Member; Apache Software Foundation; http://www.apache.org/
>>>> email: [email protected] cell: +94 77 3320919
>>>> blog: http://blog.afkham.org
>>>> twitter: http://twitter.com/afkham_azeez
>>>> linked-in: http://lk.linkedin.com/in/afkhamazeez
>>>>
>>>> Lean . Enterprise . Middleware
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Tharindu
>>>
>>> blog: http://mackiemathew.com/
>>> M: +94777759908
>>>
>>>
>>
>>
>> --
>> Afkham Azeez
>> Director of Architecture; WSO2, Inc.; http://wso2.com
>> Member; Apache Software Foundation; http://www.apache.org/
>> email: [email protected] cell: +94 77 3320919
>> blog: http://blog.afkham.org
>> twitter: http://twitter.com/afkham_azeez
>> linked-in: http://lk.linkedin.com/in/afkhamazeez
>>
>> Lean . Enterprise . Middleware
>>
>>
>
>
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
> M: +94777759908
>
>
>
>

