Re: [Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

Tharindu Mathew Mon, 23 Jul 2012 06:53:28 -0700

insert select * from foo
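
A note on the quoted question further down: in that attempt, the set command stored the getFilePath( "0","testServer" ) call as literal text, so ${hiveconf:file_name} expanded to the call itself rather than to a path. One possible way around it, sketched here under assumptions, is to compute the path in Java and splice it into the statement text before the query is submitted. The class and method names (LogPathBuilder, filePath, insertStatement) and the base directory argument are hypothetical, and the table name foo is borrowed from the line above. None of this is BAM2 or Hive API, and this thread does not confirm that BAM2 lets a script be assembled this way.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not part of BAM2 or Hive.
public class LogPathBuilder {

    // Matches the date suffix of the example path in the thread: 2012_07_22
    private static final DateTimeFormatter DATE_SUFFIX =
            DateTimeFormatter.ofPattern("yyyy_MM_dd");

    // Builds <baseDir>/log_<tenantId>_<server>_<yyyy_MM_dd>.
    public static String filePath(String baseDir, String tenantId,
                                  String server, LocalDate date) {
        return baseDir + "/log_" + tenantId + "_" + server + "_"
                + date.format(DATE_SUFFIX);
    }

    // Puts the already-computed path into the statement as a plain string
    // literal, so Hive never has to evaluate a function call here.
    // Assumes the path contains no single quotes.
    public static String insertStatement(String path, String table) {
        return "INSERT OVERWRITE LOCAL DIRECTORY 'file:///" + path
                + "' SELECT * FROM " + table;
    }

    public static void main(String[] args) {
        String path = filePath("home/user/Desktop/logDir/logs", "0",
                "testServer", LocalDate.of(2012, 7, 22));
        System.out.println(path);
        System.out.println(insertStatement(path, "foo"));
    }
}
```

The same string could equally be handed to Hive as a variable value instead of being concatenated; the point of the sketch is only that the path is resolved before Hive sees the statement.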

On Mon, Jul 23, 2012 at 7:15 PM, Afkham Azeez <[email protected]> wrote:


>
>
> On Mon, Jul 23, 2012 at 6:41 PM, Tharindu Mathew <[email protected]> wrote:
>
>> If each log file is a few MB, the total size will be (size of logs per
>> tenant * no. of tenants). So for roughly 200 active tenants and 2 MB of
>> logs each, it would come to around 400 MB. This is still manageable in a
>> custom task if your data processing is low.
>>
>> On Mon, Jul 23, 2012 at 6:24 PM, Afkham Azeez <[email protected]> wrote:
>>
>>> Like you said, the task may not be the best way to do this. Like we
>>> discussed the other day, we can publish logs to unique column families
>>> which contain the <Service>_<Tenant>_<Date> as the unique identifier. We
>>> need to generate logs in a file format & allow tenant users to download
>>> those. What is the best approach to generate these log files from the data
>>> collected? Typically, such a log file can run into a few MB.
>>
>> I'm a bit confused, as we did not need to use Hive as per our earlier
>> conversation. This is because the data is already grouped by
>> server/tenant and date when it is published.
>>
>
> Yeah, there is no analytics to be done. It is a problem of converting data
> stored in Cassandra into a flat file.
>
>
>>
>>> Azeez
>>>
>>>
>>> On Mon, Jul 23, 2012 at 6:18 PM, Tharindu Mathew <[email protected]> wrote:
>>>
>>>> I'm no expert, but I immediately question the scale of this approach.
>>>>
>>>> Do you have an idea of how much log data you plan to process per task?
>>>>
>>>>
>>>> On Mon, Jul 23, 2012 at 6:13 PM, Afkham Azeez <[email protected]> wrote:
>>>>
>>>>> The requirement is simple. We need to generate log files on a per
>>>>> tenant, per date, per Service basis. Now as a big data & analytics expert,
>>>>> please advise us on what is the best solution for this.
>>>>>
>>>>> Azeez
>>>>>
>>>>>
>>>>> On Mon, Jul 23, 2012 at 6:05 PM, Tharindu Mathew <[email protected]> wrote:
>>>>>
>>>>>> So through this custom java task, what is the scale of log processing
>>>>>> you will support? 100MB, 1 GB, 100 GB, 1 TB?
>>>>>>
>>>>>> On Mon, Jul 23, 2012 at 5:14 PM, Manisha Gayathri <[email protected]> wrote:
>>>>>>
>>>>>>> I contacted the Hive User Group as well on this matter.
>>>>>>> They also mentioned that this approach is not possible.
>>>>>>> Also, as per the chat I had with Buddhika, this kind of dynamic
>>>>>>> variable creation is not possible right now in the Hive that comes
>>>>>>> with BAM2.
>>>>>>>
>>>>>>> Therefore IMO, rather than going ahead with this cumbersome process,
>>>>>>> the best way will be to run a scheduled Java task to pick data from the
>>>>>>> relevant Cassandra column families and dynamically generate the
>>>>>>> relevant log files (according to the tenantID and current date), which
>>>>>>> will be stored in Apache Directory.
>>>>>>>
>>>>>> You are going to store the results in LDAP?
>>>>>>
>>>>>>>
>>>>>>> As per the offline chat I had with Azeez, I will start to work on a
>>>>>>> custom Java task that can handle the above scenario.
>>>>>>>
>>>>>>> On Mon, Jul 23, 2012 at 2:27 PM, Manisha Gayathri <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> For a log file storing scenario using BAM2, I have a requirement to
>>>>>>>> generate separate log files for each date. For that I have created a
>>>>>>>> Hive analytic query along with a Hive UDF.
>>>>>>>>
>>>>>>>> I have the getFilePath function which should return a URL like
>>>>>>>> this.
>>>>>>>>
>>>>>>>> home/user/Desktop/logDir/logs/log_0_testServer_2012_07_22
>>>>>>>>
>>>>>>>> The defined function works perfectly if I put
>>>>>>>> getFilePath( "0","testServer" ) into the select statement.
>>>>>>>>
>>>>>>>> But I want to get that particular URL as the local directory name.
>>>>>>>> (The requirement is such that this should not be hard-coded in the Hive
>>>>>>>> query. Rather, it should be generated in the custom UDF.)
>>>>>>>>
>>>>>>>> So can I do something like what I've shown below?
>>>>>>>>
>>>>>>>> set file_name= getFilePath( "0","testServer" );    //Define a parameter.
>>>>>>>> .................
>>>>>>>> ..............
>>>>>>>> INSERT OVERWRITE LOCAL DIRECTORY 'file:///${hiveconf:file_name}'    //Assign the above parameter as the file URL
>>>>>>>>
>>>>>>>> I tried this way. But the directory name is returned as
>>>>>>>>
>>>>>>>> file:/getFilePath( "0" , "testServer" )
>>>>>>>>
>>>>>>>> Does that mean I cannot use a UDF to define the local directory name?
>>>>>>>> Or am I doing something wrong here?
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ~Regards
>>>>>>>> *Manisha Eleperuma*
>>>>>>>> Software Engineer
>>>>>>>> WSO2, Inc.: http://wso2.com
>>>>>>>> lean.enterprise.middleware
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Dev mailing list
>>>>>>> [email protected]
>>>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>>
>>>>>> Tharindu
>>>>>>
>>>>>> blog: http://mackiemathew.com/
>>>>>> M: +94777759908
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Afkham Azeez
>>>>> Director of Architecture; WSO2, Inc.; http://wso2.com
>>>>> Member; Apache Software Foundation; http://www.apache.org/
>>>>> email: [email protected] cell: +94 77 3320919
>>>>> blog: http://blog.afkham.org
>>>>> twitter: http://twitter.com/afkham_azeez
>>>>> linked-in: http://lk.linkedin.com/in/afkhamazeez
>>>>>
>>>>> Lean . Enterprise . Middleware


-- 
Regards,

Tharindu

blog: http://mackiemathew.com/
M: +94777759908

_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
