As you said, the custom Java task may not be the best way to do this. As we
discussed the other day, we can publish logs to separate column families
that use <Service>_<Tenant>_<Date> as the unique identifier. We need to
generate log files from that data and allow tenant users to download them.
What is the best approach to generate these log files from the collected
data? Typically, such a log file can run to a few MB.
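
For illustration, here is a minimal Java sketch of the identifier scheme and
the file generation step. It assumes Java 8 or later and that the log lines
for one column family have already been read into memory, which should be
reasonable for files of a few MB. The class name, the method names, the
".log" extension and the yyyy_MM_dd date format (borrowed from the example
path further down in this thread) are all assumptions, not part of BAM2.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical helper: names are illustrative and not part of BAM2.
class TenantLogFiles {

    // Assumed date layout, taken from the example path in the thread.
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("yyyy_MM_dd");

    // Builds the <Service>_<Tenant>_<Date> identifier.
    static String identifier(String service, String tenant, LocalDate date) {
        return service + "_" + tenant + "_" + date.format(DATE);
    }

    // Writes already-fetched log lines to <baseDir>/<identifier>.log and
    // returns the path of the written file. Reading the lines from the
    // column family is left to the caller.
    static Path writeLogFile(Path baseDir, String service, String tenant,
                             LocalDate date, List<String> lines)
            throws IOException {
        Files.createDirectories(baseDir);
        Path target = baseDir.resolve(identifier(service, tenant, date) + ".log");
        return Files.write(target, lines, StandardCharsets.UTF_8);
    }
}
```

A caller would fetch the rows for one service, tenant and date, pass them to
writeLogFile, and hand the returned path to whatever serves the download.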

Azeez

On Mon, Jul 23, 2012 at 6:18 PM, Tharindu Mathew <[email protected]> wrote:

> I'm no expert, but I immediately question the scale of this approach.
>
> Do you have an idea of how much log data you plan to process per task?
>
>
> On Mon, Jul 23, 2012 at 6:13 PM, Afkham Azeez <[email protected]> wrote:
>
>> The requirement is simple. We need to generate log files on a per tenant,
>> per date, per Service basis. Now as a big data & analytics expert, please
>> advise us on what is the best solution for this.
>>
>> Azeez
>>
>>
>> On Mon, Jul 23, 2012 at 6:05 PM, Tharindu Mathew <[email protected]> wrote:
>>
>>> So through this custom java task, what is the scale of log processing
>>> you will support? 100MB, 1 GB, 100 GB, 1 TB?
>>>
>>> On Mon, Jul 23, 2012 at 5:14 PM, Manisha Gayathri <[email protected]> wrote:
>>>
>>>> I contacted the Hive user group about this as well.
>>>> They also said that this approach is not possible.
>>>> Also, as per the chat I had with Buddhika, this kind of dynamic
>>>> variable creation is not currently possible in the Hive that comes with BAM2.
>>>>
>>>> Therefore IMO, without going ahead with this cumbersome process, the
>>>> best way will be to run a scheduled java task to pick data from relevant
>>>> Cassandra Column families and dynamically generate the relevant log files
>>>> (according to the tenantID and current date) which will be stored in Apache
>>>> Directory.
>>>>
>>> You are going to store the results in LDAP?
>>>
>>>>
>>>> As per the offline chat I had with Azeez, I will start working on a
>>>> custom Java task that can handle the above scenario.
>>>>
>>>> On Mon, Jul 23, 2012 at 2:27 PM, Manisha Gayathri <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> For a log file storage scenario using BAM2, I have a requirement to
>>>>> generate separate log files for each date. For that, I have created a
>>>>> Hive analytics query along with a Hive UDF.
>>>>>
>>>>> I have the getFilePath function which should return a URL like this.
>>>>>
>>>>> home/user/Desktop/logDir/logs/log_0_testServer_2012_07_22
>>>>>
>>>>> The defined function works perfectly if I put
>>>>> getFilePath( "0","testServer" ) into the select statement.
>>>>>
>>>>> But I want to use that particular URL as the *local directory name*.
>>>>> (The requirement is that this should not be hard-coded in the Hive
>>>>> query. Rather, it should be generated in the custom UDF.)
>>>>>
>>>>> So can I do something like I've shown below?
>>>>>
>>>>> set file_name= getFilePath( "0","testServer" );    // Define a parameter.
>>>>> .................
>>>>> ..............
>>>>> INSERT OVERWRITE LOCAL DIRECTORY 'file:///${hiveconf:file_name}'    // Assign the above parameter as the file URL
>>>>>
>>>>> I tried this way. But the directory name is returned as
>>>>>
>>>>> file:/getFilePath( "0" , "testServer" )
>>>>>
>>>>> Does that mean I cannot use a UDF to define the local directory name?
>>>>> Or am I doing something wrong here?
>>>>>
>>>>>
>>>>> --
>>>>> ~Regards
>>>>> *Manisha Eleperuma*
>>>>> Software Engineer
>>>>> WSO2, Inc.: http://wso2.com
>>>>> lean.enterprise.middleware
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Dev mailing list
>>>> [email protected]
>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Tharindu
>>>
>>> blog: http://mackiemathew.com/
>>> M: +94777759908
>>>
>>>
>>
>>
>> --
>> Afkham Azeez
>> Director of Architecture; WSO2, Inc.; http://wso2.com
>> Member; Apache Software Foundation; http://www.apache.org/
>> email: [email protected] cell: +94 77 3320919
>> blog: http://blog.afkham.org
>> twitter: http://twitter.com/afkham_azeez
>> linked-in: http://lk.linkedin.com/in/afkhamazeez
>> Lean . Enterprise . Middleware
>>
>>
>
>

