I'm no expert, but I immediately question the scale of this approach.

Do you have an idea of how much log data you plan to process per task?

On Mon, Jul 23, 2012 at 6:13 PM, Afkham Azeez <[email protected]> wrote:

> The requirement is simple. We need to generate log files on a per-tenant,
> per-date, per-service basis. Now, as a big data & analytics expert, please
> advise us on what the best solution for this is.
>
> Azeez
>
>
> On Mon, Jul 23, 2012 at 6:05 PM, Tharindu Mathew <[email protected]> wrote:
>
>> So, with this custom Java task, what scale of log processing will you
>> support? 100 MB, 1 GB, 100 GB, 1 TB?
>>
>> On Mon, Jul 23, 2012 at 5:14 PM, Manisha Gayathri <[email protected]> wrote:
>>
>>> I contacted the Hive user group about this matter as well.
>>> They also said that this approach is not possible.
>>> Also, as per the chat I had with Buddhika, this kind of dynamic
>>> variable creation is currently not possible in the Hive that comes with BAM2.
>>>
>>> Therefore, IMO, rather than going ahead with this cumbersome process, the
>>> best way would be to run a scheduled Java task that picks data from the
>>> relevant Cassandra column families and dynamically generates the relevant
>>> log files (according to the tenant ID and current date), which will be
>>> stored in Apache Directory.
>>>
>> You are going to store the results in an LDAP directory?
>>
>>>
>>> As per the offline chat I had with Azeez, I will start to work on a
>>> custom Java task that can handle the above scenario.
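
A rough sketch of the bucketing step such a scheduled task would need, in plain Java. It assumes the rows have already been read from Cassandra by some other code; the `LogEvent` class, its fields, and the `log_<tenant>_<service>_<date>` naming are illustrative assumptions, not something BAM2 provides.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LogPartitioner {

    /** Hypothetical shape of one log row after it has been read from storage. */
    static final class LogEvent {
        final String tenantId;
        final String service;
        final LocalDate date;
        final String message;

        LogEvent(String tenantId, String service, LocalDate date, String message) {
            this.tenantId = tenantId;
            this.service = service;
            this.date = date;
            this.message = message;
        }
    }

    /** File name for one (tenant, service, date) bucket; the scheme is an assumption. */
    static String fileName(LogEvent e) {
        // LocalDate.toString() yields ISO format (2012-07-22); swap '-' for '_'.
        return "log_" + e.tenantId + "_" + e.service + "_"
                + e.date.toString().replace('-', '_');
    }

    /** Groups messages by target file name, preserving arrival order per file. */
    static Map<String, List<String>> partition(List<LogEvent> events) {
        Map<String, List<String>> files = new TreeMap<>();
        for (LogEvent e : events) {
            files.computeIfAbsent(fileName(e), k -> new ArrayList<>()).add(e.message);
        }
        return files;
    }
}
```

Note that this version holds every message in memory, so it only suits small batches. For larger volumes the task would have to stream rows and append to each file as it goes.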
>>>
>>> On Mon, Jul 23, 2012 at 2:27 PM, Manisha Gayathri <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> For a log file storing scenario using BAM2, I have a requirement to
>>>> generate separate log files for each date. For that, I have created a
>>>> Hive analytics query along with a Hive UDF.
>>>>
>>>> I have the getFilePath function which should return a URL like this.
>>>>
>>>> home/user/Desktop/logDir/logs/log_0_testServer_2012_07_22
>>>>
>>>> The defined function works perfectly if I put
>>>> getFilePath( "0","testServer" ) into the *select* statement.
>>>>
>>>> But I want to use that particular URL as the *local directory name*.
>>>> (The requirement is that this should not be hard-coded in the Hive
>>>> query; rather, it should be generated in the custom UDF.)
>>>>
>>>> So can I do something like what I've shown below?
>>>>
>>>> set file_name= getFilePath( "0","testServer" );    // Define a parameter.
>>>> .................
>>>> ..............
>>>> INSERT OVERWRITE LOCAL DIRECTORY 'file:///${hiveconf:file_name}'    // Assign the above parameter as the file URL
>>>>
>>>> I tried it this way, but the directory name is returned as
>>>>
>>>> file:/getFilePath( "0" , "testServer" )
>>>>
>>>> Does that mean I cannot use a UDF to define the local directory name?
>>>> Or am I doing something wrong here?
>>>>
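
The output above is what you would expect if `set` just stores the right-hand side as a literal string and `${hiveconf:file_name}` is replaced textually before the query is parsed, so the UDF is never evaluated. One possible workaround, sketched below in Java, is to compute the path outside Hive and build the statement text with the path already in place. The class and method names, the base directory argument, and the `SELECT * FROM logs` query are placeholders for illustration only.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class LogDirQuery {

    // Matches the date suffix in the example path from the thread: 2012_07_22.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy_MM_dd");

    /** Builds <baseDir>/log_<tenantId>_<server>_<yyyy_MM_dd>. */
    static String filePath(String baseDir, String tenantId, String server, LocalDate date) {
        return baseDir + "/log_" + tenantId + "_" + server + "_" + DAY.format(date);
    }

    /** Puts the already-computed path into the statement text before Hive sees it. */
    static String insertStatement(String path, String selectSql) {
        return "INSERT OVERWRITE LOCAL DIRECTORY 'file:///" + path + "' " + selectSql;
    }

    public static void main(String[] args) {
        String path = filePath("home/user/Desktop/logDir/logs", "0", "testServer",
                LocalDate.of(2012, 7, 22));
        // "SELECT * FROM logs" is a placeholder query, not the real analytics query.
        System.out.println(insertStatement(path, "SELECT * FROM logs"));
    }
}
```

Whether the generated statement can then be handed to the Hive that ships with BAM2 depends on how the analytics scripts are scheduled there, so treat this as a starting point only.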
>>>>
>>>> --
>>>> ~Regards
>>>> *Manisha Eleperuma*
>>>> Software Engineer
>>>> WSO2, Inc.: http://wso2.com
>>>> lean.enterprise.middleware
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> ~Regards
>>> *Manisha Eleperuma*
>>> Software Engineer
>>> WSO2, Inc.: http://wso2.com
>>> lean.enterprise.middleware
>>>
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list
>>> [email protected]
>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>
>>>
>>
>>
>> --
>> Regards,
>>
>> Tharindu
>>
>> blog: http://mackiemathew.com/
>> M: +94777759908
>>
>>
>>
>>
>
>
> --
> *Afkham Azeez*
> Director of Architecture; WSO2, Inc.; http://wso2.com
> Member; Apache Software Foundation; http://www.apache.org/
> email: [email protected] cell: +94 77 3320919
> blog: http://blog.afkham.org
> twitter: http://twitter.com/afkham_azeez
> linked-in: http://lk.linkedin.com/in/afkhamazeez
>
> Lean . Enterprise . Middleware
>
>


-- 
Regards,

Tharindu

blog: http://mackiemathew.com/
M: +94777759908