On Mon, Jul 23, 2012 at 7:21 PM, Tharindu Mathew <[email protected]> wrote:

> insert select * from foo
>
File names should be dynamically generated according to the tenant ID as
well as the service name (i.e. T1_P1_T1.gz), and when selecting data we
need to retrieve column families dynamically depending on the tenant ID and
the product name.
So we need to create custom functions and assign their results to variables
to get file paths as well as column family names. As far as I know, right now
there is no way to do this using Hive.
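
A minimal sketch of one possible workaround, assuming the names can be computed in Java outside Hive and substituted into the statement text before it is submitted. This fits the behaviour reported further down the thread, where the value given to "set" was used as literal text instead of being evaluated. The class name LogExportNames, both helper methods and the base directory argument are hypothetical; the path layout follows the getFilePath example quoted below, and the table name foo comes from the quoted query. Whether BAM2 allows scripts to be generated this way is not confirmed here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class LogExportNames {
    // Date layout taken from the example path in the thread: 2012_07_22
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("yyyy_MM_dd");

    // Hypothetical helper mirroring the getFilePath UDF described in the thread.
    public static String getFilePath(String baseDir, String tenantId,
                                     String server, LocalDate date) {
        return baseDir + "/log_" + tenantId + "_" + server + "_" + DATE.format(date);
    }

    // Hypothetical helper: builds the statement text with the path already
    // filled in, so Hive never has to evaluate a function for the directory.
    public static String buildExportQuery(String filePath, String table) {
        return "INSERT OVERWRITE LOCAL DIRECTORY 'file:///" + filePath + "' "
                + "SELECT * FROM " + table;
    }

    public static void main(String[] args) {
        String path = getFilePath("home/user/Desktop/logDir/logs",
                "0", "testServer", LocalDate.of(2012, 7, 22));
        System.out.println(path);
        System.out.println(buildExportQuery(path, "foo"));
    }
}
```

The same computed value could also be handed to a script as a plain string variable; the point of the sketch is only that the name is resolved before Hive sees the statement.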

>
> On Mon, Jul 23, 2012 at 7:15 PM, Afkham Azeez <[email protected]> wrote:
>
>>
>>
>> On Mon, Jul 23, 2012 at 6:41 PM, Tharindu Mathew <[email protected]> wrote:
>>
>>> If you are planning to do a few MB, that would mean that the total size
>>> of logs will be (size of logs per tenant * no. of tenants), so roughly for
>>> 200 active tenants and 2 MB of logs each, it would come to around 400 MB.
>>> This is still manageable in a custom task if your data processing is low.
>>>
>>> On Mon, Jul 23, 2012 at 6:24 PM, Afkham Azeez <[email protected]> wrote:
>>>
>>>> Like you said, the task may not be the best way to do this. Like we
>>>> discussed the other day, we can publish logs to unique column families
>>>> which contain the <Service>_<Tenant>_<Date> as the unique identifier. We
>>>> need to generate logs in a file format & allow tenant users to download
>>>> those. What is the best approach to generate these log files from the data
>>>> collected? Typically, such a log file can run into a few MB.
>>>
>>> I'm a bit confused, as we did not need to use Hive as per our earlier
>>> conversation. This is because the data, as it is published, is already
>>> grouped by server/tenant and date.
>>>
>>
>> Yeah, there is no analytics to be done. It is a problem of converting
>> data stored in Cassandra into a flat file.
>>
>>
>>>
>>>> Azeez
>>>>
>>>>
>>>> On Mon, Jul 23, 2012 at 6:18 PM, Tharindu Mathew <[email protected]> wrote:
>>>>
>>>>> I'm no expert, but I immediately question the scale of this approach.
>>>>>
>>>>> Do you have an idea of how much log data you plan to process per task?
>>>>>
>>>>>
>>>>> On Mon, Jul 23, 2012 at 6:13 PM, Afkham Azeez <[email protected]> wrote:
>>>>>
>>>>>> The requirement is simple. We need to generate log files on a
>>>>>> per-tenant, per-date, per-service basis. Now, as a big data & analytics
>>>>>> expert, please advise us on the best solution for this.
>>>>>>
>>>>>> Azeez
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 23, 2012 at 6:05 PM, Tharindu Mathew <[email protected]> wrote:
>>>>>>
>>>>>>> So with this custom Java task, what scale of log processing will
>>>>>>> you support? 100 MB, 1 GB, 100 GB, 1 TB?
>>>>>>>
>>>>>>> On Mon, Jul 23, 2012 at 5:14 PM, Manisha Gayathri <[email protected]> wrote:
>>>>>>>
>>>>>>>> I contacted the Hive user group as well on this matter.
>>>>>>>> They also mentioned that this approach is not possible.
>>>>>>>> Also, as per the chat I had with Buddhika, this kind of dynamic
>>>>>>>> variable creation is currently not possible in the Hive that comes
>>>>>>>> with BAM2.
>>>>>>>>
>>>>>>>> Therefore IMO, rather than going ahead with this cumbersome process,
>>>>>>>> the best way will be to run a scheduled Java task to pick data from
>>>>>>>> the relevant Cassandra column families and dynamically generate the
>>>>>>>> relevant log files (according to the tenant ID and current date),
>>>>>>>> which will be stored in Apache Directory.
>>>>>>>>
>>>>>>> You are going to store the results in an LDAP?
>>>>>>>
>>>>>>>>
>>>>>>>> As per the offline chat I had with Azeez, I will start to work on a
>>>>>>>> custom Java task that can handle the above scenario.
>>>>>>>>
>>>>>>>> On Mon, Jul 23, 2012 at 2:27 PM, Manisha Gayathri <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> For a log file storing scenario using BAM2, I have a requirement
>>>>>>>>> to generate separate log files for each date. For that I have created
>>>>>>>>> a Hive analytic query along with a Hive UDF.
>>>>>>>>>
>>>>>>>>> I have the getFilePath function which should return a URL like
>>>>>>>>> this.
>>>>>>>>>
>>>>>>>>> home/user/Desktop/logDir/logs/log_0_testServer_2012_07_22
>>>>>>>>>
>>>>>>>>> The defined function works perfectly if I put
>>>>>>>>> *getFilePath( "0","testServer" )* into the *select* statement.
>>>>>>>>>
>>>>>>>>> But I want to use that particular URL as the *local directory name*.
>>>>>>>>> (The requirement is that this should not be hard-coded in
>>>>>>>>> the Hive query; rather, it should be generated in the custom UDF.)
>>>>>>>>>
>>>>>>>>> So can I do something like what I have shown below?
>>>>>>>>>
>>>>>>>>> set file_name= getFilePath( "0","testServer" );    // Define a parameter.
>>>>>>>>> .................
>>>>>>>>> ..............
>>>>>>>>> INSERT OVERWRITE LOCAL DIRECTORY 'file:///${hiveconf:file_name}'    // Assign the above parameter as the file URL
>>>>>>>>>
>>>>>>>>> I tried this way. But the directory name is returned as
>>>>>>>>>
>>>>>>>>> file:/getFilePath( "0" , "testServer" )
>>>>>>>>>
>>>>>>>>> Does that mean I cannot use a UDF to define the local directory
>>>>>>>>> name?
>>>>>>>>> Or am I doing something wrong here?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ~Regards
>>>>>>>>> *Manisha Eleperuma*
>>>>>>>>> Software Engineer
>>>>>>>>> WSO2, Inc.: http://wso2.com
>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Dev mailing list
>>>>>>>> [email protected]
>>>>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>>
>>>>>>> Tharindu
>>>>>>>
>>>>>>> blog: http://mackiemathew.com/
>>>>>>> M: +94777759908
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Afkham Azeez
>>>>>> Director of Architecture; WSO2, Inc.; http://wso2.com
>>>>>> Member; Apache Software Foundation; http://www.apache.org/
>>>>>> email: [email protected]  cell: +94 77 3320919
>>>>>> blog: http://blog.afkham.org
>>>>>> twitter: http://twitter.com/afkham_azeez
>>>>>> linked-in: http://lk.linkedin.com/in/afkhamazeez
>>>>>>
>>>>>> Lean . Enterprise . Middleware
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>
>
>