insert select * from foo

On Mon, Jul 23, 2012 at 7:15 PM, Afkham Azeez <[email protected]> wrote:
>
> On Mon, Jul 23, 2012 at 6:41 PM, Tharindu Mathew <[email protected]> wrote:
>
>> If you are planning to do a few MB, that would mean that the size of logs
>> will be (size of logs * no. of tenants), so roughly for 200 active
>> tenants and 2 MB of logs, it would come to around 400 MB. This is still
>> manageable in a custom task if your data processing is low.
>>
>> On Mon, Jul 23, 2012 at 6:24 PM, Afkham Azeez <[email protected]> wrote:
>>
>>> Like you said, the task may not be the best way to do this. Like we
>>> discussed the other day, we can publish logs to unique column families
>>> which contain the <Service>_<Tenant>_<Date> as the unique identifier. We
>>> need to generate logs in a file format & allow tenant users to download
>>> those. What is the best approach to generate these log files from the data
>>> collected? Typically, such a log file can run into a few MB.
>>
>> I'm a bit confused as we did not need to use Hive as per our earlier
>> conversation. This is because as the data is published it is already
>> grouped by server/tenant and date.
>>
>
> Yeah, there is no analytics to be done. It is a problem of converting data
> stored in Cassandra into a flat file.
>
>>
>>> Azeez
>>>
>>> On Mon, Jul 23, 2012 at 6:18 PM, Tharindu Mathew <[email protected]> wrote:
>>>
>>>> I'm no expert, but I immediately question the scale of this approach.
>>>>
>>>> Do you have an idea of how much of logs you plan to process per task?
>>>>
>>>> On Mon, Jul 23, 2012 at 6:13 PM, Afkham Azeez <[email protected]> wrote:
>>>>
>>>>> The requirement is simple. We need to generate log files on a per
>>>>> tenant, per date, per Service basis. Now as a big data & analytics expert,
>>>>> please advise us on what is the best solution for this.
>>>>>
>>>>> Azeez
>>>>>
>>>>> On Mon, Jul 23, 2012 at 6:05 PM, Tharindu Mathew <[email protected]> wrote:
>>>>>
>>>>>> So through this custom java task, what is the scale of log processing
>>>>>> you will support?
>>>>>> 100 MB, 1 GB, 100 GB, 1 TB?
>>>>>>
>>>>>> On Mon, Jul 23, 2012 at 5:14 PM, Manisha Gayathri <[email protected]> wrote:
>>>>>>
>>>>>>> Contacted the Hive User Group as well on this matter.
>>>>>>> They also mentioned that this approach is not possible.
>>>>>>> Also, as per the chat I had with Buddhika, right now this kind of
>>>>>>> dynamic variable creation is not possible in the Hive that comes with BAM2.
>>>>>>>
>>>>>>> Therefore IMO, without going ahead with this cumbersome process, the
>>>>>>> best way will be to run a scheduled Java task to pick data from the
>>>>>>> relevant Cassandra column families and dynamically generate the relevant
>>>>>>> log files (according to the tenantID and current date), which will be
>>>>>>> stored in Apache Directory.
>>>>>>>
>>>>>> You are going to store the results in LDAP?
>>>>>>
>>>>>>>
>>>>>>> As per the offline chat I had with Azeez, I will start to work on a
>>>>>>> custom Java task that can handle the above scenario.
>>>>>>>
>>>>>>> On Mon, Jul 23, 2012 at 2:27 PM, Manisha Gayathri <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> For a log file storing scenario using BAM2, I have a requirement to
>>>>>>>> generate separate log files for each date. For that I have created a
>>>>>>>> Hive analytic query along with a Hive UDF.
>>>>>>>>
>>>>>>>> I have the getFilePath function, which should return a URL like
>>>>>>>> this:
>>>>>>>>
>>>>>>>> home/user/Desktop/logDir/logs/log_0_testServer_2012_07_22
>>>>>>>>
>>>>>>>> The defined function works perfectly if I put
>>>>>>>> *getFilePath( "0","testServer" )* into the *select* statement.
>>>>>>>>
>>>>>>>> But I want to get that particular URL as the *local directory name*.
>>>>>>>> (The requirement is such that this should not be hard-coded in the Hive
>>>>>>>> query. Rather, it should be generated in the custom UDF.)
>>>>>>>>
>>>>>>>> So can I do something like what I have shown below?
>>>>>>>>
>>>>>>>> *set file_name= getFilePath( "0","testServer" );*  //Define a parameter.
>>>>>>>> *.................*
>>>>>>>> *..............*
>>>>>>>> *INSERT OVERWRITE LOCAL DIRECTORY 'file:///${hiveconf:file_name}'*
>>>>>>>> //Assign the above parameter as the file URL
>>>>>>>>
>>>>>>>> I tried this way. But the directory name is returned as
>>>>>>>>
>>>>>>>> file:/getFilePath( "0" , "testServer" )
>>>>>>>>
>>>>>>>> Does that mean I cannot use UDF to define the local directory name?
>>>>>>>> Or am I doing anything wrong in here?
>>>>>>>>
>>>>>>>> --
>>>>>>>> ~Regards
>>>>>>>> *Manisha Eleperuma*
>>>>>>>> Software Engineer
>>>>>>>> WSO2, Inc.: http://wso2.com
>>>>>>>> lean.enterprise.middleware
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Dev mailing list
>>>>>>> [email protected]
>>>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>>
>>>>>> Tharindu
>>>>>>
>>>>>> blog: http://mackiemathew.com/
>>>>>> M: +94777759908
>>>>>
>>>>> --
>>>>> *Afkham Azeez*
>>>>> Director of Architecture; WSO2, Inc.; http://wso2.com
>>>>> Member; Apache Software Foundation; http://www.apache.org/
>>>>> email: [email protected] cell: +94 77 3320919
>>>>> blog: http://blog.afkham.org
>>>>> twitter: http://twitter.com/afkham_azeez
>>>>> linked-in: http://lk.linkedin.com/in/afkhamazeez
>>>>> *Lean . Enterprise . Middleware*

--
Regards,

Tharindu

blog: http://mackiemathew.com/
M: +94777759908
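
The thread reports that the Hive `set` statement stored the text getFilePath( "0","testServer" ) rather than its result, and the proposed alternative is a custom Java task. In such a task the file name can be built directly in Java. The following is only a minimal sketch, assuming the pattern log_<tenantId>_<server>_<yyyy_MM_dd> inferred from the single example path quoted above; the class name, the method signature, and the base directory argument are hypothetical and are not the actual getFilePath UDF.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not the getFilePath UDF from the thread.
class LogFilePathSketch {

    // Date suffix in the form seen in the example path (2012_07_22).
    private static final DateTimeFormatter DATE_SUFFIX =
            DateTimeFormatter.ofPattern("yyyy_MM_dd");

    // Builds <baseDir>/log_<tenantId>_<serverName>_<yyyy_MM_dd>.
    static String filePath(String baseDir, String tenantId,
                           String serverName, LocalDate date) {
        return baseDir + "/log_" + tenantId + "_" + serverName
                + "_" + date.format(DATE_SUFFIX);
    }

    public static void main(String[] args) {
        // Prints home/user/Desktop/logDir/logs/log_0_testServer_2012_07_22
        System.out.println(filePath("home/user/Desktop/logDir/logs",
                "0", "testServer", LocalDate.of(2012, 7, 22)));
    }
}
```

Passing the date in as an argument, instead of reading the current date inside the method, keeps the helper easy to test and lets the task regenerate files for earlier days.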
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
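
As the thread puts it, the remaining problem is converting data already grouped by service, tenant, and date into flat files. A rough sketch of that step might look like the following. It assumes the rows have already been read from Cassandra into plain Java objects (no Cassandra client calls are shown), and the LogRow class, its fields, the ".log" suffix, and the method names are all hypothetical. The key mirrors the <Service>_<Tenant>_<Date> identifier mentioned above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: in-memory rows stand in for data read from Cassandra.
class LogExportSketch {

    // One log entry; all field names are assumptions for illustration.
    static final class LogRow {
        final String service, tenant, date, message;

        LogRow(String service, String tenant, String date, String message) {
            this.service = service;
            this.tenant = tenant;
            this.date = date;
            this.message = message;
        }

        // Same <Service>_<Tenant>_<Date> identifier discussed in the thread.
        String key() {
            return service + "_" + tenant + "_" + date;
        }
    }

    // Groups messages by key, preserving the order in which rows arrive.
    static Map<String, List<String>> groupByKey(List<LogRow> rows) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (LogRow row : rows) {
            grouped.computeIfAbsent(row.key(), k -> new ArrayList<>())
                   .add(row.message);
        }
        return grouped;
    }

    // Writes one <key>.log file per group, one message per line.
    static void writeFiles(Map<String, List<String>> grouped, Path outDir) {
        try {
            Files.createDirectories(outDir);
            for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
                Path file = outDir.resolve(e.getKey() + ".log");
                Files.write(file, e.getValue(), StandardCharsets.UTF_8);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) throws IOException {
        List<LogRow> rows = new ArrayList<>();
        rows.add(new LogRow("testServer", "0", "2012_07_22", "server started"));
        rows.add(new LogRow("testServer", "1", "2012_07_22", "tenant 1 login"));
        rows.add(new LogRow("testServer", "0", "2012_07_22", "request served"));
        Path outDir = Files.createTempDirectory("logs");
        writeFiles(groupByKey(rows), outDir);
        System.out.println("Wrote log files to " + outDir);
    }
}
```

This version holds every group in memory before writing, which fits the few-MB-per-file estimate given in the thread; for much larger volumes, streaming rows straight to an open writer per key would be the safer design.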
