Can I modify the stream of data and then insert into Hive

2010-08-25 Thread Guru Prasad
Hi, Suppose I get a stream of data. Now I want to filter this 'stream data' and then insert it into a Hive table. For example, let's say a file datas.txt has the following info: guru12delhi prasad13gurgaon

Re: how to hive add hive_exec.jar to hadoop

2010-08-25 Thread Edward Capriolo
On Wed, Aug 25, 2010 at 1:07 AM, lei liu liulei...@gmail.com wrote: When Hadoop runs a job submitted by Hive, Hadoop needs hive_exec.jar. How does Hive add hive_exec.jar to Hadoop? Please tell me where the code for this is in Hive. Thanks, LiuLei I think what you are looking for is a

Re: Directing output from Hive MR second custom MR job

2010-08-25 Thread Maxim Veksler
Hi Neil, On Wed, Aug 25, 2010 at 2:41 PM, Neil Xu neil.x...@gmail.com wrote: You can set the input path and output path for each job, and run jobs in order. ex. TwoJobs.java public class TwoJobs extends Configured implements Tool { public static class Job1Mapper extends MapReduceBase

RE: How is Union All optimized in Hive

2010-08-25 Thread Namit Jain
Yes, it is optimized by Hive. There will be only 1 MR job, even if the columns selected were different. -namit From: Neil Xu [neil.x...@gmail.com] Sent: Wednesday, August 25, 2010 2:40 AM To: hive-user@hadoop.apache.org Subject: How is Union All

Confused after adding columns to a Hive table

2010-08-25 Thread SingoWong
Hi, I executed ALTER TABLE tablename ADD COLUMNS (newcolumns STRING) to add a column. Before I added this column, the table had 4 columns; the new column comes after the 4th column and before the partition column, and the new data in HDFS now exists in the 5th column, but when I execute SELECT * FROM tablename,

Re: Directing output from Hive MR second custom MR job

2010-08-25 Thread Neil Xu
Hi Maxim, I misunderstood what you wanted. You need a job chain in which an MR job (not Hive) runs automatically after a Hive job is done, and temp files are also cleaned up automatically? I have no idea either, but in our company, a scheduling system is implemented to manage different kinds