BTW, JIRA HIVE-1047 subsumes HIVE-1039 in trunk. If you are using the 0.5.0 branch, HIVE-1039 is already there. If you are using 0.4 or an earlier release, you can apply either HIVE-1039 or HIVE-1047; both are very simple changes.
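If you want a quick way to tell whether a given build is affected, here is a minimal sketch reusing the netflix/t1/t2 names from Min Zhou's repro below; on an affected build the explain lists a Move Operator for t1 only:

  set hive.merge.mapfiles=false;
  explain
  from netflix
  insert overwrite table t1 select movie_id
  insert overwrite table t2 select user_id;
  -- Affected builds: STAGE PLANS contains a single Move Operator (t1 only).
  -- Patched builds: one Move Operator per destination table (t1 and t2).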
On Jan 21, 2010, at 9:56 AM, Namit Jain wrote:

> Which version are you using ?
>
> The bug mentioned was fixed by:
>
> https://issues.apache.org/jira/browse/HIVE-1039
>
> Thanks,
> -namit
>
> -----Original Message-----
> From: Min Zhou [mailto:[email protected]]
> Sent: Thursday, January 21, 2010 12:40 AM
> To: [email protected]
> Subject: Re: hive multiple inserts
>
> It looks like a bug in Hive. See below:
>
> hive> set hive.merge.mapfiles=true;
> hive> explain from netflix insert overwrite table t1 select movie_id
>     > insert overwrite table t2 select user_id;
> OK
> ABSTRACT SYNTAX TREE:
>   (TOK_QUERY (TOK_FROM (TOK_TABREF netflix)) (TOK_INSERT
>   (TOK_DESTINATION (TOK_TAB t1)) (TOK_SELECT (TOK_SELEXPR
>   (TOK_TABLE_OR_COL movie_id)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB
>   t2)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL user_id)))))
>
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-5 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-5
>   Stage-8 depends on stages: Stage-2
>   Stage-1 depends on stages: Stage-8
>
> STAGE PLANS:
>   Stage: Stage-2
>     Map Reduce
>       Alias -> Map Operator Tree:
>         netflix
>           TableScan
>             alias: netflix
>             Select Operator
>               expressions:
>                     expr: movie_id
>                     type: string
>               outputColumnNames: _col0
>               File Output Operator
>                 compressed: true
>                 GlobalTableId: 1
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     name: t1
>             Select Operator
>               expressions:
>                     expr: user_id
>                     type: string
>               outputColumnNames: _col0
>               File Output Operator
>                 compressed: true
>                 GlobalTableId: 2
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     name: t2
>
>   Stage: Stage-5
>     Conditional Operator
>       list of dependent Tasks:
>           Move Operator
>             files:
>                 hdfs directory: true
>                 destination: hdfs://hdpnn.cm3:9000/group/tbdev/zhoumin/hive-tmp/794320195/10000
>           Map Reduce
>             Alias -> Map Operator Tree:
>               hdfs://hdpnn.cm3:9000/group/tbdev/zhoumin/hive-tmp/570535800/10004
>                   Reduce Output Operator
>                     sort order:
>                     Map-reduce partition columns:
>                           expr: rand()
>                           type: double
>                     tag: -1
>                     value expressions:
>                           expr: foo
>                           type: string
>             Reduce Operator Tree:
>               Extract
>                 File Output Operator
>                   compressed: true
>                   GlobalTableId: 0
>                   table:
>                       input format: org.apache.hadoop.mapred.TextInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                       name: t1
>
>   Stage: Stage-0
>     Move Operator
>       tables:
>           replace: true
>           table:
>               input format: org.apache.hadoop.mapred.TextInputFormat
>               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>               name: t1
>
>   Stage: Stage-8
>     Conditional Operator
>       list of dependent Tasks:
>           Move Operator
>             files:
>                 hdfs directory: true
>                 destination: hdfs://hdpnn.cm3:9000/group/tbdev/zhoumin/hive-tmp/794320195/10002
>           Map Reduce
>             Alias -> Map Operator Tree:
>               hdfs://hdpnn.cm3:9000/group/tbdev/zhoumin/hive-tmp/570535800/10005
>                   Reduce Output Operator
>                     sort order:
>                     Map-reduce partition columns:
>                           expr: rand()
>                           type: double
>                     tag: -1
>                     value expressions:
>                           expr: bar
>                           type: string
>             Reduce Operator Tree:
>               Extract
>                 File Output Operator
>                   compressed: true
>                   GlobalTableId: 0
>                   table:
>                       input format: org.apache.hadoop.mapred.TextInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                       name: t2
>
>   Stage: Stage-1
>     Move Operator
>       tables:
>           replace: true
>           table:
>               input format: org.apache.hadoop.mapred.TextInputFormat
>               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>               name: t2
>
> -----------------------------------------------------------------------------
>
> hive> set hive.merge.mapfiles=false;
> hive> explain from netflix insert overwrite table t1 select movie_id
>     > insert overwrite table t2 select user_id;
> OK
> ABSTRACT SYNTAX TREE:
>   (TOK_QUERY (TOK_FROM (TOK_TABREF netflix)) (TOK_INSERT
>   (TOK_DESTINATION (TOK_TAB t1)) (TOK_SELECT (TOK_SELEXPR
>   (TOK_TABLE_OR_COL movie_id)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB
>   t2)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL user_id)))))
>
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-0 depends on stages: Stage-2
>
> STAGE PLANS:
>   Stage: Stage-2
>     Map Reduce
>       Alias -> Map Operator Tree:
>         netflix
>           TableScan
>             alias: netflix
>             Select Operator
>               expressions:
>                     expr: movie_id
>                     type: string
>               outputColumnNames: _col0
>               File Output Operator
>                 compressed: true
>                 GlobalTableId: 1
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     name: t1
>             Select Operator
>               expressions:
>                     expr: user_id
>                     type: string
>               outputColumnNames: _col0
>               File Output Operator
>                 compressed: true
>                 GlobalTableId: 2
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     name: t2
>
>   Stage: Stage-0
>     Move Operator
>       tables:
>           replace: true
>           table:
>               input format: org.apache.hadoop.mapred.TextInputFormat
>               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>               name: t1
>
> From the explain output above, you can see that under the second setting
> the query generates only one Move operator (for t1). That is why t2 ends
> up empty.
>
> Regards,
> Min
>
> On Wed, Jan 13, 2010 at 5:01 PM, Zheng Shao <[email protected]> wrote:
>> https://issues.apache.org/jira/browse/HIVE-634
>> As far as I know there is nobody working on that right now. If you are
>> interested, we can work together on that.
>> Let's move the discussion to the JIRA.
>>
>> Zheng
>>
>> On Tue, Jan 12, 2010 at 3:27 AM, Anty <[email protected]> wrote:
>>>
>>> Thanks Zheng.
>>> We have used RegexSerDe in some use cases, but the speed is indeed
>>> slower, so we don't want to use regular expressions unless necessary.
>>> I found that HIVE-634 is what I need: it allows the user to specify a
>>> field delimiter in any format.
>>>
>>> INSERT OVERWRITE LOCAL DIRECTORY '/mnt/daily_timelines'
>>> [ ROW FORMAT DELIMITED | SERDE ... ]
>>> [ FILE FORMAT ...]
>>> SELECT * FROM daily_timelines;
>>>
>>> Is somebody still working on this feature?
>>>
>>> On Tue, Jan 12, 2010 at 2:28 PM, Zheng Shao <[email protected]> wrote:
>>>> Yes, we only support one-byte delimiters, for performance reasons.
>>>>
>>>> You can use the RegexSerDe in the contrib package for any row format
>>>> that allows a regular expression (including your case "<>"), but the
>>>> speed will be slower.
>>>>
>>>> Zheng
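A minimal sketch of the RegexSerDe suggestion above, assuming a two-column table (table and column names are hypothetical) and the "<>" delimiter from Anty's case; the contrib jar path will vary with your build:

  add jar /path/to/hive_contrib.jar;  -- adjust to your build's contrib jar

  CREATE TABLE angle_delim (col1 STRING, col2 STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
  WITH SERDEPROPERTIES (
    "input.regex" = "(.*?)<>(.*)",            -- split at the first "<>"
    "output.format.string" = "%1$s<>%2$s"     -- used when writing rows out
  )
  STORED AS TEXTFILE;

Note that the contrib RegexSerDe requires all columns to be STRING, and, as Zheng says, it is slower than LazySimpleSerDe.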
>>>> On Mon, Jan 11, 2010 at 5:54 PM, Anty <[email protected]> wrote:
>>>>>
>>>>> Thanks Zheng.
>>>>> It does work.
>>>>> I have another question: if the field delimiter is a string, e.g.
>>>>> "<>", it looks like LazySimpleSerDe can't handle it. Does
>>>>> LazySimpleSerDe not support string field delimiters, only one-byte
>>>>> control characters?
>>>>>
>>>>> On Tue, Jan 12, 2010 at 3:05 AM, Zheng Shao <[email protected]> wrote:
>>>>>> For your second question, currently we can do it with a little extra
>>>>>> work:
>>>>>> 1. Create an external table on the target directory with the field
>>>>>> delimiter you want;
>>>>>> 2. Run the query and insert overwrite the target external table.
>>>>>>
>>>>>> For the first question we can also do a similar thing (create a
>>>>>> bunch of external tables and then insert), but I think we should fix
>>>>>> the problem.
>>>>>>
>>>>>> Zheng
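A minimal sketch of the two steps Zheng describes, with hypothetical table names and paths; note that FIELDS TERMINATED BY still takes a single character (a tab here), in line with the one-byte limitation above:

  -- 1. External table over the target directory, declaring the delimiter.
  CREATE EXTERNAL TABLE export_target (movie_id STRING, user_id STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE
  LOCATION '/user/hive/export/netflix';

  -- 2. Insert overwrite the external table; the files under
  --    /user/hive/export/netflix come out tab-delimited.
  INSERT OVERWRITE TABLE export_target
  SELECT movie_id, user_id FROM netflix;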
>>>>>> On Mon, Jan 11, 2010 at 8:31 AM, Anty <[email protected]> wrote:
>>>>>>> HI:
>>>>>>> I came across the same problem: there is no data. I have one more
>>>>>>> question: can I specify the field delimiter for the output file,
>>>>>>> instead of just the default Ctrl-A field delimiter?
>>>>>>>
>>>>>>> On Fri, Jan 8, 2010 at 2:23 PM, wd <[email protected]> wrote:
>>>>>>>> hi,
>>>>>>>>
>>>>>>>> I've tried the Hive svn version; it seems this bug still exists.
>>>>>>>>
>>>>>>>> svn st -v
>>>>>>>>   896805   896744 namit  .
>>>>>>>>   896805   894292 namit  eclipse-templates
>>>>>>>>   896805   894292 namit  eclipse-templates/.classpath
>>>>>>>>   896805   765509 zshao  eclipse-templates/TestHive.launchtemplate
>>>>>>>>   896805   765509 zshao  eclipse-templates/TestMTQueries.l
>>>>>>>> ..........
>>>>>>>>
>>>>>>>> svn revision 896805?
>>>>>>>>
>>>>>>>> The execution log follows:
>>>>>>>>
>>>>>>>> hive> from test
>>>>>>>>     > INSERT OVERWRITE LOCAL DIRECTORY '/home/stefdong/tmp/0'
>>>>>>>>     > select * where a = 1
>>>>>>>>     > INSERT OVERWRITE LOCAL DIRECTORY '/home/stefdong/tmp/1'
>>>>>>>>     > select * where a = 3;
>>>>>>>> Total MapReduce jobs = 1
>>>>>>>> Launching Job 1 out of 1
>>>>>>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>>>>>>> Starting Job = job_201001071716_4691, Tracking URL =
>>>>>>>> http://abc.com:50030/jobdetails.jsp?jobid=job_201001071716_4691
>>>>>>>> Kill Command = hadoop job -Dmapred.job.tracker=abc.com:9001 -kill
>>>>>>>> job_201001071716_4691
>>>>>>>> 2010-01-08 14:14:55,442 Stage-2 map = 0%, reduce = 0%
>>>>>>>> 2010-01-08 14:15:00,643 Stage-2 map = 100%, reduce = 0%
>>>>>>>> Ended Job = job_201001071716_4691
>>>>>>>> Copying data to local directory /home/stefdong/tmp/0
>>>>>>>> Copying data to local directory /home/stefdong/tmp/0
>>>>>>>> 13 Rows loaded to /home/stefdong/tmp/0
>>>>>>>> 9 Rows loaded to /home/stefdong/tmp/1
>>>>>>>> OK
>>>>>>>> Time taken: 9.409 seconds
>>>>>>>>
>>>>>>>> thx.
>>>>>>>>
>>>>>>>> 2010/1/6 wd <[email protected]>
>>>>>>>>>
>>>>>>>>> hi,
>>>>>>>>>
>>>>>>>>> A single insert can extract data into '/tmp/out/1'. I can even
>>>>>>>>> see "xxx rows loaded to '/tmp/out/0'", "xxx rows loaded to
>>>>>>>>> '/tmp/out/1'", etc. in multi inserts, but there is no data in
>>>>>>>>> fact.
>>>>>>>>>
>>>>>>>>> Haven't tried the svn revision; will try it today. thx.
>>>>>>>>>
>>>>>>>>> 2010/1/5 Zheng Shao <[email protected]>
>>>>>>>>>>
>>>>>>>>>> Looks like a bug.
>>>>>>>>>> What is the svn revision of Hive?
>>>>>>>>>>
>>>>>>>>>> Did you verify that a single insert into '/tmp/out/1' produces
>>>>>>>>>> non-empty files?
>>>>>>>>>>
>>>>>>>>>> Zheng
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 5, 2010 at 12:51 AM, wd <[email protected]> wrote:
>>>>>>>>>>> In the hive wiki:
>>>>>>>>>>>
>>>>>>>>>>> Hive extension (multiple inserts):
>>>>>>>>>>> FROM from_statement
>>>>>>>>>>> INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
>>>>>>>>>>> [INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2]
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> I'm trying to use hive multi inserts to extract data from hive
>>>>>>>>>>> to local disk. The HQL follows:
>>>>>>>>>>>
>>>>>>>>>>> from test_tbl
>>>>>>>>>>> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/0' select * where id%10=0
>>>>>>>>>>> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/1' select * where id%10=1
>>>>>>>>>>> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/2' select * where id%10=2
>>>>>>>>>>>
>>>>>>>>>>> This HQL executes, but only /tmp/out/0 has a data file in it;
>>>>>>>>>>> the other directories are empty. Why does this happen? A bug?
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Yours,
>>>>>>>>>> Zheng
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards
>>>>>>> Anty Rao
>>>>>>
>>>>>> --
>>>>>> Yours,
>>>>>> Zheng
>>>>>
>>>>> --
>>>>> Best Regards
>>>>> Anty Rao
>>>>
>>>> --
>>>> Yours,
>>>> Zheng
>>>
>>> --
>>> Best Regards
>>> Anty Rao
>>
>> --
>> Yours,
>> Zheng
>
> --
> My research interests are distributed systems, parallel computing and
> bytecode-based virtual machines.
>
> My profile: http://www.linkedin.com/in/coderplay
> My blog: http://coderplay.javaeye.com
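For anyone stuck on an unpatched build: wd's observation above (a single insert does produce data) suggests a fallback of one insert per directory, sketched here with the table and predicates from his original mail. Each statement scans test_tbl again, so this is slower than a working multi insert:

  from test_tbl
  INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/0' select * where id%10=0;

  from test_tbl
  INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/1' select * where id%10=1;

  from test_tbl
  INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out/2' select * where id%10=2;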
