Hi Edward,

Thanks for the reply. I think what you said is reasonable. I will run more tests with complex operations.

Regards,
Zhou
_____
From: Edward Capriolo [mailto:[email protected]]
Sent: Tuesday, June 22, 2010 11:32 PM
To: [email protected]
Cc: [email protected]
Subject: Re: hive Multi Table/File Inserts questions

On Tue, Jun 22, 2010 at 2:55 AM, Zhou Shuaifeng <[email protected]> wrote:

Hi,

When I use Multi Table/File Inserts commands, some are no more efficient than running the table insert commands separately.

For example:

    from pokes
    insert overwrite table pokes_count
    select bar,count(foo) group by bar
    insert overwrite table pokes_sum
    select bar,sum(foo) group by bar;

To execute this, 2 map/reduce jobs are needed, which is no fewer than running the two commands separately:

    insert overwrite table pokes_count select bar,count(foo) from pokes group by bar;
    insert overwrite table pokes_sum select bar,sum(foo) from pokes group by bar;

And the time taken is the same. The first form seems to scan the table 'pokes' only once, so why does it still need 2 map/reduce jobs? And why is the time taken not less? Is there any way to make it more efficient?

Thanks a lot,
Zhou

Zhou,

In the case of simple selects and a few tables you are not going to see the full benefit. Imagine a complex query like this:

    from (
      from (
        select (table1 join table2 where x=6) t1
      ) x
      join table3 t3 on x.col1 = t3.col1
    ) y

This could theoretically be a chain of thousands of map/reduce jobs. Then you would save jobs and time by evaluating it only once.

Also, you are only testing with 2 output tables. What happens with 10 or 20?
Just curious.

Regards,
Edward
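
For the two-table example in the question, one possible way to avoid a second aggregation is to compute both aggregates in a single GROUP BY subquery and then fan the result out with a multi-insert. This is only a minimal sketch and was not tested in this thread. The names agg, cnt, and total are made up for illustration, and whether Hive plans this as a single map/reduce job likely depends on the version, so it is worth comparing the plans before relying on it.

```sql
-- Aggregate once over pokes, then write each aggregate to its own table.
-- agg, cnt and total are illustrative names, not from the original messages.
FROM (
  SELECT bar, count(foo) AS cnt, sum(foo) AS total
  FROM pokes
  GROUP BY bar
) agg
INSERT OVERWRITE TABLE pokes_count
  SELECT agg.bar, agg.cnt
INSERT OVERWRITE TABLE pokes_sum
  SELECT agg.bar, agg.total;
```

Prefixing either this statement or the original multi-insert with EXPLAIN prints the stage plan, which shows how many map/reduce stages each form needs.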
