Hi Edward,
 
Thanks for the reply. I think what you said is reasonable. I will run more
tests with complex operations.
 
Regards
Zhou

  _____  

From: Edward Capriolo [mailto:[email protected]] 
Sent: Tuesday, June 22, 2010 11:32 PM
To: [email protected]
Cc: [email protected]
Subject: Re: hive Multi Table/File Inserts questions




On Tue, Jun 22, 2010 at 2:55 AM, Zhou Shuaifeng <[email protected]>
wrote:


Hi, when I use Multi Table/File Inserts commands, some of them are no more
efficient than running the table insert commands separately.
 
For example, 
 
    from pokes 
    insert overwrite table pokes_count
    select bar,count(foo) group by bar
    insert overwrite table pokes_sum
    select bar,sum(foo) group by bar;
 
Executing this requires 2 map/reduce jobs, which is no fewer than running the
two commands separately:
 
    insert overwrite table pokes_count select bar,count(foo) from pokes group by bar;
    insert overwrite table pokes_sum select bar,sum(foo) from pokes group by bar;
 
And the time taken is the same.
But the first one seems to scan the table 'pokes' only once, so why does it
still need 2 map/reduce jobs? And why is the time taken not lower?
Is there any way to make it more efficient?
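
One way to check how many jobs Hive plans for each form is to prefix the
statement with EXPLAIN and compare the stage lists. A minimal sketch, reusing
the tables above (the plan output depends on the Hive version, so none is
shown here):

```sql
-- Print the plan for the multi-insert form; the STAGE DEPENDENCIES section
-- lists the stages Hive intends to run.
EXPLAIN
from pokes
insert overwrite table pokes_count
select bar,count(foo) group by bar
insert overwrite table pokes_sum
select bar,sum(foo) group by bar;

-- Print the plan for one of the separate statements, for comparison.
EXPLAIN
insert overwrite table pokes_count select bar,count(foo) from pokes group by bar;
```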
 
Thanks a lot,
Zhou


 


Zhou,

In the case of simple selects and a few tables you are not going to see the
full benefit.

Imagine a complex query like this:

from (
  from (
    select (table1 join table2 where x=6) t1 
  ) x
  join table3 t3 on x.col1 = t3.col1
) y

This could theoretically be a chain of thousands of map reduce jobs. Then
you would save jobs and time by evaluating that source only once.
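
As a rough sketch of that idea in the multi-insert form: the expensive source
is written once as a subquery and several inserts read from it. All table and
column names here (table1, table2, table3, id, col1, val, x, out_count,
out_sum) are placeholders for illustration, not a real schema, and how many
jobs Hive actually plans will depend on the version and the query.

```sql
-- Hypothetical schema: every table and column name below is a placeholder.
FROM (
  -- The costly part (joins plus filter) appears only once in the statement.
  SELECT a.col1 AS col1, c.val AS val
  FROM table1 a
  JOIN table2 b ON (a.id = b.id)
  JOIN table3 c ON (a.col1 = c.col1)
  WHERE a.x = 6
) y
-- Each insert clause aggregates the same subquery result.
INSERT OVERWRITE TABLE out_count
  SELECT y.col1, count(y.val) GROUP BY y.col1
INSERT OVERWRITE TABLE out_sum
  SELECT y.col1, sum(y.val) GROUP BY y.col1;
```

Written as two separate statements, the joins would have to be repeated in
each one.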

Also you are only testing with 2 output tables. What happens with 10 or 20?
Just curious.

Regards,
Edward
