On Tue, Jun 22, 2010 at 2:55 AM, Zhou Shuaifeng <[email protected]> wrote:

>  Hi, when I use multi-table/file insert commands, some seem to be no more
> effective than running the table insert commands separately.
>
> For example,
>
>     from pokes
>     insert overwrite table pokes_count
>     select bar,count(foo) group by bar
>     insert overwrite table pokes_sum
>     select bar,sum(foo) group by bar;
>
> To execute this, 2 map/reduce jobs are needed, which is no fewer than
> running the two commands separately:
>
>     insert overwrite table pokes_count select bar,count(foo) from
> pokes group by bar;
>     insert overwrite table pokes_sum select bar,sum(foo) from pokes group
> by bar;
>
> And the time taken is the same.
> But the first one seems to scan the table 'pokes' only once, so why are 2
> map/reduce jobs still needed? And why isn't the time taken any less?
> Is there any way to make it more effective?
>
> Thanks a lot,
> Zhou
>

Zhou,

In the case of simple selects over a couple of tables, you are not going to
see the full benefit.

Imagine some complex query like this (Hive-ish pseudocode; the join keys are
placeholders):

    from (
      from (
        select t1.* from table1 t1 join table2 t2 on (t1.key = t2.key)
        where t1.x = 6
      ) x
      join table3 t3 on (x.col1 = t3.col1)
    ) y

This could theoretically expand into a chain of thousands of map/reduce
jobs. With a multi-table insert you would save jobs and time by evaluating
that chain only once.
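
For contrast, here is a rough sketch (untested, with invented table names) of
feeding one expensive derived table into several inserts using the multi-insert
syntax, so the derived table is written only once:

    from (
      select t1.bar, t1.foo
      from table1 t1 join table2 t2 on (t1.key = t2.key)
      where t1.x = 6
    ) t
    insert overwrite table t_count
    select bar, count(foo) group by bar
    insert overwrite table t_sum
    select bar, sum(foo) group by bar
    insert overwrite table t_max
    select bar, max(foo) group by bar;

Even if each aggregation still runs as its own reduce job, the join only has
to be computed once; whether the scan itself is shared depends on your Hive
version and its plan.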

Also, you are only testing with 2 output tables. What happens with 10 or 20?
Just curious.

Regards,
Edward
