Re: Questions for the future work of Hive

Andraz Tori Mon, 10 Aug 2009 01:12:00 -0700

> 2) We don't have a short-term plan for automatic-multi-partition
> insertion. However there is a simple workaround if you know the
> partition values (and Hive can do multiple inserts in a single
> map-reduce job!). "src" can be a sub query as well.
> FROM src
> INSERT OVERWRITE TABLE tgt PARTITION(pcol="2009-08-01") SELECT * WHERE
> ts = "2009-08-01"
> INSERT OVERWRITE TABLE tgt PARTITION(pcol="2009-08-02") SELECT * WHERE
> ts = "2009-08-02"


--------------------------------------------------------

In my case src is also partitioned by "ts", which means the two inserts could 
be processed at the same time since the data is independent, but Hive (0.3) 
produces a linear, partition-by-partition sequence of jobs.
I also do a GROUP BY inside every insert...
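
For concreteness, the query shape described above would look roughly like the
sketch below. The table names (src, tgt) and the ts/pcol columns come from the
quoted example; the "key" column and the COUNT aggregate are hypothetical
placeholders, and tgt is assumed to have two matching columns.

```sql
-- Sketch only: "key" and COUNT(1) are illustrative placeholders.
-- src is itself partitioned by ts, so each branch reads one src partition
-- and the branches share no input data.
FROM src
INSERT OVERWRITE TABLE tgt PARTITION(pcol="2009-08-01")
  SELECT src.key, COUNT(1) WHERE src.ts = "2009-08-01" GROUP BY src.key
INSERT OVERWRITE TABLE tgt PARTITION(pcol="2009-08-02")
  SELECT src.key, COUNT(1) WHERE src.ts = "2009-08-02" GROUP BY src.key;
```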


Any ideas?

[This, together with the fact that hive --service thriftserver (at least in 
0.3) doesn't support multiple clients, makes it very hard to run some queries 
effectively.]




-- 
Andraz Tori, CTO
Zemanta Ltd, New York, London, Ljubljana
www.zemanta.com
mail: [email protected]
tel: +386 41 515 767
twitter: andraz, skype: minmax_test
