Hive trunk has support for multi group by, which performs better than what 0.3.0 does.
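
As a rough sketch of the kind of query this applies to, here is the multi-insert from the quoted workaround below with a group by added to each insert. The `key` column and the two-column layout of `tgt` are assumptions made for illustration only, not taken from your schema, and whether both inserts end up in a single map-reduce job depends on the Hive version:

```sql
-- Assumed for illustration: src has columns (key, ts) and tgt has
-- columns (key, cnt), partitioned by pcol. Only src, tgt, pcol and ts
-- come from the quoted query; key and cnt are hypothetical.
FROM src
INSERT OVERWRITE TABLE tgt PARTITION (pcol = '2009-08-01')
  SELECT src.key, COUNT(1)
  WHERE src.ts = '2009-08-01'
  GROUP BY src.key
INSERT OVERWRITE TABLE tgt PARTITION (pcol = '2009-08-02')
  SELECT src.key, COUNT(1)
  WHERE src.ts = '2009-08-02'
  GROUP BY src.key;
```
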
I did not completely understand your comment that "the two mappings should take place at the same time". Can you elaborate?

Ashish

-----Original Message-----
From: Andraz Tori [mailto:[email protected]]
Sent: Monday, August 10, 2009 1:11 AM
To: [email protected]
Subject: Re: Questions for the future work of Hive

> 2) We don't have a short-term plan for automatic-multi-partition
> insertion. However there is a simple workaround if you know the
> partition values (and Hive can do multiple inserts in a single
> map-reduce job!). "src" can be a sub query as well.
> FROM src
> INSERT OVERWRITE TABLE tgt PARTITION(pcol="2009-08-01") SELECT * WHERE
> ts = "2009-08-01"
> INSERT OVERWRITE TABLE tgt PARTITION(pcol="2009-08-02") SELECT * WHERE
> ts = "2009-08-02"

--------------------------------------------------------

In my case src too is partitioned by "ts", which means that the two mappings should take place at the same time since the data is independent, but Hive (0.3) produces a linear partition-by-partition job sequence. I also do a group by inside every insert...

Any ideas?

[This, together with the fact that hive --service thriftserver (at least in 0.3) doesn't support multiple clients, makes it very hard to run some queries effectively.]

--
Andraz Tori, CTO
Zemanta Ltd, New York, London, Ljubljana
www.zemanta.com
mail: [email protected]
tel: +386 41 515 767
twitter: andraz, skype: minmax_test
