RE: Questions for the future work of Hive

Ashish Thusoo Mon, 10 Aug 2009 12:35:12 -0700

Hive trunk supports multi-group-by, which performs better than what 0.3.0
does.


I did not completely understand your comment on "the two mappings should take 
place at the same time".

Can you elaborate?

Ashish 

-----Original Message-----
From: Andraz Tori [mailto:[email protected]] 
Sent: Monday, August 10, 2009 1:11 AM
To: [email protected]
Subject: Re: Questions for the future work of Hive

> 2) We don't have a short-term plan for automatic-multi-partition 
> insertion. However there is a simple workaround if you know the 
> partition values (and Hive can do multiple inserts in a single 
> map-reduce job!). "src" can be a sub query as well.
> FROM src
> INSERT OVERWRITE TABLE tgt PARTITION(pcol="2009-08-01") SELECT * WHERE 
> ts = "2009-08-01"
> INSERT OVERWRITE TABLE tgt PARTITION(pcol="2009-08-02") SELECT * WHERE 
> ts = "2009-08-02"

--------------------------------------------------------

In my case src is also partitioned by "ts", which means that two mappings should 
take place at the same time since the data is independent, but Hive (0.3) 
produces a linear, partition-by-partition job sequence.
I also do a group by inside every insert...
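
For concreteness, the kind of query described here might look roughly like the
sketch below: the multi-insert workaround quoted above, with a group by added
to each insert. The column name "key" and the COUNT aggregate are assumptions
for illustration only and do not come from this thread; tgt is assumed to have
two matching non-partition columns.

```sql
-- Sketch only: `key` and COUNT(1) are hypothetical, not taken from the thread.
-- Both inserts read from the same FROM clause, as in the quoted workaround.
FROM src
INSERT OVERWRITE TABLE tgt PARTITION (pcol = '2009-08-01')
  SELECT src.key, COUNT(1)
  WHERE src.ts = '2009-08-01'
  GROUP BY src.key
INSERT OVERWRITE TABLE tgt PARTITION (pcol = '2009-08-02')
  SELECT src.key, COUNT(1)
  WHERE src.ts = '2009-08-02'
  GROUP BY src.key;
```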


Any ideas?

[This, together with the fact that hive --service thriftserver (at least in 
0.3) doesn't support multiple clients, makes it very hard to run some queries 
effectively.]




--
Andraz Tori, CTO
Zemanta Ltd, New York, London, Ljubljana www.zemanta.com
mail: [email protected]
tel: +386 41 515 767
twitter: andraz, skype: minmax_test
