Hi all,

I've just started looking into Hive, and I'm working my way through some of the tutorials out there (notably the one from Cloudera, which I found really helpful).
I think I'm going to want to use Hive with Amazon EMR for my 'real' work, so I've been trying to replicate some of the steps in the tutorial using Amazon EMR. I ran into a few gotchas and didn't find a lot of information on Amazon's site, so I've written a very basic blog article on how to get started from scratch (more or less). Hopefully, it will help other people in a similar position.

http://roninonrails.blogspot.com/2009/11/introduction-to-hive-on-amazon-elastic.html

I'm sure I've made lots of factual errors, so any constructive feedback would be most welcome. Also, if anyone knows of other newbie tutorials for Hive, whether on Amazon EMR or not, please let me know.

David
