Hi all,

I'm new to Hadoop and Hive and am trying to set them up on a cluster. I'd like to understand how partitioning works in Hive. If I use a CREATE TABLE statement with a PARTITIONED BY clause (the partition column being, as per the documentation, a virtual column), is the data physically partitioned across multiple nodes, meaning that different nodes would hold different subsets of the actual data? Is it possible to check the contents of each partition?
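To make the question concrete, this is roughly the kind of statement I mean. It is only a sketch: the table name page_views, its columns, the partition column dt, and the file path are all made up for illustration.

```sql
-- Hypothetical table; dt is the partition ("virtual") column and is
-- declared in PARTITIONED BY rather than in the regular column list.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (dt STRING);

-- Load a local file (hypothetical path) into one specific partition.
LOAD DATA LOCAL INPATH '/tmp/page_views.txt'
INTO TABLE page_views
PARTITION (dt = '2024-01-01');
```

What I'd like to know is where the rows loaded into a partition such as dt = '2024-01-01' physically end up on the cluster.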
For context, I'm trying to compare Hive's concepts with those of other frameworks such as Greenplum, where the data is distributed across nodes. Any help or pointers would be appreciated.

Thanks in advance.

Cheers,
Arijit

--
"And when the night is cloudy,
There is still a light that shines on me,
Shine on until tomorrow, let it be."
