Hi all,

I'm new to Hadoop and Hive and am trying to set them up on a cluster. I
would like to understand how partitioning works in Hive. If I use a
CREATE TABLE statement with a "PARTITIONED BY" clause, whose column is a
virtual column according to the documentation, is the data physically
partitioned across multiple nodes (that is, would different nodes hold
different subsets of the actual data)? Is it possible to inspect the
contents of each partition?
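
For concreteness, this is the kind of statement I mean. It is only a
sketch: the table name (page_views) and the column names are made up for
illustration and do not come from a real schema.

```sql
-- Hypothetical table, for illustration only.
-- "dt" is the partition column: it is declared in PARTITIONED BY,
-- not in the regular column list.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (dt STRING);
```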

My goal is to compare Hive's concepts with those of other frameworks,
such as Greenplum, where the data is distributed across nodes.

Any help or pointers are appreciated. Thanks in advance.

Cheers
Arijit

-- 
"And when the night is cloudy,
There is still a light that shines on me,
Shine on until tomorrow, let it be."
