Whenever you create a partition in Hive, it needs to be registered with the metastore. So the short answer is that partition information is looked up in the metastore rather than discovered from the actual source data. Having a lot of partitions (around 10,000+) does slow Hive down. I have normally not seen anyone use hourly partitions; you may want to look at adding daily partitions and bucketing by hour.
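A minimal sketch of that layout, assuming a hypothetical external table named `events` with an hour column `hr` (the table and column names are illustrative, not from the original post):

```sql
-- Daily partitions, bucketed by hour: one partition per day instead of 24,
-- keeping the metastore partition count small
CREATE EXTERNAL TABLE events (
  event_id STRING,
  payload  STRING,
  hr       INT
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (hr) INTO 24 BUCKETS
LOCATION '/data/events';
```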
But if you are adding data directly into partition directories, then there is no alternative to registering the partitions with the metastore manually, e.g. via ALTER TABLE ... ADD PARTITION. If you are using HCatalog as the metadata store, it does provide an API to register partitions, so you can automate the data loading and the registration in a single flow.

Others will correct me if I have made any wrong assumptions.

On Mon, Apr 15, 2013 at 8:15 PM, Steve Hoffman <ste...@goofy.net> wrote:
> Looking for some pointers on where the partitioning is figured out in the
> source when a query is executed.
> I'm investigating an alternative partitioning scheme based on date patterns
> (using external tables).
>
> The situation is that I have data being written to some HDFS root directory
> with some dated pattern (i.e. YYYY/MM/DD). Today I have to run an alter
> table to insert this partition every day. It gets worse if you have hourly
> partitions. This seems like it can be described once (root + date
> partition pattern in the metastore).
>
> So looking for some pointers on where in the code this is currently
> handled.
>
> Thanks,
> Steve

--
Nitin Pawar
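P.S. For reference, the manual registration I mentioned looks roughly like this (paths and names are illustrative). Hive also has MSCK REPAIR TABLE, which scans the table location and adds partitions missing from the metastore, but note it expects directories to follow Hive's `dt=...` naming convention, so a plain YYYY/MM/DD layout would need explicit ADD PARTITION statements:

```sql
-- Register one day's directory as a partition (illustrative table/path)
ALTER TABLE events ADD IF NOT EXISTS
  PARTITION (dt='2013-04-15')
  LOCATION '/data/events/2013/04/15';

-- Or, if directories follow the dt=2013-04-15 convention under the table
-- location, let Hive discover and register them in one pass
MSCK REPAIR TABLE events;
```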