Well, the code to infer partitions from the HDFS directory structure exists in 
an old version of Hive. You would need to bring that back (and possibly make 
some modifications to reflect the latest code). But the work involved here is 
also to disallow such tables being marked as EXTERNAL and to disallow setting 
partition properties. There may be a couple of other things that need to be 
taken care of that I can't think of right now.
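
To give a flavor of it, the inference would basically walk the table's 
location on HDFS and turn key=value subdirectories into partition specs. 
Here is a rough Java sketch of just that piece using the Hadoop FileSystem 
API (an illustration, not the actual old Hive code; escaping of special 
characters in partition values is ignored):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PartitionInference {
      // For a table partitioned by a single column: every first-level
      // subdirectory named key=value under the table directory is taken
      // to be one partition.
      public static List<Map<String, String>> inferPartitions(
          Configuration conf, Path tableDir) throws IOException {
        FileSystem fs = tableDir.getFileSystem(conf);
        List<Map<String, String>> specs =
            new ArrayList<Map<String, String>>();
        for (FileStatus stat : fs.listStatus(tableDir)) {
          if (!stat.isDir()) {
            continue; // partitions are directories; ignore stray files
          }
          String name = stat.getPath().getName();
          int eq = name.indexOf('=');
          if (eq > 0) {
            // e.g. "ds=2009-08-17" -> {ds=2009-08-17}
            Map<String, String> spec =
                new LinkedHashMap<String, String>();
            spec.put(name.substring(0, eq), name.substring(eq + 1));
            specs.add(spec);
          }
        }
        return specs;
      }
    }

Multi-level partitioning (e.g. ds=.../hr=...) would just recurse one level 
per partition column.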

It doesn't look like much.

Prasad

________________________________
From: Chris Goffinet <[email protected]>
Reply-To: <[email protected]>
Date: Mon, 17 Aug 2009 18:38:40 -0700
To: <[email protected]>
Subject: Re: Dynamic Partitioning?

How much work is involved for such a feature?

-Chris

On Aug 17, 2009, at 6:19 PM, Prasad Chakka wrote:

We could make this feature a per-table property, enabled only for tables that 
don't use the extended feature set...
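
Something like a switch in the table's parameter map that the metastore code 
checks before inferring anything. The property name below is made up, purely 
for illustration:

    import java.util.Map;

    import org.apache.hadoop.hive.metastore.api.Table;

    public final class InferSwitch {
      // Hypothetical property name; not an existing Hive setting.
      static final String INFER_PROP = "hive.infer.partitions";

      // Only tables that explicitly opt in would get partitions
      // inferred from their HDFS layout.
      public static boolean shouldInfer(Table tbl) {
        Map<String, String> params = tbl.getParameters();
        return params != null
            && "true".equalsIgnoreCase(params.get(INFER_PROP));
      }
    }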


________________________________
From: Frederick Oko <[email protected]>
Reply-To: <[email protected]>
Date: Thu, 13 Aug 2009 02:12:54 -0700
To: <[email protected]>
Subject: Re: Dynamic Partitioning?

Actually, this is what Hive originally did -- it used to trust partitions it 
discovered via HDFS. This blind trust could be leveraged for just what you are 
requesting, since partitions do follow a simple directory scheme (and there is 
precedent for such out-of-band data loading). However, that blind trust became 
incompatible with the extended feature set of external tables and per-partition 
schemas introduced earlier this year. Re-enabling this behavior based on 
configuration is currently tracked as 
https://issues.apache.org/jira/browse/HIVE-493, 'automatically infer existing 
partitions of table from HDFS files'.
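
For concreteness, the scheme is just one subdirectory per partition value 
under the table's directory in the warehouse (table name and files here are 
hypothetical; /user/hive/warehouse is the default warehouse location):

    /user/hive/warehouse/page_views/ds=2009-08-11/part-00000
    /user/hive/warehouse/page_views/ds=2009-08-12/part-00000

Each ds=... directory corresponds to one partition of page_views, so a 
scanner only has to list the directories and parse their names.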

On Tue, Aug 11, 2009 at 11:15 AM, Chris Goffinet <[email protected]> wrote:
Hi

I was wondering if anyone has thought about the possibility of having dynamic 
partitioning in Hive? Right now you typically use LOAD DATA or ALTER TABLE to 
add new partitions. It would be great if applications like Scribe, which can 
load data into HDFS, could just place the data into the correct folder 
structure for your partitions on HDFS. Has anyone investigated this? What is 
everyone else doing about things like this? It seems a little error prone to 
have a cron job run every day adding new partitions. It might not even be 
possible to do dynamic partitioning, since partitions are read from the 
metadata rather than from HDFS. But I'd love to hear your thoughts.

-Chris



