Selectively include EXTERNAL TABLE source files via REGEX
---------------------------------------------------------

                 Key: HIVE-951
                 URL: https://issues.apache.org/jira/browse/HIVE-951
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Carl Steinbach


CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular
expression.

CREATE EXTERNAL TABLE was designed to allow users to access data that exists
outside of Hive, and currently makes the assumption that all of the files
located under the supplied path should be included in the new table. Users
frequently encounter directories containing multiple datasets, or directories
that contain data in heterogeneous schemas, and it's often impractical or
impossible to adjust the layout of the directory to meet the requirements of
CREATE EXTERNAL TABLE. A good example of this problem is creating an external
table based on the contents of an S3 bucket.

One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE
as follows:

CREATE EXTERNAL TABLE
...
LOCATION path [file_regex]
...

For example:

{code:sql}
CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
STORED AS TEXTFILE
LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
{code}

Creates mytable1, which includes all files in s3://my.bucket/ whose path
matches the regular expression above (in glob terms, 'folder/2009*.bz2').
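
The issue does not spell out how the regular expression would be applied. As a
rough sketch only, assume it is searched within each file's path relative to
LOCATION, so the trailing `$` anchors the end of the path while the start is
left unanchored. The function name `select_files` and the bucket listing below
are hypothetical, and Python's `re` stands in for the Java regex engine Hive
would presumably use.

```python
import re


def select_files(relative_paths, file_regex):
    """Return the paths (relative to LOCATION) that the regex matches.

    Assumption: the pattern is searched within the relative path, so a
    trailing `$` anchors the end while the start is left unanchored.
    """
    pattern = re.compile(file_regex)
    return [p for p in relative_paths if pattern.search(p)]


# Hypothetical listing of objects, relative to s3://my.bucket/
listing = [
    "folder/2009-01-01.bz2",
    "folder/2009-01-01.bz2.tmp",  # rejected: does not end in .bz2
    "folder/2008-12-31.bz2",      # rejected: wrong year prefix
    "other/2009-01-01.bz2",       # rejected: wrong folder
]

selected = select_files(listing, r"folder/2009.*\.bz2$")
print(selected)
```

Under these assumptions only the first entry would become part of mytable1.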

{code:sql}
CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
STORED AS TEXTFILE 
LOCATION 'hdfs://data/' 'xyz.*2009.{4}\.bz2$';
{code}

Creates mytable2, which includes all files located under hdfs://data/ whose
names match the glob-style pattern 'xyz*2009????.bz2'.
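
The description above is written with glob wildcards, where `?` stands for
exactly one character and `*` for any run of characters. In a regular
expression those roles are played by `.` and `.*`, and a literal dot must be
escaped. A minimal sketch of that correspondence, using Python's standard
`fnmatch.translate` purely for illustration (the issue does not propose glob
support, and the file names are made up):

```python
import fnmatch
import re

# Glob form used in the prose description of mytable2.
glob_pattern = "xyz*2009????.bz2"

# fnmatch.translate converts a shell-style glob into a regex string
# that is meant to be used with re.match.
translated = re.compile(fnmatch.translate(glob_pattern))

# Hand-written regex with the same intent: four arbitrary characters
# after "2009", followed by a literal ".bz2" at the end of the name.
handwritten = re.compile(r"xyz.*2009.{4}\.bz2$")

# Hypothetical file names under hdfs://data/
names = ["xyz_20090115.bz2", "xyz_200901.bz2", "abc_20090115.bz2"]

by_glob = [n for n in names if translated.match(n)]
by_regex = [n for n in names if handwritten.match(n)]
print(by_glob, by_regex)
```

Both forms select the same file here; the point is only that glob `????` and
regex `.{4}` express the same constraint.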



