[
https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782213#action_12782213
]
Avram Aelony commented on HIVE-951:
-----------------------------------
I want to echo Carl's point about copying to a new location being **extremely**
inconvenient.
Sometimes copying is not merely inconvenient but impossible, which makes it a
clear show stopper.
In my mind, Namit's point about interchangeability can be resolved by clear
documentation, whereas it is not always possible to copy data around from
bucket to bucket, especially if the data is quite large. In my case, being
forced to stage large volumes of existing S3 data to other S3 buckets
(copying the same data) purely for Hive analysis has really slowed Hive
adoption.
Engineers output data to S3 with their own map/reduce efficiencies in mind,
without regard for Hive's preferred data organization, so analysts are looking
forward to this feature so that we can use Hive again.
Can't wait for this feature!
> Selectively include EXTERNAL TABLE source files via REGEX
> ---------------------------------------------------------
>
> Key: HIVE-951
> URL: https://issues.apache.org/jira/browse/HIVE-951
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Carl Steinbach
>
> CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular
> expression.
> CREATE EXTERNAL TABLE was designed to allow users to access data that exists
> outside of Hive, and currently makes the assumption that all of the files
> located under the supplied path should be included in the new table. Users
> frequently encounter directories containing multiple datasets, or directories
> that contain data in heterogeneous schemas, and it's often impractical or
> impossible to adjust the layout of the directory to meet the requirements of
> CREATE EXTERNAL TABLE. A good example of this problem is creating an external
> table based on the contents of an S3 bucket.
> One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE
> as follows:
> CREATE EXTERNAL TABLE
> ...
> LOCATION path [file_regex]
> ...
> For example:
> {code:sql}
> CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
> STORED AS TEXTFILE
> LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
> {code}
> Creates mytable1, which includes all files in s3://my.bucket/ with a filename
> matching 'folder/2009*.bz2'
> {code:sql}
> CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
> STORED AS TEXTFILE
> LOCATION 'hdfs://data/' 'xyz.*2009.{4}\.bz2$';
> {code}
> Creates mytable2 including all files matching 'xyz*2009????.bz2' located
> under hdfs://data/
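
To make the proposed semantics concrete, here is a minimal sketch of how a
file_regex could narrow down the files under a LOCATION. It is not Hive code
and the issue does not specify an implementation. The function name
select_files and the sample listing are hypothetical, and two assumptions are
made: the regex is applied to the path relative to LOCATION, and it uses
unanchored "search" semantics (the examples above end in '$' but do not start
with '^'). Python's re module stands in for whatever regex engine Hive would
use.

```python
import re


def select_files(location, paths, file_regex):
    """Return the paths under `location` whose relative name matches `file_regex`.

    Hypothetical helper: assumes the regex is tested against the part of the
    path that follows `location`, using unanchored search semantics.
    """
    pattern = re.compile(file_regex)
    prefix = location if location.endswith("/") else location + "/"
    selected = []
    for path in paths:
        if not path.startswith(prefix):
            continue  # file lies outside the table location
        relative = path[len(prefix):]
        if pattern.search(relative):
            selected.append(path)
    return selected


# Invented listing, for illustration only.
listing = [
    "s3://my.bucket/folder/2009-01-01.bz2",
    "s3://my.bucket/folder/2009-01-01.txt",
    "s3://my.bucket/folder/2008-12-31.bz2",
    "s3://my.bucket/other/2009-01-01.bz2",
]

matched = select_files("s3://my.bucket/", listing, r"folder/2009.*\.bz2$")
print(matched)
```

Under these assumptions only the first entry of the listing is selected, so
the other datasets in the same bucket stay out of the table without any
copying.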