Shant Hovsepian created IMPALA-10327:
----------------------------------------

             Summary: SymlinkTextInputFormat for reading manifest file based 
tables. 
                 Key: IMPALA-10327
                 URL: https://issues.apache.org/jira/browse/IMPALA-10327
             Project: IMPALA
          Issue Type: New Feature
          Components: Catalog, Frontend
            Reporter: Shant Hovsepian


The 
[SymlinkTextInputFormat|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java]
 was an early Hadoop/Hive feature that has recently started to see a lot of use. 
Originally it was used to support symlinks in Hive warehouse directories, but 
now it is more commonly used to specify the files that make up a Hive table 
without requiring a directory listing operation.

Instead of pointing to a directory of files or partitions, the Hive table 
metadata refers to a single directory containing "manifest files". These files 
have a well-defined format that specifies the files that constitute the table.
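
As a rough illustration of the idea, the sketch below resolves a manifest directory into the list of data files it names. It assumes each manifest is a plain text file with one data file path per line, and it reads from the local filesystem for simplicity. The function name resolve_manifest_dir and the example paths are hypothetical; this is not Hive's or Impala's actual implementation.

```python
import tempfile
from pathlib import Path


def resolve_manifest_dir(manifest_dir):
    """Return the data file paths listed by all manifest files in manifest_dir.

    Assumption: every regular file in the directory is a manifest holding
    one data file path per line; blank lines are ignored.
    """
    data_files = []
    for manifest in sorted(Path(manifest_dir).iterdir()):
        if not manifest.is_file():
            continue  # skip subdirectories
        for line in manifest.read_text().splitlines():
            path = line.strip()
            if path:
                data_files.append(path)
    return data_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical manifest naming two Parquet files of a table.
        (Path(tmp) / "manifest").write_text(
            "s3://bucket/table/part-0000.parquet\n"
            "s3://bucket/table/part-0001.parquet\n"
        )
        for data_file in resolve_manifest_dir(tmp):
            print(data_file)
```

The point of the sketch is that the reader only opens the manifest files; the directories holding the data files themselves are never listed.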

This mechanism is used in the following cases:
 * Delta Lake uses manifest files to generate consistent read-only views of its 
table format for use by Presto, Hive, and Redshift Spectrum: 
[https://docs.delta.io/0.7.0/integrations.html]
 * AWS Redshift can UNLOAD Redshift tables and partitions to corresponding 
Parquet files on S3 for consumption by other tools: 
[https://docs.aws.amazon.com/redshift/latest/dg/loading-data-files-using-manifest.html]
 * AWS S3 Inventories: 
[https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-inventory.html]

 

Using this functionality with HDFS and S3, even without the need to 
interoperate with the above, would provide performance benefits by avoiding 
expensive directory listing operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
