Hi Sandeep, thanks for your interest in Iceberg.

> Do iceberg supports external hive table ? if yes does it supports all file systems like hdfs, s3 , wasb/adfs?

I'm not quite sure what you mean because Iceberg replaces Hive tables and is not compatible with them. Sounds like you might be wondering about how files are accessed and their life-cycle.

Iceberg uses Hadoop FileSystem to read and write files, so as long as you have a configured FileSystem, you can use any of those paths.

Iceberg doesn't have a concept of "external" data. Iceberg expects to manage all of the files underneath it and will delete files as you remove snapshots that track deleted files (logical deletes happen first, physical deletes later). You can avoid the deletes by passing in a callback to manage removal yourself. When you drop a table, there's a flag for whether the data files should be removed.

We wanted to make sure there is flexibility here for platform teams to be able to clean up data as they need to. For example, our users never delete data; we use Janitor services to clean up old partitions, snapshots, and dangling files.

> Can we migrate externally created hive tables to iceberg tables without deleting our existing data on s3?

There is an import utility you can use to create Iceberg metadata around files in an existing Hive table:
https://github.com/apache/incubator-iceberg/blob/master/spark/src/main/scala/org/apache/iceberg/spark/SparkTableUtil.scala#L479-L483

> Does iceberg have CLI support for DDL and DML queries?

Iceberg doesn't directly provide SQL support. For that, we have integration with Spark and Presto. Presto supports <https://github.com/prestosql/presto/tree/master/presto-iceberg#status> "create, CTAS, drop, rename, and reading from Iceberg tables. It also supports adding, dropping, and renaming columns." Spark 2.4 supports <http://iceberg.apache.org/spark/> only the DataFrame API, and SQL support is coming in 3.0.

> if Iceberg supports external tables does it support ORC file format also?

Iceberg doesn't support ORC yet, but there is a pull request for it that is getting really close to merging.
I'm currently trying to make time to review it.

On Wed, Nov 20, 2019 at 1:56 AM Sandeep Kumar <[email protected]> wrote:

> All,
>
> I am new to iceberg and want to explore iceberg to optimise hive query
> response time.
> I have couple of questions regarding same.
> 1.) Do iceberg supports external hive table ? if yes does it supports all
> file systems like hdfs, s3 , wasb/adfs?
> 2) Can we migrate externally created hive tables to iceberg tables without
> deleting our existing data on s3?
> 3) Does iceberg have cli support for DDL and DML queries?
> 4) if Iceberg supports external tables does it support ORC file format
> also?
>
>
> Are there and code snippet examples for migrating hive tables to iceberg?
>
> Your help is much appreciated.
>
> Best Regards,
> Sandeep
>

--
Ryan Blue
Software Engineer
Netflix
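
Appended illustration of the file life-cycle described in the reply above ("logical deletes happen first, physical deletes later", with a callback to manage removal yourself). This is only a toy model in plain Python and not Iceberg's API: the class `ToyTable`, its methods `commit` and `expire`, the `delete_with` parameter, and the `s3://bucket/...` paths are all hypothetical names chosen for the sketch. It assumes a snapshot can be modeled as the set of data files live at that point.

```python
from typing import Callable, Dict, List, Set


class ToyTable:
    """Toy model only: each snapshot is the set of data files live at that point."""

    def __init__(self) -> None:
        self.snapshots: Dict[int, Set[str]] = {}
        self._next_id = 1

    def commit(self, files: Set[str]) -> int:
        # A new snapshot that omits a file is a logical delete:
        # nothing is removed from storage here.
        snapshot_id = self._next_id
        self._next_id += 1
        self.snapshots[snapshot_id] = set(files)
        return snapshot_id

    def expire(self, snapshot_id: int, delete_with: Callable[[str], None]) -> None:
        # Physical cleanup happens only when a snapshot is removed, and only
        # for files that no remaining snapshot still references.
        expired = self.snapshots.pop(snapshot_id)
        still_referenced = set().union(*self.snapshots.values())
        for path in sorted(expired - still_referenced):
            delete_with(path)  # the caller decides what "delete" means


table = ToyTable()
first = table.commit({"s3://bucket/a.parquet", "s3://bucket/b.parquet"})
second = table.commit({"s3://bucket/b.parquet"})  # a.parquet is logically deleted

# Instead of deleting anything, record the paths so a separate cleanup
# service (hypothetical) can handle them later.
to_clean: List[str] = []
table.expire(first, delete_with=to_clean.append)
print(to_clean)
```

Passing a callback that only records paths, as here, keeps existing data on storage untouched until a platform-level cleanup job decides to remove it.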
