Hi Sandeep, thanks for your interest in Iceberg.

> Do iceberg supports external hive table ? if yes  does it supports all
> file systems like hdfs, s3 , wasb/adfs?

I'm not quite sure what you mean, because Iceberg replaces Hive tables and
is not compatible with them. It sounds like you might be asking how files
are accessed and how their life cycle is managed.

Iceberg uses the Hadoop FileSystem API to read and write files, so as long
as you have a FileSystem configured for a given scheme, you can use paths
on any of those stores. Iceberg doesn't have a concept of "external" data.
It expects to manage all of the files underneath a table and will delete
files as you remove the snapshots that track deleted files (logical
deletes happen first, physical deletes later). You can avoid the deletes
by passing in a callback to manage removal yourself. When you drop a
table, there's a flag for whether the data files should be removed. We
wanted to make sure platform teams have the flexibility to clean up data
as they need to. For example, our users never delete data; we use Janitor
services to clean up old partitions, snapshots, and dangling files.
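
To make the "logical first, physical later" ordering concrete, here is a
toy model in Python. It is only a sketch and not Iceberg's API: the
function name, the dict-of-sets snapshot representation, and the callback
signature are all made up for illustration. A file is handed to the
callback only once no retained snapshot still references it.

```python
# Toy model (NOT Iceberg's API): each snapshot references a set of data files.
# A file is physically removed only when no retained snapshot references it.

def expire_snapshots(snapshots, expire_ids, delete_file):
    """Drop the snapshots in expire_ids and pass unreferenced files to delete_file.

    snapshots:   dict mapping snapshot id -> set of file paths it references
    expire_ids:  set of snapshot ids to remove
    delete_file: callback invoked once per file that no retained snapshot uses
    """
    retained = {sid: files for sid, files in snapshots.items()
                if sid not in expire_ids}
    still_referenced = set().union(*retained.values())
    expired_files = set()
    for sid in expire_ids:
        expired_files |= snapshots.get(sid, set())
    # Only files that no retained snapshot needs are eligible for removal.
    for path in sorted(expired_files - still_referenced):
        delete_file(path)
    return retained


# Snapshot 1 wrote a.parquet and b.parquet.
# Snapshot 2 logically deleted a.parquet (it no longer references it).
snapshots = {1: {"a.parquet", "b.parquet"}, 2: {"b.parquet"}}

# Hypothetical custom callback: record the files instead of deleting them,
# so a separate cleanup service can decide what to do.
pending_removal = []
remaining = expire_snapshots(snapshots, {1}, pending_removal.append)
```

In this model, a.parquet stays on disk while snapshot 1 exists and is
passed to the callback only when snapshot 1 is expired; b.parquet is never
touched because snapshot 2 still references it.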

> Can we migrate externally created hive tables to iceberg tables without
> deleting our existing data on s3?

There is an import utility you can use to create Iceberg metadata around
files in an existing Hive table:
https://github.com/apache/incubator-iceberg/blob/master/spark/src/main/scala/org/apache/iceberg/spark/SparkTableUtil.scala#L479-L483
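
The idea of "creating metadata around existing files" can be illustrated
with a small Python sketch. This is not the SparkTableUtil code linked
above and it does not produce Iceberg metadata; the function name and the
entry fields are hypothetical. It only shows that the data files are
listed in place, assuming a Hive-style key=value directory layout, and are
never moved or rewritten.

```python
import os
import tempfile


def list_hive_data_files(table_root):
    """Walk a Hive-style layout (key=value directories) and describe each file.

    Files are only listed; nothing is moved, copied, or deleted.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(table_root):
        rel = os.path.relpath(dirpath, table_root)
        parts = [] if rel == "." else rel.split(os.sep)
        # Partition values come from the directory names, e.g. day=2019-11-20.
        partition = dict(p.split("=", 1) for p in parts if "=" in p)
        for name in filenames:
            if name.startswith((".", "_")):  # skip markers such as _SUCCESS
                continue
            path = os.path.join(dirpath, name)
            entries.append({
                "path": path,
                "partition": partition,
                "size_bytes": os.path.getsize(path),
            })
    return sorted(entries, key=lambda e: e["path"])


# Build a tiny fake table on local disk to exercise the function.
root = tempfile.mkdtemp()
for day in ("2019-11-19", "2019-11-20"):
    part_dir = os.path.join(root, "day=" + day)
    os.makedirs(part_dir)
    with open(os.path.join(part_dir, "part-00000.parquet"), "wb") as f:
        f.write(b"fake")
open(os.path.join(root, "_SUCCESS"), "w").close()

entries = list_hive_data_files(root)
```

Each entry records where a file already lives plus its partition values,
which is the kind of information a table format needs in order to track
existing files without copying them.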

> Does iceberg have CLI support for DDL and DML queries?

Iceberg doesn't directly provide SQL support. For that, we have integration
with Spark and Presto.

Presto supports
<https://github.com/prestosql/presto/tree/master/presto-iceberg#status>
"create, CTAS, drop, rename, and reading from Iceberg tables. It also
supports adding, dropping, and renaming columns."

Spark 2.4 supports only the DataFrame API
<http://iceberg.apache.org/spark/>; SQL support is coming in 3.0.

> if Iceberg supports external tables does it support ORC file format also?

Iceberg doesn't support ORC yet, but there is a pull request for it that is
getting really close to merging. I'm currently trying to make time to
review it.

On Wed, Nov 20, 2019 at 1:56 AM Sandeep Kumar <[email protected]> wrote:

> All,
>
> I am new to iceberg and want to explore iceberg to optimise hive query
> response time.
> I have couple of questions regarding same.
> 1.) Do iceberg supports external hive table ? if yes  does it supports all
> file systems like hdfs, s3 , wasb/adfs?
> 2) Can we migrate externally created hive tables to iceberg tables without
> deleting our existing data on s3?
> 3) Does iceberg have cli support for DDL and DML queries?
> 4) if Iceberg supports external tables does it support ORC file format
> also?
>
>
> Are there and code snippet examples for migrating hive tables to iceberg?
>
> Your help is much appreciated.
>
> Best Regards,
> Sandeep
>


-- 
Ryan Blue
Software Engineer
Netflix
